Thursday, July 8, 2010

Second day in London: experiments' day

After a quite intense first workshop day, which ended with a bunch of demonstrators (it looks like many people will be busy this summer) and a victory for Spain over Germany in the World Cup semifinals (Holland: be prepared), we got to the experiment jamborees today. In this post I will try to highlight some of the issues that were reported or discussed and that drew my attention.

The first turn was for LHCb this time. We learned how bad the current LHC running conditions are for this experiment. With so many protons per bunch and so few bunches, LHCb gets more pile-up than expected and less luminosity. They realised how bad this was when they tried to reprocess those busy events and found that the average reconstruction time was about twice the nominal one and the stripping time... 80 times longer!! They have adapted some parts of the software to throttle this, and yesterday a new bunch of reconstruction jobs was sent. Fingers are crossed now to see if the events can be processed this time.

During the CMS slot, the guys of the red detector presented some quite impressive numbers on the rates at which their Monte Carlo re-digitisation and re-reconstruction workflows can write output data. At a big site like FNAL (6000 slots), these jobs can write up to 1 PB of data every day! Not surprisingly, this has already caused some issues with the tape systems when trying to push this data stream into the archive.
Another presentation gave an update of the model used to plan the size of the different data flows at the sites. The numbers shown there for the bandwidth to the MSS surprised some of the Tier1 sites in the audience, since they looked pretty small (a few tens of MB/s), especially when compared with the avalanche of MC re-reconstruction output data that had just been shown. Modelling and planning are always tough in a fast-evolving environment like the LHC. We will need to stay tuned to get the new calculations and plug them into the MSS purchase plans.
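To get a feeling for the gap between these two sets of numbers, here is a rough back-of-the-envelope conversion of the FNAL figure into MB/s. It is only a sketch: it assumes decimal units (1 PB = 10^15 bytes) and treats the "up to 1 PB per day" peak as if it were a sustained average, which is surely an overestimate. The function name is mine, not something from the talks.

```python
# Back-of-the-envelope conversion of the quoted CMS peak rate.
# Assumptions (not from the talks): decimal units, and the peak of
# "up to 1 PB per day" treated as a sustained daily average.
PB = 10**15
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mb_per_s(bytes_per_day):
    """Convert a daily data volume in bytes into an average rate in MB/s."""
    return bytes_per_day / SECONDS_PER_DAY / MB


site_rate = average_rate_mb_per_s(1 * PB)  # the whole site
slot_rate = site_rate / 6000               # shared among the 6000 job slots

print(f"site: {site_rate:,.0f} MB/s, per slot: {slot_rate:.2f} MB/s")
```

Under these assumptions each slot only writes a couple of MB/s, but the site as a whole writes more than 10 GB/s, which is hundreds of times above a tape bandwidth of a few tens of MB/s.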

The ATLAS turn started with a really interesting presentation from Fernando. He showed the latest "bells and whistles" of the ATLAS data management system. Plenty of monitoring stuff (yes! we love monitoring... you know) that shows which fraction of the used disk space is taken by primary replicas, secondary replicas, etc. With all this information at hand they can do great things, like a system that watches the disk occupancy and triggers an automatic deletion of "less important" data when 90% occupancy is reached.
A plot in this presentation, showing the evolution of the used disk at the different Tier1s over time compared with the installed and pledged capacities, generated some interesting remarks. Someone in the audience was worried because it showed that ATLAS is only using about half of the pledged disk resources. Kors replied that what the plot actually showed was that, when the LHC is taking data, ATLAS can fill up the disk very fast. This can be even more worrying, indeed... if one thinks that we should survive with the currently deployed resources until April next year. Luckily, Fernando's tools will make room for the new data to come... even if some secondary or tertiary replicas need to be trashed.
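Just to illustrate the idea of an occupancy-triggered cleanup, here is a minimal sketch of how such a watermark rule could look. This is not the ATLAS code: only the 90% trigger comes from the talk. The `Replica` class, the `select_for_deletion` function, the 80% target to clean down to and the "tertiary before secondary, never primary" ordering are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size_tb: float
    kind: str  # "primary", "secondary" or "tertiary"


# Hypothetical ordering: lower rank is deleted first; primaries are never deleted.
DELETION_RANK = {"tertiary": 0, "secondary": 1}


def select_for_deletion(replicas, capacity_tb, high_mark=0.90, low_mark=0.80):
    """Return the replicas to delete once occupancy reaches high_mark.

    Deletion stops as soon as occupancy is at or below low_mark
    (the low_mark value is an assumption, not taken from the talk).
    """
    used = sum(r.size_tb for r in replicas)
    if used / capacity_tb < high_mark:
        return []  # below the trigger: nothing to do

    candidates = sorted(
        (r for r in replicas if r.kind in DELETION_RANK),
        key=lambda r: DELETION_RANK[r.kind],
    )
    victims = []
    for replica in candidates:
        if used / capacity_tb <= low_mark:
            break
        victims.append(replica)
        used -= replica.size_tb
    return victims


if __name__ == "__main__":
    disk = [
        Replica("a", 50.0, "primary"),
        Replica("b", 25.0, "secondary"),
        Replica("c", 10.0, "tertiary"),
        Replica("d", 7.0, "secondary"),
    ]
    print([r.name for r in select_for_deletion(disk, capacity_tb=100.0)])
```

Cleaning down to a lower mark instead of stopping right at 90% is a common way to avoid triggering the cleanup again with every new file, but whether the real system does this was not said in the talk.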
