Wednesday, September 23, 2009
Data access issues hammer back
This afternoon I attended the WLCG Operations session. One of the cool moments came when the chairman offered a prize to anyone who had been listening to the talk and could name the two main issues raised in the last presentation. Silence was the answer... were people busy reading their mail? Well, this could be a topic for another post: "benefits and drawbacks of wireless connectivity in workshops". Maybe another day.
There was a very interesting presentation from the GridPP Tier-2s reporting a detailed analysis of ATLAS HammerCloud test results at their sites. Site-dependent effects were clearly seen: some sites performed better when copying input files to the worker nodes' (WNs') local disk, while others did better when opening files remotely. They were using HammerCloud as a tool to probe their facilities with quite impressive resolution, so they were able to produce very enlightening plots of CPU efficiency versus batch system occupancy. Effects such as disk contention could be clearly seen and, most importantly, quantified. As Jeremy Coles noted: "It was always known that data analysis was hard. But now 'hard' can be quantified".
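For readers who want to try this kind of plot at their own site, here is a minimal sketch of the underlying arithmetic. It is not the GridPP analysis itself: it only assumes the usual definition of CPU efficiency (CPU time divided by wall-clock time) and a hypothetical list of job records of the form (batch occupancy between 0 and 1, CPU seconds, wall-clock seconds). The function names, the bin width and the sample numbers are all made up for illustration.

```python
from collections import defaultdict


def cpu_efficiency(cpu_seconds, wall_seconds):
    """CPU efficiency of one job: CPU time over wall-clock time."""
    return cpu_seconds / wall_seconds if wall_seconds > 0 else 0.0


def efficiency_by_occupancy(jobs, bin_width=0.25):
    """Average CPU efficiency per batch-occupancy bin.

    jobs is an iterable of (occupancy, cpu_seconds, wall_seconds) tuples,
    a hypothetical record layout chosen for this sketch.
    Returns {lower edge of occupancy bin: mean efficiency}.
    """
    n_bins = round(1 / bin_width)
    bins = defaultdict(list)
    for occupancy, cpu_s, wall_s in jobs:
        # Clamp so that occupancy == 1.0 falls into the last bin.
        idx = min(int(occupancy / bin_width), n_bins - 1)
        bins[idx].append(cpu_efficiency(cpu_s, wall_s))
    return {
        round(idx * bin_width, 2): sum(values) / len(values)
        for idx, values in sorted(bins.items())
    }


if __name__ == "__main__":
    # Invented sample: efficiency dropping as the farm fills up.
    sample_jobs = [
        (0.10, 90, 100),
        (0.20, 80, 100),
        (0.90, 30, 100),
        (1.00, 50, 100),
    ]
    for edge, eff in efficiency_by_occupancy(sample_jobs).items():
        print(f"occupancy >= {edge:.2f}: mean CPU efficiency {eff:.2f}")
```

A falling mean efficiency in the high-occupancy bins is the sort of signature one would expect from contention on a shared resource such as disk, although a real study would need far more jobs per bin than this toy sample.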
The discussion after the GridPP talks was also quite lively. I recall here two comments that I think are examples of concrete issues we can start to work on. The first is that we need some mechanism to allow sites to identify analysis jobs on the batch schedulers, so that the proper priority, resource allocation, etc. can be given to them. VOMS groups or roles could be used to do this, though the pilot job frameworks may be heading in just the opposite direction. The second is that we also need a clear mechanism for sites to advertise to users which protocol is best to use locally, both for reading input and for writing output data.
Looks like we should all now go back to our host sites, start hammering them... and see what happens.