
Thursday, July 8, 2010

Storage, storage, storage

The Grid does three things: it moves data, stores data and processes data. Arguably, the middle one, storage, is the most important*, as the other two would be pointless without it. Yesterday afternoon saw a session on storage issues, and, as I had hoped in my opening blog post, the session was indeed very forward-looking.

A huge problem we have is data locality. Traditionally, we move data to sites, then run jobs at those sites. This can fail for a variety of reasons, such as

  • Some files (out of many) are missing - jobs fail
  • Site hosting data goes down - jobs disappear
  • Some datasets are more popular than others - how to judge interest when subscribing data?
  • Tape systems are slow - makes CPU efficiencies poor when waiting for recalls

In the summary of the recent Amsterdam storage workshop, the theme seems to be a move towards responding more dynamically to these issues. If a site has 95% of the data you want, it is probably not too inefficient to run the jobs there anyway and pull the remaining 5% over the WAN from another site. This could well be faster than waiting for the tails of dataset transfers to finish.
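To make that concrete, here is a rough sketch of the kind of decision a workload system might make: run where most of the data already is, and stream the rest over the WAN. All the names and the 95% threshold are invented for illustration, not taken from any real middleware.

```python
# Hypothetical sketch: decide whether to run at a site holding only part of a
# dataset, pulling the remainder over the WAN. Names/thresholds are illustrative.

def plan_job(dataset_files, files_at_site, min_local_fraction=0.95):
    """Return which files to read locally and which to fetch remotely."""
    local = [f for f in dataset_files if f in files_at_site]
    remote = [f for f in dataset_files if f not in files_at_site]
    fraction_local = len(local) / len(dataset_files)

    if fraction_local >= min_local_fraction:
        # Good enough: run here and stream the missing files over the WAN,
        # rather than waiting for the tail of the transfer to complete.
        return {"run_here": True, "read_locally": local, "fetch_over_wan": remote}
    # Too much missing data: better to wait for the subscription to finish,
    # or send the job elsewhere.
    return {"run_here": False, "read_locally": [], "fetch_over_wan": []}


if __name__ == "__main__":
    dataset = [f"file_{i:03d}.root" for i in range(100)]
    at_site = set(dataset[:96])  # 96% of the dataset already on site
    print(plan_job(dataset, at_site)["run_here"])  # True
```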

There was also talk of separating the tape system from the caching layer. I have yet to be convinced this can work in practice, but it is interesting to think that all tape operations may have to become operator-driven, rather than the system automatically recalling files on demand. That will require good tools to keep disks from filling up, and to keep operators from becoming quivering wrecks.
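Purely to illustrate what "operator-driven" might mean, here is a speculative little sketch; nothing here comes from a real tape system, and the names and thresholds are made up. Recall requests queue up, and an operator releases them only while the disk cache has headroom.

```python
# Speculative sketch of an operator-driven recall queue (invented names/thresholds).
import shutil
from collections import deque

recall_queue = deque()  # (file_name, size_in_bytes) requests waiting for an operator

def request_recall(file_name, size_bytes):
    """Users queue recall requests instead of triggering tape mounts directly."""
    recall_queue.append((file_name, size_bytes))

def operator_release(cache_path, keep_free_fraction=0.10):
    """An operator drains the queue, but only while the disk cache has headroom."""
    usage = shutil.disk_usage(cache_path)
    budget = usage.free - usage.total * keep_free_fraction
    released = []
    while recall_queue and recall_queue[0][1] <= budget:
        name, size = recall_queue.popleft()
        budget -= size
        released.append(name)  # in a real system: ask the tape back-end to stage `name`
    return released
```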

A panel discussion on the future of Tier-1 storage followed. As far as I could see, everybody is basically happy with what they have now and is experienced in operating their respective storage systems, but eyes must be kept open for the next technology to move to in good time.

Finally, there were presentations and demonstrations of prototypes showing new data management and access technologies. Two stood out to me. The first was given by Graeme Stewart of ATLAS fame, on their dynamic data placement work [1]. Essentially, their workflow management system can automatically notice which datasets are very popular and replicate them to Tier-2s which are not very busy. The next step is to automatically re-schedule queued jobs, so that they can take advantage of the new replicas rather than sitting in a queue at a busy site.
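In spirit, the placement decision might look something like the following toy sketch; the data structures, thresholds and dataset names are mine, not taken from the ATLAS implementation in [1].

```python
# Toy popularity-driven replication: copy hot datasets to quiet Tier-2s.
def pick_replications(dataset_popularity, site_load, current_replicas,
                      hot_threshold=100, busy_threshold=0.8):
    """Suggest (dataset, site) pairs for new replicas."""
    quiet_sites = [s for s, load in site_load.items() if load < busy_threshold]
    plan = []
    for dataset, accesses in sorted(dataset_popularity.items(),
                                    key=lambda kv: kv[1], reverse=True):
        if accesses < hot_threshold:
            break  # the rest are not popular enough to be worth copying
        for site in quiet_sites:
            if site not in current_replicas.get(dataset, set()):
                plan.append((dataset, site))
                break  # one new replica per dataset per pass
    return plan


if __name__ == "__main__":
    popularity = {"data10_7TeV.A": 500, "mc09.B": 12}   # accesses per dataset
    load = {"T2_A": 0.3, "T2_B": 0.95}                  # fraction of slots busy
    replicas = {"data10_7TeV.A": {"T2_B"}}
    print(pick_replications(popularity, load, replicas))  # [('data10_7TeV.A', 'T2_A')]
```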

Secondly, the ARC caching model used by NorduGrid caught my eye [2]. This is a significantly different data management model to those used by gLite or OSG sites: the CEs pull in the data a job requires from remote storage before running the job. What their caching system does is keep this data locally on the front ends to save WAN transfers. Their system has scaled happily to Petabyte-sized storage.
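A toy version of that front-end cache, just to show the idea; the paths and the plain copy/urlretrieve calls are my own simplification, not how ARC actually does it.

```python
# Illustrative cache-on-the-front-end: fetch a job's input via a local cache.
import os
import shutil
import urllib.request

CACHE_DIR = "/var/cache/arc_inputs"   # assumed local cache on the front end

def stage_input(remote_url, job_dir):
    """Copy an input file into the job directory, going through the local cache."""
    cached = os.path.join(CACHE_DIR, os.path.basename(remote_url))
    if not os.path.exists(cached):
        os.makedirs(CACHE_DIR, exist_ok=True)
        urllib.request.urlretrieve(remote_url, cached)   # WAN transfer, only on a miss
    os.makedirs(job_dir, exist_ok=True)
    shutil.copy(cached, job_dir)                         # local copy, cheap on a hit
    return os.path.join(job_dir, os.path.basename(remote_url))
```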

One can imagine a combination of these two strategies; downloading missing data at a site into a local cache, which is flushed with some Least Recently Used or Adaptive Replacement Caching strategy... Interesting stuff!
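For the flushing part, a Least Recently Used cache over files is simple enough to sketch in a few lines; the sizes and the "delete the file" step below are illustrative only.

```python
# Minimal LRU file cache: evict the least recently used files when over capacity.
from collections import OrderedDict

class LRUFileCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()   # file name -> size, oldest first

    def touch(self, name, size):
        """Record an access, evicting least recently used files if needed."""
        if name in self.entries:
            self.entries.move_to_end(name)      # cache hit: mark as most recent
            return
        while self.entries and self.used + size > self.capacity:
            old_name, old_size = self.entries.popitem(last=False)
            self.used -= old_size               # a real cache would unlink old_name here
        self.entries[name] = size
        self.used += size


cache = LRUFileCache(capacity_bytes=10)
for f, s in [("a", 4), ("b", 4), ("a", 4), ("c", 4)]:
    cache.touch(f, s)
print(list(cache.entries))   # ['a', 'c']: 'b' was evicted as least recently used
```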

[1] http://indico.cern.ch/getFile.py/access?resId=0&materialId=slides&contribId=16&sessionId=2&subContId=2&confId=82919
[2] http://indico.cern.ch/getFile.py/access?resId=0&materialId=slides&contribId=16&sessionId=2&subContId=5&confId=82919

* Yes, I know this is a ridiculous statement, but I like making them.
