Monday, March 8, 2010

Day of the iRODS Workshop

This is Andy Turner's fourth in a series of GridCast posts about ISGC 2010. His previous post is here.

Know what iRODS is? If not, then click here and hopefully lmgtfy will give you some pointers, or you could use your favourite search engine. You could also read on: I am by no means an expert and might not describe things well, but I met some of the experts today, and if you do read on you can find out who they are.

iRODS is a useful piece of Grid middleware that is configured with policies to help move data around on Grid computers, so that distributed computing can be done relatively quickly and conveniently. I am interested in using iRODS, and a general interface to it and other such data Grid middleware, to run simulation models on distributed resources. This kind of middleware provides a convenient and scalable way to store and retrieve data. Access to the data looks the same from every part of the distributed compute resource, although in fact any piece of data might be on the local machine or on any of the other storage elements that have been federated.

If you've not been put off reading further already then great! I'll try explaining that again in a little more detail, with respect to an agent-based simulation model running on computers that are networked but might be spread out all over the world.

Each computer running part of the model will have data in memory, some of it representing things we will call agents. Each of these agents has a place for its data (on the federated data storage system), and this place has an address that can essentially be made to look the same from each computer running the model. This is not necessarily done by replicating a file system with all the data on each computer and keeping all the copies synchronized. There might be too much data for some computers to store, or data may change or be added so often that replication is too expensive. Indeed, each computer might use another special data serving computer to provide it with all its data.
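To make the "same address from everywhere" idea a bit more concrete, here is a toy sketch in Python. It is not iRODS code and does not use the iRODS API: the logical paths, storage element names and the resolve function are all made up for illustration. The only point is that the address the model uses is separate from where the bytes actually live.

```python
# Hypothetical logical namespace (illustration only, not an iRODS data structure):
# each logical address maps to a storage element and a physical path on it.
LOGICAL_TO_PHYSICAL = {
    "/modelZone/agents/Agent_A": ("storage_0", "/disk1/a8f3/Agent_A.dat"),
    "/modelZone/agents/Agent_B": ("storage_1", "/vault/77c2/Agent_B.dat"),
}


def resolve(logical_path, asking_computer):
    """Return (storage element, physical path) for a logical address.

    asking_computer is accepted only to show that the answer does not
    depend on which computer is asking.
    """
    storage, physical_path = LOGICAL_TO_PHYSICAL[logical_path]
    return storage, physical_path


# The same logical address resolves identically from two different computers.
from_c0 = resolve("/modelZone/agents/Agent_A", "C_0")
from_c1 = resolve("/modelZone/agents/Agent_A", "C_1")
print(from_c0 == from_c1)
```

In a real data Grid the lookup table would itself be a service, and the data might be fetched over the network, but the model code would still only ever see the logical address.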

Anyway, consider the agent "Agent_A" located on computer "C_0". Consider the case in the model when an agent "Agent_B" located on computer "C_1" wants data about Agent_A. C_1 can get data about Agent_A from the persistent storage, including which computer it is on. It can then use that information to ask C_0 directly for data about Agent_A. Hopefully you can see why this would be useful!
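The two-step lookup described above (first ask the persistent storage where an agent is, then ask that computer directly) might be sketched like this. Again, this is only a toy under my own assumptions: Catalogue, Computer and fetch_agent_data are hypothetical names, the "network" is just a Python dictionary, and none of it is iRODS code.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Toy stand-in for the federated persistent storage (hypothetical)."""
    records: dict = field(default_factory=dict)

    def register(self, agent_id, computer_name):
        # Persist which computer currently holds the agent.
        self.records[agent_id] = {"computer": computer_name}

    def locate(self, agent_id):
        return self.records[agent_id]["computer"]


@dataclass
class Computer:
    """Toy stand-in for one computer running part of the model."""
    name: str
    agents: dict = field(default_factory=dict)  # agent data held in memory

    def host(self, agent_id, data, catalogue):
        self.agents[agent_id] = data
        catalogue.register(agent_id, self.name)

    def ask(self, agent_id):
        # In a real model this would be a network request to this computer.
        return self.agents[agent_id]


def fetch_agent_data(agent_id, catalogue, computers):
    """Step 1: look up the agent's computer. Step 2: ask that computer."""
    host_name = catalogue.locate(agent_id)
    return computers[host_name].ask(agent_id)


catalogue = Catalogue()
computers = {"C_0": Computer("C_0"), "C_1": Computer("C_1")}
computers["C_0"].host("Agent_A", {"x": 3, "y": 4}, catalogue)
computers["C_1"].host("Agent_B", {"x": 0, "y": 0}, catalogue)

# Agent_B (on C_1) wants data about Agent_A (on C_0).
result = fetch_agent_data("Agent_A", catalogue, computers)
print(result)
```

The attraction of doing it this way is that C_1 only needs to know how to reach the catalogue; it never needs a hard-coded list of which agents live where.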

So, how do I set the right policies for iRODS so that my model runs in a fast and efficient way? Well, I'd like iRODS to learn what is a good policy for my models, so I don't have to do any optimisations. Maybe that is what it does?

That's all from me for today. More details on what I got up to on the day of the iRODS workshop can be found here.
