Wednesday, July 18, 2012

Field of Dreams: How are the XSEDE users using the infrastructure?

“If you build it, they will come.”  That’s a quote (or, actually, a slight misquote) from the film Field of Dreams. The premise is that if you build something people really want (a baseball field, website, supercomputer or cyberinfrastructure), they will find their way to it, and all will live happily ever after.

In the research and e-infrastructures field this is perhaps not the full story. It's true that potential research users are out there in large numbers, but matching them to the resources of their dreams often takes a lot of work, and not a little persuasion, outreach and training.

Yesterday’s talk by XSEDE’s David Hart of the National Center for Atmospheric Research on usage patterns for XSEDE resources was particularly interesting for those reasons. After a year of operation and the transition from TeraGrid, who are XSEDE’s users and how do they use the infrastructure?

From the official project rankings, it’s difficult to get a clear picture. Projects move up and down in the rankings over time, and the smeared-out data doesn’t give much indication of usage. Hart decided to derive his own picture from the data, and looked at three factors: injection, continuation and completion, i.e. how many projects were new in any one year, and how long did they last?
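As a rough illustration of that kind of bookkeeping, here is a minimal Python sketch. It is not Hart's actual method, and his exact definitions aren't given here; the project records, the `classify` and `short_lived` helpers and the 183-day cut-off are all assumptions made up for the example. It treats a project as "new" in a year if its first job ran that year, "continued" if its last job ran in a later year, and "completed" if its last job ran that year.

```python
from datetime import date

# Hypothetical records: project id -> (date of first job, date of last job).
# These values are invented for illustration; they are not XSEDE data.
projects = {
    "A": (date(2010, 3, 1), date(2012, 5, 1)),
    "B": (date(2011, 2, 1), date(2011, 6, 15)),
    "C": (date(2011, 9, 1), date(2012, 1, 10)),
    "D": (date(2009, 1, 5), date(2011, 11, 30)),
}


def classify(projects, year):
    """Split the projects active in `year` into new, continued and completed sets.

    Assumed definitions (not taken from the talk):
      new       - first job ran in `year`
      continued - last job ran in a later year
      completed - last job ran in `year`
    """
    active = {p for p, (first, last) in projects.items()
              if first.year <= year <= last.year}
    new = {p for p in active if projects[p][0].year == year}
    continued = {p for p in active if projects[p][1].year > year}
    completed = {p for p in active if projects[p][1].year == year}
    return active, new, continued, completed


def short_lived(projects, max_days=183):
    """Projects whose first and last jobs are fewer than `max_days` apart."""
    return {p for p, (first, last) in projects.items()
            if (last - first).days < max_days}


active, new, continued, completed = classify(projects, 2011)
for label, group in [("new", new), ("continued", continued), ("completed", completed)]:
    share = 100 * len(group) / len(active)
    print(f"{label}: {sorted(group)} ({share:.0f}% of active projects)")
print("short-lived:", sorted(short_lived(projects)))
```

Under these assumed definitions, a project that starts and finishes in the same year counts as both new and completed, so the three shares need not add up to 100%.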

Hart found that, in a given year, around 40% of projects were likely to be new, 40% continued beyond that year and 30% completed during the year. Larger projects were more likely to continue beyond the year, while smaller projects often started and ended in the same year. The most likely duration was less than 6 months, corresponding to about 1000 projects. Around 63% lasted more than 18 months.

When Hart turned his attention to the users of XSEDE, it turned out that projects are more stable than individual users: 48% of users in any given year were new (meaning they had never submitted a job before), and about 30% carried on using the infrastructure into the next year. About 40% of users are graduate students, however, which may help explain the high turnover.

Out of the nearly 8400 users, 4000 were active for less than 6 months. For about 2000 users, their first and last jobs were less than 15 days apart. This figure does not include the users who went to the trouble of logging in but never submitted a job at all.

Hart was keen to point out that short-lived usage of the infrastructure is not in itself something to avoid: this level of usage could still be exactly what the researchers and grad students need for their work, or to complete assignments and training. Comparing these patterns with the closest European equivalents to XSEDE would be interesting. PRACE, the supercomputing network, has supported 103 projects over 5 calls, representing more than 2.7 billion core hours. The European Grid Infrastructure, the pan-European grid computing infrastructure, has over 20,000 users across Europe and beyond, and runs 1.2 million jobs a day. If you build it, and want 'them' to come, I guess you need to know your user.

