Showing posts with label TeraGrid2011. Show all posts

Thursday, July 21, 2011

Bird-brained computing at TeraGrid

And a final note from TeraGrid about birds before I too fly the coop tomorrow, to Chicago. At the very end of the science session today, Daniel Fink of the Cornell Lab of Ornithology talked about modelling avian distributional dynamics on TeraGrid. Basically, bird watching with big computers.

So why birds? First of all, they are very good bioindicators, the so-called canaries in the coal mine, that can tell us about the health of the environment. Second, there is lots and lots of data - people love birds and there is a vast archive of amateur bird-watching data dating back over decades. One example of this is the citizen science website, eBird (ebird.org).

When you’re gathering observational data of this type it needs to be comprehensive: species, place, date, time, and how many people spent how long gathering the data. This last item is particularly important as it tells you how much bias might be involved in the information.

Data sets can contain up to a million hours of volunteer data, but they can still be sparse in terms of geographical coverage, with holes where there aren’t many observers. Fink and his team are working to help fill the gaps and predict occurrence by associating observations with local environmental data. They’ve tried this with some very different bird species, such as the indigo bunting and the solitary sandpiper, and it seems to work. For example, their models of indigo bunting distribution show holes in the distribution patterns corresponding to cities, which is what you would expect from a species that prefers hedgerows and rural environments.

Fink showed off his BirdVis tool, which has recently been accepted for publication and lets you scroll through the influence of habitats during breeding periods and during migration, over dynamic time frames. All very impressive.

And with that, I must fly!

The sunset of TeraGrid and the dawn of XSEDE

As the sun sets on the TeraGrid era, the horizon opens up for the future of XSEDE, as John Towns, leader of the new project, picturesquely described it. XSEDE, the Extreme Science and Engineering Discovery Environment, is part of the National Science Foundation’s eXtreme Digital program. Towns, from NCSA, outlined the road from TeraGrid to XSEDE at the TeraGrid11 plenary this morning. Over its ten years, TeraGrid provided a lot of resources but was particularly characterised by the high level of help and support available to users. It mainly supported the NSF, but also other agencies such as DOE, NIH and NASA, in a wide diversity of fields including physics, molecular biosciences, astronomy and many more.

With TeraGrid closing, a new program was needed: XD, or eXtreme Digital. There were two proposals submitted and XSEDE is the successful combination that has been awarded funding. The start was delayed for around a year beyond the expected date, but the project is now up and running. The vision is to enhance the productivity of scientists and engineers but, in a shift in emphasis from TeraGrid, this vision doesn’t specifically mention HPC, although it is a vital part. To give you an idea of the scale of funding, the Training, Education and Outreach activities will be allocated around $3 million a year for 5 years.

XSEDE has been out to speak to communities to gather their needs up to 2015. Earth scientists, for example, are looking for support for their CyberShake work in earthquake modelling, which involves a few big parallel jobs and many thousands of loosely coupled jobs. Others such as iPlant, which is tackling grand challenges in plant science, need high speed access to data in databases scattered in different places, plus an HPC component to do the analysis. Brain science, including the Human Connectome Project, aims to understand the wiring of the human brain, a hugely complex problem. They will have petabytes of data to archive and stream in near real time at 1 GB/s.

As with TeraGrid, XSEDE’s focus is on user support services. They want to be able to respond to requirements quickly, so have money set aside in the budget to hire in experts in the short term from external sources. They will also be relying on their network of dozens of Campus Champions who provide onsite expertise. The TEOS team will be providing particularly intensive support to 5 to 10 campuses a year to help this along.

As far as architecture is concerned, they will be moving forward carefully, based on standards. They are currently documenting the architecture, ‘describing the elephant’ as Towns put it, but from the perspectives of different stakeholders, e.g. service provider, sys admin, power and occasional user, gateway developer, security officer, NSF program manager, campus champion, trainer and so on.

Connecting with users is also of key importance. They will use the tried and tested methods of trouble ticket tracking, focus groups, user interviews and ‘shoulder surfing’, watching how users interact with the services. However, they are also setting up a User Requirements Evaluation and Prioritisation Working Group that will help them to prioritise requirements through the direct participation of stakeholders. XSEDE is planning user-focused workshops, and users will be represented in the governance structures of the program through the XSEDE Advisory Board, User Advisory Committee and Service Providers Forum.

Interesting times ahead!

Wednesday, July 20, 2011

How to build a better portal at TeraGrid11


It's the first official day of TeraGrid11 and we’re all pleased to be here - it's hard to fault a conference that welcomes you with a generous cooked breakfast and free Starbucks coffee!

This morning we heard from Nora Sabelli of SRI International, with a thought-provoking talk on how we prepare for the future of data- and compute-driven modern science.

After the break, I joined the Training, Education and Outreach track in the 'Solitude' room. Fortunately, despite the name, I wasn't alone and the session had drawn together an impressively wide range of people - scientists, developers, outreach specialists and teachers, among others.

The session covered using portable apps to help lower the entry barrier to grids. Richard Knepper, of Indiana University, presented his imaginative investigation into the affiliation networks of TeraGrid users, based on their social network use. Training projects have the widest network, and project leaders tend to be grouped together in disciplines, although a few seem to be involved in a huge number of projects (I think we all know a few people like that!).

Jeff Sale, of the University of California San Diego, also gave us the benefit of his experience in building cyberlearning communities. In his time, he has created portals for teachers looking to base lessons on real data, for grid campus champions and for students.

He has used a range of commercial and open source solutions to construct the portals, including a mix of content management systems, learning management systems such as Moodle, and full-blown grid portal toolkits - basically, they're all mash-ups.

A veteran of developing community focused portals, Jeff let us into a few of the secrets that he's learnt over the years. First off, define your goals and your measures of success clearly from the outset - don't just stick a portal in at the end of a proposal because it sounds like a good idea.

Secondly, you need to have adequate funding - for both start-up AND sustainability. Also, don't forget that if you build it they will not necessarily come! Setting up a portal as a way to nurture a community sometimes works, but sometimes it doesn't. You need to plan a strategy for outreach and bring the right people on board to do this. Developers are not always the best communicators of their work...

You need to know your community – their technical ability, their online social skills. Identify community leaders and the ‘supergeeks’ (or perhaps more kindly, beta testers) who will help you with the development. Don't experiment with the whole community, you’ll probably scare them away.

Get the technology right and use it appropriately. Younger communities, such as students, are ‘digital natives’, and many social networking plug-ins are available for the already initiated. There are also several ‘out of the box’ solutions around such as Joomla, Mambo and Drupal, but you might need to hack around in the code to get exactly what you want. Be creative, and learn some PHP and Flash. And last but not least, you should practice what you preach and use the technology yourself!

Tuesday, July 19, 2011

Under the Salt Lake City clouds for TeraGrid and OGF32

This week I'm at TeraGrid and OGF32 in Salt Lake City. Living in the Netherlands, I thought I was used to flat, open landscapes, but the vista here is on a completely different scale.

Flying in on Sunday after a dash up from the South of France on Saturday, I unfortunately missed a lot of OGF32, but was able to join the workshop on Science Agency uses of Clouds and Grids on Monday. It's been a pretty intensive day, with 21 different snapshot presentations, so it's a difficult workshop to summarise (particularly when my body clock now thinks it's roughly Tuesday). So here are a few snapshots that I've picked up on in between caffeine hits during the breaks...

David Wallom of Oxford University updated us on the SIENA roadmap effort, and pointed out a key quote for the standards community from Neelie Kroes, VP of the European Commission:

"International standardisation efforts will also have a huge impact on cloud computing. Open specifications are key in creating competitive and flourishing markets that deliver what customers need. Europe can play a big role here – building on, for example, the SIENA initiative and its development of a 'standardisation roadmap for clouds and grids for e-Science and beyond'."

SIENA has surveyed the standards work being done by the various DCI projects, and is now working on a gap analysis. You can download their roadmap to interoperable infrastructures at http://www.sienainitiative.eu.

Ruth Pordes of Open Science Grid introduced OSG and outlined their Virtual Organisation structure - I was intrigued to hear about some of their multi-disciplinary VOs, which seems to be a growing trend. OSG is also considering how the cyberinfrastructure landscape will change now that XD XSEDE is replacing TeraGrid. Could they have a role as cloud brokers?

EGI.eu's Steven Newhouse talked about the federation of virtualised resources from an EGI context. Discussions are now focussing in on key usage scenarios such as running a predefined VM image, running "my" VM image (with the user's own data), how to decide which virtualised resource to use, how to manage accounting across resource providers, including monitoring reliability/availability of these resources and notification of VM state changes.

Daniel Katz of the University of Chicago gave us a run down on the open challenges for production DCIs. The goal is to deliver maximum science but the discussion is always around sustainability. We need to achieve useful work but ideally with someone else paying for it! Another issue is how can we measure delivered science? We can track papers and citations but these are blunt instruments for measuring impact. Another challenge is to develop tools that allow the infrastructure to deliver maximum science. Currently we do this well on a case by case basis, but offering scientists an off-the-shelf set of interoperable tools is still a bit of a dream.

Kate Keahey, of Argonne National Laboratory, showed us the Nimbus cloud project, which is working with hybrid clouds, i.e. combinations of private, community and public clouds. Nimbus allows users to build turnkey dynamic virtual clusters based on these resources, and to try out applications that don't work on the grid, such as very complex non-portable software. According to Kate, cloud outsourcing is now no longer a choice. Benefits of clouds are their economy of scale, flexible access to different resources and lack of operational overheads, but before picking a cloud you have to consider a host of factors - is it scalable, easy to use, cost effective?

That said, clouds are definitely changing the patterns of how people work.

The TeraGrid11 event proper starts tomorrow. The hot topic in the US at the moment is the transformation of NSF's TeraGrid program, which has provided cyberinfrastructure resources to the research community for more than 10 years, into XSEDE - Extreme Science and Engineering Discovery Environment.

More tomorrow (or whatever day it is where you are!)