Thursday, May 31, 2012

Nancy Wilkins-Diehr's reflections on IWSG-Life

After nearly 20 hours of travel, I arrived in beautiful Amsterdam at the Academic Medical Center (AMC, or Academisch Medisch Centrum in Dutch) for the International Workshop on Science Gateways for Life Sciences (IWSG-Life). This event was held in conjunction with the HealthGrid 2012 conference. The overlapping events and the location at the AMC meant there was a real draw from the medical community. Thank you to the organizer, Sandra Gesing, for inviting me to give a talk at this event.

The AMC is one of the largest hospitals in the Netherlands and the conference was held in a conference room right off the main atrium inside the hospital.

Presentations featured speakers from many countries: the Netherlands, Switzerland, Ukraine, Italy, Hungary, Germany, the UK, the US, Poland and more. The gateways presented handled medical imaging, analysis of millions of samples, and genomic analysis. Speakers described using gateways to solve problems involving disparate workflows, interoperability, huge varieties of back-end codes, sharing data (and protected data, too) across institutions, data curation and community building. As the session progressed, presenters observed parallels between other projects and their own work, and I noticed many references to the work of previous presenters. To me that is one of the primary benefits of such a workshop: learning about the related work of others and being able to speak with an author immediately after a presentation. I plan to investigate the use of HDF5 containers for many small files on a parallel filesystem, as presented by Vincent Rouilly (ETH Zurich) in his iBRAIN2 work.

Over and over, speakers mentioned the very rapid pace of development in the life sciences and noted that any gateway-building effort with a long development cycle was doomed.

Roberto Barbera (University of Catania and INFN, Italy) made many wonderful points in his keynote talk. He illustrated the rapid advances in Web technologies with vintage screenshots of browsers some of us remember, like Lynx, Mozilla and Netscape. He mentioned that gateways feature prominently in the eResearch 2020 final report. In fact, gateways are a centerpiece of Europe’s GRDI2020 vision, which lays out a global research data infrastructure for 2020.

Roberto contrasted the number of users of social networks with the number of EU users and the number of EGI users to make the point that we can increase the scientific user base through the use of Web technologies. Through Lego imagery, he highlighted the importance of standards and building blocks for building both simple and complex structures. Catania gateways are used by 114 organizations in 41 countries, with an increased focus on Latin America.

Personally, I was impressed by the job launching modules presented by both Roberto and Peter Kacsuk (see next paragraph), which interface to many different grid infrastructures (Globus, UNICORE, gLite, OurGrid, GARUDA, GOS). For gateways that use modules such as these, middleware upgrades and infrastructure changes would be relatively straightforward for gateway developers.
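To make concrete why such modules insulate gateway developers from middleware changes, here is a minimal sketch of the general idea: the gateway talks to one neutral interface, and each infrastructure sits behind its own backend. This is only an illustration under my own assumptions, not the design of the modules presented at the workshop. All names (JobSpec, GridBackend, InMemoryBackend, JobLauncher) are hypothetical, and the stand-in backend merely records jobs rather than calling any real middleware.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from itertools import count


@dataclass
class JobSpec:
    """Middleware-neutral job description (hypothetical fields)."""
    executable: str
    arguments: list = field(default_factory=list)
    cores: int = 1


class GridBackend(ABC):
    """Interface that every middleware-specific launcher implements."""

    name = "abstract"

    @abstractmethod
    def submit(self, spec: JobSpec) -> str:
        """Submit the job and return a backend-specific job identifier."""


class InMemoryBackend(GridBackend):
    """Stand-in backend: records jobs only, no real middleware is contacted."""

    def __init__(self, name):
        self.name = name
        self.jobs = {}
        self._ids = count(1)

    def submit(self, spec):
        job_id = f"{self.name}-{next(self._ids)}"
        self.jobs[job_id] = spec
        return job_id


class JobLauncher:
    """The only class gateway code talks to; backends plug in by name."""

    def __init__(self):
        self._backends = {}

    def register(self, backend):
        self._backends[backend.name] = backend

    def submit(self, backend_name, spec):
        try:
            backend = self._backends[backend_name]
        except KeyError:
            raise ValueError(f"no backend registered for {backend_name!r}") from None
        return backend.submit(spec)


launcher = JobLauncher()
launcher.register(InMemoryBackend("globus"))
launcher.register(InMemoryBackend("unicore"))

# Gateway code names a backend but never touches middleware details.
job_id = launcher.submit("unicore", JobSpec("/bin/echo", ["hello"]))
print(job_id)
```

In a layout like this, a middleware upgrade would mean replacing or updating one backend class while the gateway code that calls launcher.submit stays unchanged.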

Peter Kacsuk (MTA SZTAKI, Hungary) spoke about his SCI-BUS (SCIentific gateway Based User Support) program, a 3-year EU-funded project that creates a WS-PGRADE-based framework for gateway building and also provides support for those developing gateways. Eleven gateways were created in the first project year, several of which were presented at the IWSG workshop, and there is an exciting summer school being held in Budapest this summer. My initial reaction was to look into joining SCI-BUS as an associated member, provided my organization agrees to the terms of the MOU.

It was tremendously exciting to me to observe so many overlapping areas of interest between presentations at IWSG and my own work in the NSF XSEDE program. The BioAssist project from the Netherlands Bioinformatics Centre mirrors work we in XSEDE are trying to do to extend the capabilities of Galaxy. Many of the codes Bhanu Rekepalli is addressing in his super-scaling work in the Systems Biology Workbench (BLAST, HMMER, ClustalW, MUSCLE, PhyML, GARLI, RAxML, MrBayes, DOCK6, AutoDock, AMBER, NAMD) were also mentioned by presenters at IWSG.

The MoSGrid presentation was the rare presentation from a user of gateways – a real scientist (Ines dos Santos Vieira) rather than a gateway developer. It was interesting to hear her perspective on what aspects of MoSGrid improved life for her as a scientist.

I participated in a very interesting panel session on the use of clouds and a session where attendees discussed the EGI roadmap going forward. It was interesting to function as an observer and consider our own planning activities.

Aaron Golden from the Albert Einstein College of Medicine in New York City gave a very NYC-paced presentation on the Einstein Science Gateway for genome sequencing on campus. The college is both a teaching and a clinical center, seeing 80,000 patients per year. Aaron’s background is in astronomy, and it was interesting to hear the parallels between his work on the Sloan Digital Sky Survey and the Einstein gateway. While they both involve web interfaces to large data collections, as Aaron points out, there is only one universe, while in biology the universe is re-created each time a sequencer is run. It’s funny how Aaron traveled from NYC and I traveled from San Diego to Amsterdam so I could learn how successful his group has been using XSEDE with little assistance from my team.

Finally, I was excited to learn that the High-Performance Computing Infrastructure for South East Europe’s Research Communities (HP-SEE) has two BlueGene machines among their 20,000 combined cores. Truly, I learned a lot at this workshop and enjoyed my time on a small houseboat in Amsterdam.

