Saturday, July 27, 2013

Biosciences Day at XSEDE'13

After the excitement of the XSEDE’13 kick-off next door to Comic Con, and the glamour of a 1940s-themed Chair’s dinner aboard the USS Midway, we moved into Biosciences Day. Touring the Midway, we squeaked across the blue and white checked lino of the mess deck (hearty hot meals were served 23 hours out of every 24) and ducked into the cramped confines of the sick bay with its three-tier bunk beds. The operating theatre and dental surgery were right next door to the machine shop, and to the untrained eye the equipment in all three looked broadly similar. (Given the gender bias of the crew, I’ll leave the identity of the two most requested elective operations to your imagination. They weren’t on the teeth.) The basic medical treatment on offer to the crew, however, was a timely reminder of how far medicine has come in the past half century, particularly now that high performance computing has joined the medic’s armoury.

David Toth, a Campus Champion at the University of Mary Washington, talked us through the role XSEDE resources have played in finding potential drug treatments for histoplasmosis (a fungal infection), tuberculosis and HIV. In the days of the USS Midway, crew members who tested positive for HIV were immediately airlifted to shore, with a future behind a desk rather than at sea ahead of them. Toth’s group screened 4.1 million molecules using XSEDE resources, a task that would have taken two years on a single desktop PC. Some of the drug candidates are now being tested in the wet lab, and several show promising signs of inhibiting cell growth. “Supercomputers do a great job of outpacing the biologists,” said Toth. One enterprising feature of the work was that each student screened 10,000 molecules as part of their course, with one student striking lucky with a potential drug candidate.

Looking at biomedical challenges more broadly, Claudiu Farcas from the University of California, San Diego summarised some of the issues posed by data. For a start, there are many different kinds of data, from the genotype, RNA and proteome, through biomarkers and phenotypes, right up to population-level data, each with its own set of acronyms. Public repositories are often poorly annotated and mostly unstructured, and they are governed by restrictive and complicated data use agreements. “Researchers are encouraged to share but are not always enabled to do so,” warned Farcas.

A particularly thorny issue for biomedical data analysis is how to deal with sensitive personally identifiable information (PII). Researchers need to protect patients’ privacy by anonymising their data. The data also need to be cleaned up, compressed and aggregated so they can be analysed efficiently. But how best to do this? Bill Barnett of Indiana University had remarked earlier that biologists really don’t care what tools they use, as long as they work. Cloud computing can be tempting, but institutional clouds are often still in the early stages of being set up, and commercial providers may offer “questionable” privacy protection.

The iDASH (Integrating Data for Analysis, Anonymization and SHaring) portal allows biologists to focus on the biology rather than the data. It includes add-ons such as data anonymisation, annotation, processing pipelines, natural language processing, and the option to select faster GPGPU resources. According to Farcas, iDASH offers a secure infrastructure plus a platform for development, services and software, including privacy protection.
