Tuesday, October 23, 2012

Synergies and Tensions in data sharing

Observatory at Tibidabo, Barcelona,
by Scalleja (Creative Commons)
Astronomy has a long history of archiving and sharing data: from written observations of the heavens, meticulously recorded in weighty tomes starting several hundred years ago, to modern-day citizen cyberscience projects such as Galaxy Zoo, which shares Sloan Digital Sky Survey images with the public and asks them to classify what they see. In so doing, it has created a new class of open scholars. So why not share and gamify all data? Is that the way to deal with the deluge we’ve all been told about?

Galaxy Zoo works well because an individual can, with minimal training, tell you what something looks like. This is something we discussed in our e-Science Briefing on visualisation. But there are many reasons why scientists who have invested years of their lives in large-scale international collaborative projects might not want to release everything immediately. Talking in this morning’s plenary, Bernard Schutz from the Max Planck Institute for Gravitational Physics in Potsdam, Germany, introduced LIGO, the international network of gravitational wave detectors. Stretching out at right angles along the ground in the scrubland of Hanford, Washington state, 4km-long arms lie in wait to catch a wave – a series of ripples in the curvature of spacetime. Gravitational waves were predicted by Einstein’s general theory of relativity in 1916, but we are yet to detect them with the LIGO detector in Hanford, or with the similar devices dotted around the globe. Yet. In fact, the upgrades needed to make the detectors sensitive enough are yet to be installed, so while that “is no surprise,” said Schutz, “it will be a surprise if we don’t see them after 2017,” when the upgrades are in place and the number of countries with LIGO detectors jumps from 3 to 5. Having detectors all around the world will allow the location of the source of the waves to be triangulated, a necessary feature for detectors that cannot be steered and do not point at any particular part of the sky.

Pushing for open data, the US National Science Foundation (NSF) found themselves up against scientists who had invested a great deal of time working on LIGO and who would prefer to keep the data for themselves – at least for a time. They weren’t purely motivated by vested interests in their own careers (although who could blame them for that?). They were also worried that they’d have to explain to well-meaning members of the public that what they thought was a gravitational wave “was an artefact,” said Schutz. Whereas Galaxy Zoo is relatively simple, LIGO data requires years of study before an individual is able to know what to look for and what to ignore. In the end, the NSF allowed a two-year proprietary period for scientists actively involved in LIGO to get to grips with the data.

Still, “it won’t be long until members of the public will be included on author lists,” said Schutz. Volunteer computing through systems such as BOINC allows projects to curate data (to some extent) before sending it out to be processed on PCs in homes across the globe. Such a large amount of data will be generated by instruments like the Large Synoptic Survey Telescope in Chile that, rather than worrying that ‘others’ might analyse the data first (as the LIGO members did), “the worry is that it won’t be!” exclaimed Schutz. In fact, Google will be making up to 30TB of data per night available as an interactive sky map. This drew questions regarding the placement of data from publicly funded research in the hands of a private corporation, even one that won’t ‘be evil’, to paraphrase Google’s own mantra. Of course, they are making it available for (cost-)free.

There are synergies and tensions involved in data sharing, whether we focus just on astronomy, or beyond. This provides challenges, but it also provides opportunities for new ways of interacting. The public/private issue will not go away soon, just as the proprietary/open divide will remain for some research disciplines. “Much richer metadata and data identifiers would address some concerns,” said Schutz, “and we have to have standards, even as they evolve.”
