When delicious.com (or, to give it its ‘proper name’ for stubborn stalwarts like me, del.icio.us) arrived in 2003, the Web was already growing at an unbelievable rate. In 2000 there were a billion pages; by 2008, five years after del.icio.us’s inception, there were a trillion. Toolmakers, eager to solve problems in this age as in any other, were quick to provide solutions. Alongside improved search algorithms provided by Google and others, individual users of del.icio.us could curate their own ‘Web travelguide’, saving and signposting points and pages of interest by ‘tagging’ them however they liked – perhaps in categories relating to their hobbies, what they found funny, or were passionate about. The more socially-minded would carefully choose the tag words so that others could find them, and in this manner ‘social bookmarking’ was a major leap forward, building some of the foundations of the Web 2.0 era.
 For the record, I also believe box.net was better than box.com, despite it actually being the same thing.
In the same way, scientists, data scientists and e-infrastructure engineers have been thinking hard about how to add value to data by making it useful to others in the future. Data should be tagged to make it findable. But exactly how should it be tagged? Tags have to be flexible and dynamic to reflect the unpredictable nature of scientific research, but they have to follow standards, otherwise they’re of little use to anyone else. How many of us, at our first attempt at implementing a filing system, end up with lots of similarly-named folders, each containing a single item, perhaps accompanied by a bulging folder called ‘misc’? Without standards developed with the experience of people who work with information and its management – librarians who have embraced the digital age – big data could end up being an incoherent, unwieldy mess, just like those first forays into filing. It’s perhaps not so much ‘catch the wave’ as ‘avoid the sea spray’ – all while a tsunami looms on the horizon.
“Without supporting tools, data isn’t data,” said Ross Wilkinson of the Australian National Data Service in the afternoon session on metadata. “It rots! It needs to be made available [through e-infrastructures and computing resources] and it needs to be enhanced by making it available in alignment with other datasets.”
Metadata marsupial, the possum. (CC-BY Wolombi, Flickr)
One area that definitely needs a robust approach to metadata is medicine, not only because of the rich terminology of biology, but because clinicians often like to see information in a diagrammatic format. This has presented problems for clinical metadata, because it’s harder to grep in a graphic than in a text file. Bernard de Bono of the European Bioinformatics Institute presented one solution, ApiNATOMY, which automates the creation of standard anatomy schematics and metabolic maps and allows the inclusion of metadata. It’s the standardization of the approach that, according to those behind the project, makes it suitable for multiscale anatomy analytics.
But what language will metadata be in? EUDAT is a project concerned with European data infrastructure, so it could be any one of 23 official languages recognised by the EU. Speaking in the plenary, Kimmo Koski, Director of the Finnish IT Centre for Science, revealed that the standard language agreed on would be English – an important step towards European data standards.