|Delegates at CloudscapeV|
As more and more scientific disciplines become data-hungry, and a rising number of social scientists (economists, e-humanities researchers) follow suit, governments and international bodies are starting to commit to developing a framework that allows convenient, secure and intelligent access and exchange of data. During the "Open Collaborative Models, Open Data, Big Data" session at CloudscapeV, a number of pioneering initiatives showed how cloud computing is opening up big data for research, society and business.
During the keynote speeches, Carlos Morais Pires, Scientific Officer for "Scientific Data e-Infrastructures" at the EC's DG CONNECT, said the vision for "global research data infrastructures" involves overcoming barriers to recognising the importance of data sharing for next-century science. The vision is also hindered by those waiting for standards to be established to enable data sharing and interoperability across the entire data life cycle. Pires commended the efforts of the Research Data Alliance (RDA), which is steering the international research community in gathering user recommendations on infrastructures, policy, practice and standards. The RDA is holding its first meeting towards the end of this month.
A burgeoning partnership between data providers and scientific users is being encouraged by another international organization. Over the last eighteen months, EUDAT has been fully engaged in gathering user requirements to assemble the 'building blocks' for a pan-European data infrastructure to complement EGI's grid infrastructure and GÉANT's network infrastructure.
The long-term view for EUDAT is that the resource provider could be any cloud storage provider, regardless of location. However, automated replication across boundaries could be problematic, says Baxter. This is especially pertinent for medical data, copyrighted data and unique data (digital art), all of which can be affected by different EU data protection laws. "Building collaborative data infrastructures and storage clouds for research data is not just a requirements/technical problem, but increasingly about managing policy restrictions automatically and harmonizing legal frameworks," concluded Baxter.
The question remains: how do you nurture an open data culture that drives innovation and economic growth? This is one of the goals of the independent not-for-profit Open Data Institute (ODI), co-founded by Sir Tim Berners-Lee, which officially opened last September. GridCast interviewed Stuart Coleman, the Commercial Director at the Open Data Institute, at CloudscapeV, to find out more about some of the services offered by the ODI. These include mentoring data-driven start-ups, as well as training researchers and journalists in data literacy. Coleman says that "often the opportunity for people to innovate with data is constrained by the fact that lots of documents are released which could deliver more value as data".
|Stuart Coleman at CloudscapeV|
One case study (OpenCorporates) on the ODI website has, with a small staff of two, managed to collate data on 51 million companies. Their aim was to build an open database for the corporate world (similar to OpenStreetMap), connecting and adding clarity to corporate data. "Many people buy data from other sources with a warranty that gives them a certain level of accuracy. In an open environment, people are constantly motivated to contribute to that data and to share-alike," says Coleman. "If you can identify where the data comes from, you can trust that data as being more authoritative than a closed source." The healthcare industry has also benefited from this transparency through the opening up of prescription data. The site Prescribing Analytics provides an insight into GPs' prescribing of cheaper generic versus more expensive branded drugs, indicating where cost efficiencies could be made (e.g. over £200 million).
Coleman says understanding is the main barrier, and this is where case studies and education are vital. What might be useful is a data roadmap to define what the most valuable data is and why. He points out that some data, known as core reference data, can be particularly useful again and again when combined with other sources.