Friday, February 24, 2012

Cloud Policy, Big Data and the Long Tail of Science

Nearly every academic research field is a data-rich science field but the majority of researchers still don’t have access to cloud resources. Clouds can help democratise science for this ‘Long Tail of Science’, but how do we make that happen? David Gannon from Microsoft Research described some of the work they are involved in, and outlined some of the policy challenges faced by the research community in adopting cloud technologies.

In 2010, Microsoft Research began providing cloud access and support to smaller academic research projects through the Global Cloud Research Engagement project. Microsoft Research builds components of cloud technology and works with researchers in the field on projects. There are currently 83 projects globally, and the initiative will shortly be extended to a further 17 projects in South East Asia.

Whilst traditional HPC belongs on supercomputers, the advantage of the cloud model for researchers lies in applications such as web access and sharing, massive ‘map reduce’ data analytics on cloud-resident data, and massive ensemble computations. The main advantage is that academics can get access when they need it, and there are no additional maintenance costs.
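
As a rough illustration of the ‘map reduce’ style of analytics mentioned above, here is a minimal single-machine sketch in Python. It is not taken from the talk: the sample readings, the station names and every function name are hypothetical, and a real cloud service would spread the map and reduce steps across many machines.

```python
from collections import defaultdict

# Hypothetical sample records: (station_id, temperature_celsius).
READINGS = [("A", 10.0), ("B", 20.0), ("A", 14.0), ("B", 22.0), ("A", 12.0)]


def map_reading(record):
    """Map step: turn one record into a (key, value) pair."""
    station, temp = record
    return station, (temp, 1)


def shuffle(pairs):
    """Shuffle step: group all mapped values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_station(values):
    """Reduce step: combine the (sum, count) pairs for one key into a mean."""
    total = sum(t for t, _ in values)
    count = sum(c for _, c in values)
    return total / count


def mean_by_station(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    grouped = shuffle(map(map_reading, records))
    return {station: reduce_station(vals) for station, vals in grouped.items()}


if __name__ == "__main__":
    print(mean_by_station(READINGS))
```

Because each record is mapped independently and each key is reduced independently, both steps can be run in parallel, which is what makes the pattern a good fit for large cloud-resident data sets.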

However, researchers still face challenges in using the cloud. At the end of his talk, David Gannon outlined some of the substantial ones that lie ahead, mainly from a policy perspective:
  • The CAPEX funding model. Most funding systems are built around capital expenditure, where physical computing infrastructure is purchased up front. Cloud computing services are consumed as needed and don’t fit well into science grant budgets.
  • Data preservation and sharing. Collectively the 'Long Tail of Science' is generating a lot of data.  In the US, the National Science Foundation requires all data to be made public, and universities are struggling to cope with this new load.
  • Data sovereignty. This raises many questions: ‘Can I store my research data in another country?’ and ‘what laws apply to you and your data?’
  • Data convergence, which provides great opportunities as well as risks. Streams of data from satellites, economic markers, weather, personal media, genomics and medical data and geo sensors will converge in the cloud and can help solve problems, e.g. tracking disease or optimising the performance of our economies. However, there are significant risks, such as security issues, that infrastructure providers need to plan ahead for. For example, basic research programs on security-related algorithms are needed, e.g. homomorphic encryption techniques (which allow computation on data while it stays encrypted, so a data owner can grant access to different users while the cloud provider has no access to the plaintext).
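
To make the homomorphic encryption idea more concrete, here is a toy sketch in Python of the Paillier scheme, which is only additively homomorphic (it supports adding encrypted numbers, not arbitrary computation). It was not part of the talk. The primes are tiny, hard-coded values chosen purely for illustration, so the code is not secure, and all names are hypothetical.

```python
import math
import secrets

# Toy key material: tiny hard-coded primes, for illustration only (insecure).
P, Q = 1789, 1861
N = P * Q                      # public modulus
N_SQ = N * N
G = N + 1                      # standard simple choice of generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)        # private: inverse of LAMBDA mod N (valid for G = N + 1)


def encrypt(message: int) -> int:
    """Encrypt an integer 0 <= message < N using the public values N and G."""
    while True:
        r = secrets.randbelow(N - 1) + 1   # random r in [1, N - 1]
        if math.gcd(r, N) == 1:
            break
    return (pow(G, message, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(ciphertext: int) -> int:
    """Decrypt with the private values LAMBDA and MU."""
    u = pow(ciphertext, LAMBDA, N_SQ)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """What the cloud provider could do: add two values without decrypting them."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    c_sum = add_encrypted(encrypt(120), encrypt(45))
    print(decrypt(c_sum))
```

In this sketch the cloud provider would only ever call `add_encrypted`, which needs no private key, while the data owner keeps `LAMBDA` and `MU` and is the only party able to read the result.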
