Thursday, March 21, 2013

Capturing that long tail of data for e-researchers...

Many researchers in the social sciences and humanities (SSH) who are busy digitising their work are now also turning to e-research/e-science to make sense of their growing data. On Tuesday afternoon, a session at ISGC2013 discussed some of the challenges and problems these communities share, and offered a chance to swap potential solutions. More typical for researchers in the humanities and social sciences is 'semi-big data', suggests Prof. Peter Doorn, Director of Data Archiving and Networked Services (DANS) in the Netherlands. Around 98% of data in the social sciences and the humanities is smaller than a gigabyte. Storing big data is clearly not the problem for most of these projects; the more pertinent question is how researchers can efficiently archive and process their data in order to enhance collaborative work across the long tail.

Peter also raised some other challenges for the large SSH community:
How do we provide secure and easy access to data, given privacy and commercial interests?
As big data sources are not intended for research, how do we match traditional research collections to this new big data?
Methodology: how do we match theory-driven questions to data-driven possibilities?
Finally, there are size and computational challenges: scholars in the humanities and social sciences are working with e-scientists on the long-term archiving of big data.

Another important point raised during the discussion was that descriptive metadata in the humanities is more diverse than in High Energy Physics or the other 'big data' sciences.

After my short presentation on how our project (e-ScienceTalk) is assessing the impact of our communications for e-infrastructures, Prof. Roberto Barbera described some of the work being carried out by the Digital Cultural Heritage Roadmap for Preservation project (DCH-RP). This two-year, European-funded project started in October 2012. It is testing models and examining policies, governance, sustainability and community engagement strategies for the cultural sector, with the intention of expanding to help cultural e-researchers outside Europe in two years' time.

The project intends to develop best practices and a roadmap for the sustainable long-term and short-term preservation of digital cultural heritage. The idea is to provide practical tools and to pilot an eCulture Gateway at various cultural institutions ("a proof of concept"), with the results feeding into a workable roadmap. EGI is leading this pilot stage. There is a workshop at the EGI Community Forum next month (8-12 April, Manchester).

Balancing security needs with accessibility is one of the areas being closely researched by everyone building open and shared research infrastructures in both science and the humanities. If you want to read more about security in e-science, check out our latest e-ScienceBriefing, 'The Security Issue', which also features Roberto.

During his talk, Roberto mentioned the recent TERENA study, which gathered the opinions of different communities on authorisation and authentication (see page 18 for the main findings). A federated system was very important to most researchers. Nearly half of researchers use more than one credential, but a large majority said they would prefer to access all resources using their institutional credentials. However, a significant share (19%) used their social network credentials, e.g. for a community page on Facebook, to access scientific information.
