Tuesday, January 5, 2010

David de Roure talks data

David de Roure from the University of Southampton is the Chair of OMII-UK and was appointed National Strategic Director of e-Social Science towards the end of last year. We met him at the All Hands/IEEE e-Science 2009 conference in December, where he told us about his new role.

Can you tell us about your work as National Strategic Director of e-Social Science?

e-Social Science is a programme set up by the ESRC (Economic and Social Research Council) - it’s e-science meets the social sciences.

The programme had two phases. In the first phase, the "hub" in Manchester very successfully set up a number of projects, called nodes, which had a role in terms of community interaction and working with the other nodes. What we need to do now is get those new techniques that have been established within those nodes out to the broader social science research community. That’s my mission and I have a great team to help me.

The three things I’m really promoting are an approach based on methods, based on international [ways of working] and on the next generation of social scientists. My way of doing this is to work as a partner for projects that are both inside and outside the e-social science programme. Sitting down with people, travelling around the country, visiting other projects has been great. We’re working together with them to get these methods out.

What do you see as some of the most important challenges the community faces?

If we can get the metadata right then data becomes a lot more [reusable, but that] requires changing culture and practice. People who are working with a particular piece of data might not have the incentive to make it reusable by others and that is a deeper problem. In the long term that’s something we need to fix - we need to make sure we have the reward structures.

At the moment the culture is – if people have [data] they don’t really get a reward for making it reusable for others. There's been a lot of discussion at the conference (e-Science 2009) about reward structures. We have a system about papers which doesn’t take into account the data, software, blog posts, workflows, all these other new things that come out of e-science and e-social science.

These issues are not just technical, they're social issues as well. There's a famous quote from Jim Gray that says 'may all your problems be technical' because we can deal with them. It's the social problems that are difficult.

We’ve heard you’re working on a project called SALAMI (Structural Analysis of Large Amounts of Music Information). Could you tell us a bit more about it?

For me a lot of my life is spent helping researchers in other disciplines - I've been doing that for 15 years and that's what I enjoy doing. It’s very important therefore that sometimes I am the researcher using the tools so I can see it from both sides.

[The SALAMI project] is to do with music and e-science but I'm on the researcher end of it. It came out of a programme called the ‘Digging into Data Challenge’ and it’s supported by four funding agencies across three countries - JISC in the UK, the National Endowment for the Humanities and the National Science Foundation in the US, and the Social Sciences and Humanities Research Council from Canada. They're putting together this programme that takes existing datasets and really does something with them. And I think that's very, very important. People are so good at collecting and curating data but I think we don’t spend enough time working out how to understand it and what to do with it - this project really addresses that.

In the SALAMI project, for example, we're going to take all the music on The Internet Archive and thousands of supercomputing hours that have been donated to us from NCSA (US National Centre for Supercomputing Applications). [We will then] analyse all the music to build a resource for musicologists, based on the structure of the music. It’s a great example of a programme – it's multi-disciplinary, it’s multi-country. Even just at the level of funding, working across that number of agencies with a peer review process that they’re all happy with, is quite an achievement.

In September last year you went on a fact finding mission to the US. Can you share some of your insights from the trip with us?

I went out with Malcolm Atkinson on a mission to find out about the use of research data - asking scientists who were actually using the data what their current practices are, what’s going to happen in the future, what works and what doesn’t work.

We spent three and a half weeks seeing different institutions every day and we came back with our heads spinning with information. One of the things that we learnt is that, in the US, the libraries are much more involved in e-science than in this country. For example they've funded two new datanet projects, DataONE and the Data Conservancy, and there are many figures who come from the libraries and the repositories world.

Another discovery is that they have a real acknowledgement in the US that data and understanding data is supported. We’ve always been very data centric in UK e-science but I think we need to build up a more comprehensive understanding. If ten percent less money was spent on hardware and that money was put into training people in software and techniques for understanding data then actually we’d make more progress in e-science. That’s quite a controversial statement because there’s a lot of commitment to buying big computers and having the ones with the flashiest lights and the biggest number of flops. It’s a difficult thing to talk about because money going into software or training doesn’t necessarily take the same route as money going into hardware and building infrastructures. That shift is already occurring in the US - there’s increasing investment in data – so we feel strongly that we need more of this in the UK.
