Thursday, October 8, 2009

Weaving together language archives and technology

--Guest posting by Bob Jones, EGEE project director

Europe has an abundance of digital archives that are very interesting to social science and humanities researchers, but the problem is access, access, access. The solution?

One of the ESFRI projects present at the Enabling Grids for E-sciencE conference in Barcelona last month, CLARIN (Common Language Resources and Technology Infrastructure) is a large pan-European collaborative effort to create, coordinate and make language resources and technology available and readily usable. CLARIN will give scholars tools for computer-aided language processing. It aims to

(1) unite existing digital archives in Europe that contain language-based material;
(2) make language and speech processing tools available to interested researchers, opening up new research avenues;
(3) give web-based services to non-expert users (especially humanities and social sciences researchers without a technological background), making complex tasks possible for materials contained in the archives, such as ‘Summarize Le Monde of March 17 2008 - in Polish’. [Example from the first issue of CLARIN’s newsletter.]

I attended the Networking Event for European Research Infrastructure (NEERI09) in Helsinki on 1-2 October 2009. This workshop focused on planning for CLARIN’s technology components. I noticed similarities between the classifications of CLARIN data centres and the tiered system of the Worldwide LHC Computing Grid. If this holds true, then it is possible that CLARIN centres could operate within the framework of the National Grid Infrastructures that will be part of the European Grid Initiative (EGI), and that CLARIN users could be supported through the proposed humanities specialized support center (SSC).

After presentations and discussions about open access to research material and copyright issues, CLARIN representatives agreed to prepare a proposal for a common approach to open access to copyrighted material across the ESFRI projects.

It is excellent to see the talks with ESFRI projects from EGEE’09 already yielding real collaboration!

- - - - - - - -

Some points about CLARIN that are of interest to EGEE/EGI:

  • This community needs ‘Persistent Identifiers’ (rather than temporary URLs) and metadata. A couple of consortia across Europe and the USA are making strides in developing services for registering and storing Persistent Identifiers (PIDs) based on handles. If these services prove useful to projects such as CLARIN, then EGI and its grid middleware will have to be able to use PIDs in order to work with and support this community.
  • eduGAIN is being considered as the basis for the Authentication and Authorization Infrastructure (AAI) by CLARIN and related ESFRI projects. Interoperability of such identity federations with the AAI federations in EGI will be necessary.
  • The Direct User Support groups of EGEE/NA4 are in contact with CLARIN to help them prototype use of grid facilities for a specific use case.
  • The operation and support models of EGEE/EGI and DEISA are converging, which should simplify the steps necessary for interoperation.
  • A white paper, “Strategy for a European Data Infrastructure”, prepared by PARADE (Partnership for Accessing Data in Europe), was distributed by Kimmo Koski at the end of the event.
My slides from the event cover EGEE, EGI and how CLARIN’s needs could map onto what we have today.
