 Related activities: Cross-Language Evaluation Forum

The Cross-Language Evaluation Forum (CLEF) aims at promoting research and development in Cross-Language Information Retrieval (CLIR).

CLEF aims at

  • providing an infrastructure for the testing and evaluation of information retrieval systems operating on European languages, and
  • creating test suites of reusable data which can be employed by system developers for benchmarking purposes.
The CLEF 2001 evaluation campaign attracted a growing number of participating groups from European countries, North America and Asia; in total, 31 groups delivered results. Several new groups, especially from Europe, took part. At the workshop the results of the CLEF 2001 campaign were summarized, and the methodology and functionality of CLIR systems and the future work of the CLEF campaign were discussed.

The results submitted by the participants show a steady overall improvement, but they also show that many difficulties remain in overcoming language-specific problems such as resolving compounds and ambiguous terms. Several groups tried automatic translation tools such as Systran, Power Translator or Babylon, which performed reasonably well but failed completely in some cases: mainly proper names that have natural-language equivalents, idiomatic phrases, homonyms and hyphenated compounds. In the end this means that machine translation systems alone are not sufficient; additional components are needed to avoid these problems.

Another strategy for building CLIR systems was to combine the results of different machine translation systems in order to achieve better retrieval results; this strategy worked well. Other groups combined different methods for solving the translation problem, and in these cases the results improved as well. An increasing number of groups used corpus-based, statistical approaches.
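How the individual groups combined their results is not described here. As a purely illustrative sketch, one common way to merge the ranked lists obtained from several translated versions of the same query is to normalize the scores of each list and sum them per document (often called CombSUM). The function names and the sample scores below are hypothetical and are not taken from any CLEF submission.

```python
from collections import defaultdict


def normalize(scores):
    """Min-max normalize a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every document gets the same weight.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_runs(runs):
    """Sum normalized scores across runs and return documents ranked best first."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in normalize(run).items():
            fused[doc] += score
    # Sort by descending fused score; break ties by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval scores for one query translated by two different systems.
run_a = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
run_b = {"d2": 10.0, "d3": 5.0, "d4": 0.0}
ranked = fuse_runs([run_a, run_b])
```

In this sketch a document retrieved by both translations (such as `d2`) is ranked above one that scores highly for a single translation only, which is the intuition behind combining several translation systems.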

In most tests the overall results are quite good (although they do not reach the level of monolingual retrieval), but specific problems arise in some cases; these differ between the groups and depend on the strategies used. The results also suffer from quality problems (of dictionaries, corpora, language tools, and machine translation systems) with “less popular” languages or language pairs (such as German/Italian).

The proceedings of last year (CLEF 2000) have now been printed.

At the main ECDL 2001 conference the papers were grouped around the following aspects of research in the digital library area: user modeling and user communities, digitisation and multimedia, knowledge management, information retrieval, and multilinguality. In the multilinguality session the papers dealt with CLIR in Japanese and English and with the handling of character sets. With respect to the work of the ETB, the paper on Learning Spaces in Digital Libraries (Coleman et al.) could be of interest; it also deals with the application of a metadata set to geospatially referenced learning resources. The proceedings were printed in advance of the conference.


Panos Constantopoulos, Ingeborg T. Sølvberg (eds.): Research and Advanced Technology for Digital Libraries, 5th European Conference, ECDL 2001, Darmstadt, Germany, September 2001, Proceedings. Berlin et al.: Springer 2001 (= Lecture Notes in Computer Science (LNCS), 2163)

Anita S. Coleman et al.: Learning Spaces in Digital Libraries. In: Constantopoulos/Sølvberg (eds.): Research and Advanced Technology for Digital Libraries, 5th European Conference, ECDL 2001, Darmstadt, Germany, September 2001, Proceedings. Berlin et al.: Springer 2001 (= LNCS 2163), p. 251-262

Carol Peters (ed.): Cross-Language Information Retrieval and Evaluation. Workshop of the Cross-Language Evaluation Forum, CLEF 2000, Lisbon, Portugal, September 21-22, 2000, Revised Papers. Berlin et al.: Springer 2001 (= Lecture Notes in Computer Science (LNCS), 2069)

Author: Michael Kluck
Web Editor: Riina Vuorikari
Published: Tuesday, 11 Dec 2001
Last changed: Tuesday, 11 Dec 2001
Keywords: thesaurus, standardisation, multilinguality

