IberLegal, NLP for Legal Domain in Spanish: Named Entity Recognition and Classification in Legal and Administrative Texts.

IberLegal aims to take the initiative in organizing an evaluation task targeting legal text processing in Spanish. The task aims to encourage NLP groups to process legislative and administrative texts and to tackle the challenges that this type of text poses.

IberLegal is planned as a cross-sector initiative, bringing together stakeholders from academia, industry (the NLP and legal industries), and public and regional administration.

This aim is achieved through the following specific objectives: 

  • To release a corpus of open legislative and administrative (public procurement) texts in Spanish.
  • To evaluate Named Entity Recognition and Classification given the special characteristics of this type of text. The task will focus on five types: mentions of laws and legislation, organizations/legal entities, persons, places, and time expressions.

Relevance and Novelty

Named Entity Recognition and Classification has been at the centre of Information Extraction and NLP shared tasks since the 1990s, starting with the Message Understanding Conference (MUC) series. The first shared task on “Named Entity” dates back to 1995, within MUC-6, promoted by NIST, the US National Institute of Standards and Technology.

Regarding the legal domain, TREC had a dedicated track, the TREC Legal Track, administered by NIST, which evaluated the application of Information Retrieval (IR) methods to e-discovery in the context of U.S. civil litigation from 2006 until 2011 (Oard et al., 2010).

COLIEE (Competition on Legal Information Extraction and Entailment) is another important campaign, with seven editions (Yoshioka, 2018). It is usually run within the major events dedicated to Legal Information Systems and Artificial Intelligence, such as JURIX, JURISIN (Juris Informatics) or ICAIL (International Conference on Artificial Intelligence and Law). The editions have usually focused on English texts, although the most recent editions have included Japanese texts.

The above insights from the TREC Legal Track and COLIEE reveal the need to promote NLP and IR work in the legal domain in Spanish. We believe it is time to adopt a strong initiative, led by the Language Technologies Plan and the Secretary of State for Digitalization and Artificial Intelligence, to encourage industry and academia to embark on a systematic and competitive approach to processing legal text in Spanish.

IberLegal: a task within IberLEF 2020

IberLegal will be part of the IberLEF (Iberian Languages Evaluation Forum) 2020 evaluation campaign at the 36th Annual SEPLN Congress, SEPLN 2020 (September 23rd to 25th, 2020, Málaga).

IberLEF aims to encourage the research community to define new challenges and obtain cutting-edge results for the Natural Language Processing community, involving at least one of the Iberian languages: Spanish, Portuguese, Catalan, Basque or Galician. Accordingly, several shared-task challenges are proposed.

IberLEF 2020: https://sites.google.com/view/iberlef2020/

SEPLN 2020: http://sepln2020.sepln.org/index.php/en/iberlef-en/

Background and Motivation

Within the Spanish National Plan for the Advancement of Language Technologies, a number of priority domains have been selected for pilot projects. The legal domain is one of these priority areas and has been at the centre of the Plan's attention over the last two years, given its relevance and its impact on society at different levels: governmental bodies, industry, academia, services for citizens, structural measures, etc.

On the other hand, Spanish is one of the most widely spoken languages, but its language resources, in terms of corpora and NLP solutions, are still limited given its potential. Most language resources developed for the legal domain are in English, and there is still an opportunity to offer the NLP and legal communities large-scale resources in Spanish. NLP and Artificial Intelligence in the legal domain have been gaining momentum in the last few years. More work and more resources are being developed, but with a clear dominance of English. Therefore, it is time to launch initiatives to promote NLP in the legal domain for Spanish, given that the impact is not limited to Spain: industrial uptake could also reach Latin America and the US.