IberLegal has the following aims:
- To release a corpus of open legislative and administrative (public procurement) texts in Spanish.
- To evaluate Named Entity Recognition and Classification given the special characteristics of this type of text.
- The task will focus on five types of Named Entity: mentions of laws and legislation, Organizations/Legal Entities, Persons, Places and Time Expressions.
For our task, we consider both the recognition span and the type of the Named Entity.
For the NE type, the methodology distinguishes between correct and incorrect classification. For the NE span, it distinguishes between an exact match and a partial match.
For NE recognition, the following scenarios are considered:
- Named Entity not recognized
- Named Entity and span recognized correctly
- Named Entity recognized partially, i.e. the span is not correct (some tokens are missing or extra tokens are included)
- A segment that does not appear in the Gold Standard is wrongly detected as a Named Entity.
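The four recognition scenarios above can be made concrete with a minimal Python sketch. It assumes, hypothetically, that each Named Entity is a half-open `(start, end)` offset pair, that exact matches are aligned before partial overlaps, and that each prediction is aligned with at most one Gold Standard entity. The proposal does not prescribe these details, so `overlaps` and `recognition_outcomes` are illustrative names, not the official scorer.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a span is (start, end) with `end` exclusive,
# measured in characters or tokens (the unit is an assumption).
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def recognition_outcomes(gold: List[Span], pred: List[Span]) -> Dict[str, int]:
    """Count the four recognition scenarios for one document."""
    counts = {"correct": 0, "partial": 0, "missed": 0, "spurious": 0}
    free_pred = set(range(len(pred)))  # predicted spans not yet aligned
    unresolved = []  # gold spans without an exact match

    # Pass 1: exact boundary matches ("NE and span recognized correctly").
    for g in gold:
        hit = next((i for i in sorted(free_pred) if pred[i] == g), None)
        if hit is None:
            unresolved.append(g)
        else:
            free_pred.discard(hit)
            counts["correct"] += 1

    # Pass 2: any overlap counts as partial recognition
    # (tokens missing or extra tokens included).
    for g in unresolved:
        hit = next((i for i in sorted(free_pred) if overlaps(g, pred[i])), None)
        if hit is None:
            counts["missed"] += 1  # "NE not recognized"
        else:
            free_pred.discard(hit)
            counts["partial"] += 1

    # Predictions left over correspond to nothing in the Gold Standard.
    counts["spurious"] = len(free_pred)
    return counts


gold = [(0, 4), (10, 12), (20, 21)]
pred = [(0, 4), (10, 11), (30, 32)]
print(recognition_outcomes(gold, pred))
# {'correct': 1, 'partial': 1, 'missed': 1, 'spurious': 1}
```

Aligning exact matches first keeps an overlapping prediction from being consumed as a partial match when it is actually an exact match for another gold entity.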
For classification, the following scenarios are considered:
- Category correctly classified
- NE misclassified
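To illustrate how span and type combine, the following sketch counts correctly classified and misclassified entities among the spans recognized exactly, and reports two F1 views: one over boundaries only, and a strict one requiring both the exact span and the correct type. The `(start, end, type)` tuples, the type shorthands `LAW`, `ORG`, `PER`, `LOC` and `TIME`, and the use of F1 as the aggregate are assumptions for illustration; the proposal does not define a scoring formula, and partially matched spans are left out here for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (start, end, type) with `end` exclusive.
# LAW, ORG, PER, LOC and TIME are illustrative shorthands for the five types.
Entity = Tuple[int, int, str]


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision (tp / n_pred) and recall (tp / n_gold)."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def span_and_type_scores(gold: List[Entity], pred: List[Entity]) -> Dict[str, float]:
    # Assumes flat annotation: no two entities in a list share the same span.
    gold_types = {(s, e): t for s, e, t in gold}
    pred_types = {(s, e): t for s, e, t in pred}
    exact = gold_types.keys() & pred_types.keys()  # spans recognized exactly
    well_typed = sum(1 for k in exact if gold_types[k] == pred_types[k])
    return {
        "correctly_classified": well_typed,
        "misclassified": len(exact) - well_typed,
        "span_f1": f1(len(exact), len(pred), len(gold)),    # boundaries only
        "strict_f1": f1(well_typed, len(pred), len(gold)),  # boundaries + type
    }


gold = [(0, 4, "LAW"), (10, 12, "ORG"), (20, 21, "TIME")]
pred = [(0, 4, "LAW"), (10, 12, "PER"), (30, 32, "LOC")]
scores = span_and_type_scores(gold, pred)
print(scores["correctly_classified"], scores["misclassified"])  # 1 1
print(round(scores["span_f1"], 3), round(scores["strict_f1"], 3))  # 0.667 0.333
```

Reporting both views separates boundary errors from classification errors: here both systems find the same two spans, but one of them is assigned the wrong category, which halves the strict score.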
Relevance and Novelty
Named Entity Recognition and Classification has been at the centre of Information Extraction and NLP shared tasks since the 1990s, starting with the Message Understanding Conference (MUC) series. The first shared task on “Named Entity” dates back to 1995, within MUC-6, promoted by NIST, the US National Institute of Standards and Technology.
Regarding the legal domain, TREC ran a dedicated track, the TREC Legal Track, administered by NIST from 2006 until 2011 to evaluate the application of Information Retrieval (IR) methods to e-discovery in the context of U.S. civil litigation (Oard et al., 2010).
COLIEE (Competition on Legal Information Extraction and Entailment) is another important campaign, with seven editions (Yoshioka, 2018). It is usually run within major events dedicated to Legal Information Systems and Artificial Intelligence, such as JURIX, JURISIN (Juris Informatics) or ICAIL (International Conference on Artificial Intelligence and Law). The editions have usually focused on English texts, although Japanese texts were included in the most recent editions.
The focus of the TREC Legal Track and COLIEE on English (and, more recently, Japanese) texts reveals the need to promote NLP and IR work in the legal domain in Spanish. We believe it is time for a strong initiative, led by the Language Technologies Plan and the Secretary of State for Digitalization and Artificial Intelligence, to encourage industry and academia to embark on a systematic and competitive approach to processing legal text in Spanish.