Generation of definitions

In brief, BARR task 2, the core task of BARR2, requires annotators to provide, for every abbreviation manually labelled in the clinical case texts, its most probable correct definition (or long form). To determine it, domain experts read the context (the clinical case) and, depending on that clinical context, assign the most probable and most commonly used definition.

This definition annotation process relies on several steps, which include:

  1. providing the correct definition directly, based on the annotator's expert domain knowledge, when it is entirely clear.
  2. checking whether the clinical case itself already provides the candidate definition in the text.
  3. searching for the definition of a particular abbreviation in the full-text article corresponding to the clinical case.
  4. searching for abbreviation pairs (SF-LF, i.e. short form and long form) derived from a large collection of Spanish medical articles (BARR-2017 document collections, SCIELO, IBECS, etc.).
  5. searching for candidate definitions in the “Diccionario de Siglas Medicas”, using the original abbreviation as the query.
  6. searching other available online abbreviation resources, such as ALLIE and Acromine.
  7. running Google searches with the abbreviation as the query and, if necessary, additional keywords likely to return definitions for the candidate abbreviation.
  8. in some cases, especially when several alternative definitions or definition variants are found, using the total number of hits returned by internet searches to decide which definition variant is the most prevalent.
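
The steps above can be read as a cascade of lookups followed by a prevalence check. The following is only a minimal sketch of that idea, not the procedure actually used by the annotators: it assumes that resources are consulted in the listed order and that the search stops at the first resource returning candidates, which the guidelines do not state explicitly. All names (`resolve_definition`, `TOY_DICTIONARY`, `TOY_HITS`) and the hit counts are hypothetical and serve illustration only.

```python
from typing import Callable, Dict, List, Optional

# A resource is any function that maps a short form to candidate long forms.
Resource = Callable[[str], List[str]]


def resolve_definition(
    short_form: str,
    resources: List[Resource],
    hit_counts: Optional[Dict[str, int]] = None,
) -> Optional[str]:
    """Consult resources in order and return a definition from the first
    resource that yields candidates (assumed stopping rule).

    When a resource returns several variants, the variant with the highest
    hit count is chosen, mirroring step 8. Ties keep the first candidate.
    """
    hit_counts = hit_counts or {}
    for lookup in resources:
        candidates = lookup(short_form)
        if candidates:
            return max(candidates, key=lambda lf: hit_counts.get(lf, 0))
    return None


# Hypothetical toy data: two spelling variants and invented hit counts.
TOY_DICTIONARY = {
    "TAC": ["tomografía axial computerizada", "tomografía axial computarizada"],
}
TOY_HITS = {
    "tomografía axial computerizada": 120,
    "tomografía axial computarizada": 450,
}


def in_case_text(short_form: str) -> List[str]:
    # Stand-in for step 2: this toy clinical case defines nothing.
    return []


def in_dictionary(short_form: str) -> List[str]:
    # Stand-in for step 5: lookup in a toy abbreviation dictionary.
    return TOY_DICTIONARY.get(short_form, [])


best = resolve_definition("TAC", [in_case_text, in_dictionary], TOY_HITS)
print(best)
```

In practice each step involves expert judgment about the clinical context, so a function like this could at most propose candidates for a human annotator to confirm.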

Note that, in general, the annotation process focuses on the “best/most adequate” definition: not only the correct long form but also the most appropriate language for the abbreviation definition. If the definition of an abbreviation can be correctly written in Spanish, the Spanish definition is preferred; but if the abbreviation letters and practical usage indicate that the definition is not Spanish (often it is English), the non-Spanish definition is annotated.
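
The language preference can be approximated with a simple heuristic. The sketch below assumes that "based on abbreviation letters" can be checked by comparing the initials of a candidate long form with the abbreviation; this is a simplification, since practical usage is judged by the domain expert and is not modelled here. The function names, the stopword list and the example candidates are hypothetical and are not taken from the BARR2 annotations.

```python
import re
import unicodedata
from typing import List, Optional, Tuple

# Small, illustrative stopword list; a real one would be larger.
STOPWORDS = {"de", "del", "la", "el", "los", "las", "y", "of", "the", "and"}


def strip_accents(text: str) -> str:
    """Remove combining marks so that 'á' compares equal to 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def initials(long_form: str) -> str:
    """Upper-case initials of the content words in a long form."""
    words = re.findall(r"[^\W\d_]+", strip_accents(long_form.lower()))
    return "".join(w[0] for w in words if w not in STOPWORDS).upper()


def choose_definition(
    short_form: str, candidates: List[Tuple[str, str]]
) -> Optional[str]:
    """Pick a long form from (long_form, language_code) pairs.

    Candidates whose initials match the abbreviation are preferred; within
    the preferred pool, a Spanish ("es") candidate wins if one exists.
    """
    target = strip_accents(short_form).upper()
    matching = [c for c in candidates if initials(c[0]) == target]
    pool = matching or candidates
    for long_form, language in pool:
        if language == "es":
            return long_form
    return pool[0][0] if pool else None


tac = choose_definition(
    "TAC",
    [("computed axial tomography", "en"), ("tomografía axial computarizada", "es")],
)
psa = choose_definition(
    "PSA",
    [("antígeno prostático específico", "es"), ("prostate-specific antigen", "en")],
)
print(tac)
print(psa)
```

Here the Spanish long form is selected for "TAC" because its initials match the abbreviation, while for "PSA" only the English long form matches the letters, so the non-Spanish definition is selected.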

We require detection of all abbreviations, whether medical or non-medical.


  • June 21st, 2018: BARR2 final test set clinical cases revealed.
  • May 28th, 2018: BARR2 background and test sets released.
  • May 25th, 2018: BARR2 development set released.
  • May 18th, 2018: BARR2 evaluation script released.
  • May 17th, 2018: BARR2 training set released.
  • May 8th, 2018: BARR2 announcement at MultilingualBIO Workshop (LREC 2018).
  • April 20th, 2018: BARR2 sample set released.
  • March 15th, 2018: BARR2 track website launched.