Biomedical Abbreviation Recognition and Resolution 2nd Edition (BARR2)

IberEval 2018 | SEPLN 2018

18 September 2018. Seville, Spain

Description

Healthcare professionals do face serious time constraints when carrying out their daily workload. Moreover, a considerable fraction of medical and clinical terms, including names of diseases, symptoms or even clinical tests, correspond to long complex phrases. Consequently, abbreviations and acronyms, hereafter referred to as short forms, are widely encountered in clinical reports. This is true even for cases of highly ambiguous short forms that could potentially have many different senses.

Clinical short forms, as a result of their underlying semantic complexity, are problematic. Correct interpretation of short forms is not only difficult for patients, but often represent a challenge even for medical professionals and health documentation experts.

In order to work well, clinical language technologies and information retrieval systems do require prior identification and resolution medical short forms. In practice, short forms are used in clinical texts to denote all kinds of key concepts, including diseases and adverse effects, symptoms, medications, treatments or temporal and anatomical expressions or routes of administration of drugs.

This topic was widely researched in English and is reviewed by Torii et al. (2007) [1]. Among the used strategies to address this problem are alignment-based approaches described by Schwartz and Hearst (2003) [2], machine learning approaches tested by Chang et al. (2002) [3] , or rule-based approaches used by Ao and Takagi (2005) [4].

Lexical resources for biomedical abbreviations in Spanish include for instance the “Diccionario de Siglas Médicas” of the Ministerio de Sanidad y Consumo of Spain [5]. In case of English biomedical texts, several manually annotated corpora have been constructed, i.e. the MEDSTRACT, Ab3P, BOADI and Schwartz and Hearst corpora (see Islamaj Doğan et al., 2014 for more details) [6,7].

To address computationally the medical abbreviation resolution problem in Spanish, there is a pressing need to access clinical corpora annotated with short form information and of short form clinical sense inventories.

The proposed track, the Second Biomedical Abbreviation Recognition and Resolution (BARR2) track relies on the experience and success of the previous BARR track [8], by extending annotations of short forms to clinical text and clinical case studies.

While the previous BARR track was focused on biomedical literature in Spanish, the BARR2 track has the aim to promote the development and evaluation of clinical abbreviation identification systems by providing Gold Standard training and test corpora manually annotated by domain experts with abbreviation-definition pairs within abstracts of clinical texts and clinical case studies written in Spanish.

The BARR2 track will be structured into two subtasks, namely:

  • Sub-track 1: asking participating teams to provide systems able to detect only explicit occurrences of abbreviation-definition pairs.


  • Sub-track 2: provide resolution of short forms regardless whether its definition is mentioned within the actual document.


An illustrative example case for sub-track 1 can be found in the following sentence:

Se describe la relación entre diferentes factores de riesgo cardiovasculares (FRVC) y la obesidad a partir de una muestra representativa de la población adulta de Madrid

In this example we consider 'FRVC' to be the abbreviation, and 'factores de riesgo cardiovasculares' is its definition. The BARR2 track requires both the recognition of <Abbreviation,Definition> candidate pairs from sentences, and identification of exact string boundaries.

An illustrative example case for sub-track 2 can be found in the following sentence:

Varón de 82 años que ingresa para realización de CPRE por un cuadro de colangitis aguda.

In this example we consider 'CPRE' to be the abbreviation, the actual definition or long form is not provided in the document and would correspond to 'colangio-pancreatografía retrógrada endoscópica.

In line with some of the previously proposed resources we refer to an abbreviation as -a ShortForm (SF)- that is a shorter term that denotes a longer word or phrase. On the other hand, the definition -the LongForm (LF)- refers to the corresponding definition found in the same sentence as the SF.

The expected outcome of this task includes strategies and resources for the development and evaluation of abbreviation recognition methods in Spanish biomedical documents. The expected target audience of this task includes developers of natural language processing tools interested in the application of such resources to biomedical documents. Moreover the generated resources will result in valuable resources for the disambiguation and grounding of abbreviations in health related document collections and might serve as a sense inventory for medical abbreviations medical dictionary resources.

In order to carry out this task we will release the BARR2 corpus, consisting in a manually labeled collection of Spanish medical abstracts constructed using a customized version of AnnotateIt, BRAT as well as using the Markyt annotation system [9]. This corpus comprises a selection of medical abstracts, clinical case studies and clinical record sentences. The BARR2 corpus will be structured into a training and a test set, each manually labeled with their corresponding <Abbreviation,Definition> offset annotations by a team of domain experts. The primary evaluation metric used for the BARR2 track will consist in precision, recall, y f-score of the predictions against manual gold standard. A larger background set of additional automatically labeled Spanish clinical case reports will be released together with the BARR2 corpus.

References

  1. Torii, M.; Hu, Z.-z.; Song, M.; Wu, C. H. & Liu, H. A comparison study on algorithms of detecting long forms for short forms in biomedical text. BMC bioinformatics, 2007, 8 Suppl 9, S5
  2. Schwartz, A. S. & Hearst, M. A. A simple algorithm for identifying abbreviation definitions in biomedical text. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 2003, 451-462
  3. Chang, J. T.; Schütze, H. & Altman, R. B. Creating an online dictionary of abbreviations from MEDLINE. Journal of the American Medical Informatics Association : JAMIA, 2002, 9, 612-620
  4. Ao, H. & Takagi, T. ALICE: an algorithm to extract abbreviations from MEDLINE. Journal of the American Medical Informatics Association : JAMIA, 2005, 12, 576-586
  5. Laguna, J. Y.; Cuñat, V.A. Diccionario de Siglas Médicas. Ministerio de Sanidad y Consumo.
  6. Islamaj Doğan, R.; Comeau, D. C.; Yeganova, L. & Wilbur, W. J. Finding abbreviations in biomedical literature: three BioC-compatible modules and four BioC- formatted corpora. Database : the journal of biological databases and curation, 2014, 2014
  7. Sohn, S.; Comeau, D. C.; Kim, W. & Wilbur, W. J. Abbreviation definition identification based on automatic precision estimates. BMC bioinformatics, 2008, 9, 402
  8. Intxaurrondo, Ander, et al. The Biomedical Abbreviation Recognition and Resolution (BARR) track: benchmarking, evaluation and importance of abbreviation recognition systems applied to Spanish biomedical abstracts. Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval), 2017: 230-246.
  9. Pérez-Pérez, M., Pérez-Rodríguez, G., Rabal, O., Vazquez, M., Oyarzabal, J., Fdez-Riverola, F., & Lourenço, A. The Markyt visualisation, prediction and benchmark platform for chemical and gene entity recognition at BioCreative/CHEMDNER challenge. Database, 2016, baw120.
  10. Marimon, Montserrat, et al. Annotation of negation in the IULA Spanish Clinical Record Corpus. SemBEaR 2017 5.36.41 2017: 43.

Contact

Ander Intxaurrondo
ander.intxaurrondo[AT]bsc.es

News

  • June 21th, 2018: BARR2 final test set clinical cases revealed.
  • May 28th, 2018: BARR2 background and test sets released.
  • May 25th, 2018: BARR2 development set released.
  • May 18th, 2018: BARR2 evaluation script released.
  • May 17th, 2018: BARR2 training set released.
  • May 8th, 2018: BARR2 announcement at MultilingualBIO Workshop (LREC 2018).
  • April 20th, 2018: BARR2 sample set released.
  • March 15th, 2018: BARR2 track website launched.