Download the annotation guidelines from Zenodo.

The LivingNER corpus was manually annotated by clinical experts following annotation guidelines created specifically for this task. The guidelines contain rules for annotating species and infectious diseases in Spanish clinical cases; infectious diseases, however, are not included in this task. The guidelines also include some considerations on coding (normalizing) the annotations to the NCBI Taxonomy.

The guidelines were created de novo in three phases:

  1. First, a version zero of the guidelines was developed after annotating an initial batch of ~40 clinical cases and outlining the main problems and difficulties found in the data.
  2. Second, a stable version of the guidelines was reached by iteratively annotating sample sets of the LivingNER corpus until quality control was satisfactory.
  3. Third, the guidelines were refined iteratively as manual annotation continued.

The annotation guidelines (LivingNER Annotation Guidelines) are available in Zenodo.

LivingNER corpus post-annotation review steps:

  • Consistency review: every annotated mention was searched for across all documents, and a clinical expert reviewed whether its unannotated occurrences should be added to the annotations.
  • False positives: all annotations of the mentions “neonatología” (neonatology), “personalidad” (personality), “cocaína” (cocaine), “sociofamiliares” (socio-familial), and “politraumatizado” (polytraumatized) were removed.
  • False negatives: because these mentions are particularly important, all occurrences in the text of “prion”, “contacto sexual” (sexual contact), “oportunistas” (opportunistic), “enfermedades oportunistas” (opportunistic diseases), “fascitis necrosante” and “fascitis necrotizante” (both meaning necrotizing fasciitis), and “probióticos” (probiotics) were reviewed to decide whether they should be annotated.
  • Consistency in the annotation of labels: some mentions are labeled SPECIES in some places and ENFERMEDAD (disease) in others; these mentions were reviewed.
  • Validation of standardization: all codes were checked to ensure that they exist in the official version of the NCBI Taxonomy.
  • Consistency of standardization: some mentions receive different codes depending on the context; these were reviewed.
  • Review of unmapped entities: all mentions without codes were reviewed.
  • Checking internal line breaks: annotations containing line-break characters were removed, since they span more than one line.
  • Annotation starting and ending characters: all annotations were checked to ensure that they start and end with an alphanumeric character or a parenthesis. For example, “,adenocarcinoma,” would be an erroneous annotation because it includes the surrounding commas.
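The two span-level checks at the end of the list (internal line breaks and starting/ending characters) are straightforward to automate. The following is a minimal sketch, assuming a hypothetical `Annotation` record that holds only a document ID and the annotated text; it is not the LivingNER distribution format. It relies on Python's `str.isalnum`, which is Unicode-aware and therefore accepts accented letters such as “á”. Following the review steps, multi-line annotations are dropped, while boundary problems are only flagged for manual correction.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    # Hypothetical minimal record; not the LivingNER distribution format.
    doc_id: str
    span: str  # annotated text as it appears in the document


def has_internal_line_break(ann: Annotation) -> bool:
    """True if the annotated text contains a line-break character."""
    return "\n" in ann.span or "\r" in ann.span


def _is_valid_boundary(ch: str) -> bool:
    # str.isalnum() is Unicode-aware, so accented letters such as "á" pass.
    return ch.isalnum() or ch in "()"


def has_valid_boundaries(ann: Annotation) -> bool:
    """True if the span starts and ends with an alphanumeric character or a parenthesis."""
    return bool(ann.span) and _is_valid_boundary(ann.span[0]) and _is_valid_boundary(ann.span[-1])


def review_spans(annotations: list[Annotation]):
    """Return (kept, removed, flagged).

    Multi-line annotations are removed; annotations with bad boundary
    characters stay in `kept` but are also listed in `flagged` for correction.
    """
    kept: list[Annotation] = []
    removed: list[Annotation] = []
    flagged: list[Annotation] = []
    for ann in annotations:
        if has_internal_line_break(ann):
            removed.append(ann)
            continue
        kept.append(ann)
        if not has_valid_boundaries(ann):
            flagged.append(ann)
    return kept, removed, flagged


anns = [
    Annotation("caso_1", "paciente"),
    Annotation("caso_1", "Staphylococcus\naureus"),
    Annotation("caso_2", ",adenocarcinoma,"),
    Annotation("caso_2", "(VIH)"),
]
kept, removed, flagged = review_spans(anns)
print([a.span for a in flagged])  # [',adenocarcinoma,']
```

Flagging instead of silently trimming the commas keeps a human in the loop, which matches how the other review steps route problems to an expert.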
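Several of the corpus-level reviews in the list above (consistency review, consistency of label annotation, validation and consistency of standardization, and unmapped entities) can be reduced to simple queries that produce candidates for the clinical expert. The sketch below is a hypothetical illustration, not the procedure actually used for LivingNER: the `Annotation` fields, the case-insensitive grouping of mentions, and the plain substring search are all assumptions, and `valid_ids` stands in for the set of taxonomy IDs taken from the official NCBI Taxonomy release.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    # Hypothetical record; field names are illustrative only.
    doc_id: str
    label: str            # e.g. "SPECIES" or "ENFERMEDAD"
    span: str             # annotated text
    code: Optional[str]   # NCBI Taxonomy ID as a string, None if unmapped
    start: int            # offset of the first character
    end: int              # offset one past the last character


def mentions_with_multiple_labels(annotations: list[Annotation]) -> dict[str, set[str]]:
    """Label consistency: mentions (compared case-insensitively) with more than one label."""
    labels: dict[str, set[str]] = defaultdict(set)
    for a in annotations:
        labels[a.span.lower()].add(a.label)
    return {m: ls for m, ls in labels.items() if len(ls) > 1}


def mentions_with_multiple_codes(annotations: list[Annotation]) -> dict[str, set[str]]:
    """Consistency of standardization: mentions that received different codes."""
    codes: dict[str, set[str]] = defaultdict(set)
    for a in annotations:
        if a.code:
            codes[a.span.lower()].add(a.code)
    return {m: cs for m, cs in codes.items() if len(cs) > 1}


def unmapped(annotations: list[Annotation]) -> list[Annotation]:
    """Review of unmapped entities: annotations without a code."""
    return [a for a in annotations if not a.code]


def invalid_codes(annotations: list[Annotation], valid_ids: set[str]) -> list[Annotation]:
    """Validation of standardization: codes absent from the reference ID set."""
    return [a for a in annotations if a.code and a.code not in valid_ids]


def unannotated_occurrences(texts: dict[str, str],
                            annotations: list[Annotation]) -> list[tuple[str, str, int]]:
    """Consistency review: (doc_id, mention, offset) for every occurrence of an
    annotated string that is not itself annotated at exactly those offsets."""
    annotated: dict[str, set[tuple[int, int]]] = defaultdict(set)
    for a in annotations:
        annotated[a.doc_id].add((a.start, a.end))
    mentions = sorted({a.span for a in annotations if a.span})
    hits = []
    for doc_id, text in texts.items():
        for mention in mentions:
            pos = text.find(mention)
            while pos != -1:
                if (pos, pos + len(mention)) not in annotated[doc_id]:
                    hits.append((doc_id, mention, pos))
                pos = text.find(mention, pos + 1)
    return hits
```

The substring search deliberately over-reports (for example, matches inside longer words or inside longer annotations), since each hit is meant to be checked by an expert rather than added automatically.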