The annotation guidelines are available on Zenodo.

DISTEMIST training, test and background sets are available at Zenodo.


The training dataset consists of 750 annotated clinical cases. The annotations are provided in a TSV file with the following fields:

  • filename: document name
  • mark: mention identifier
  • label: mention type (ENFERMEDAD)
  • off0: starting position of the mention in the document
  • off1: ending position of the mention in the document
  • span: text span

In addition, a txt file is provided for each clinical case, so that the context of each mention can be accessed and used to train automatic systems.
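As a minimal sketch of how the TSV annotations and the txt files fit together, the Python snippet below loads the annotations and checks each span against its document. It rests on several assumptions that are not stated here: the TSV has a header row using the field names listed above, off0 and off1 are character offsets with off1 exclusive, and each document is stored as `<filename>.txt`. The function names and paths are hypothetical.

```python
import csv
from pathlib import Path


def load_annotations(tsv_path):
    """Read the annotation TSV into a list of dicts keyed by column name.

    Assumes a header row with: filename, mark, label, off0, off1, span.
    """
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = []
        for row in reader:
            # Offsets are stored as text in the file; convert them to integers.
            row["off0"] = int(row["off0"])
            row["off1"] = int(row["off1"])
            rows.append(row)
    return rows


def find_offset_mismatches(rows, text_dir):
    """Return the rows whose span differs from the document text at [off0, off1).

    Assumes character offsets with an exclusive end, and one file named
    <filename>.txt per clinical case inside text_dir.
    """
    text_dir = Path(text_dir)
    cache = {}
    mismatches = []
    for row in rows:
        name = row["filename"]
        if name not in cache:
            # newline="" keeps line endings untouched so offsets are not shifted.
            with open(text_dir / f"{name}.txt", encoding="utf-8", newline="") as handle:
                cache[name] = handle.read()
        if cache[name][row["off0"]:row["off1"]] != row["span"]:
            mismatches.append(row)
    return mismatches
```

Calling `find_offset_mismatches(load_annotations("annotations.tsv"), "text_files")` with your own paths should return an empty list if the assumptions above hold for the released files. A non-empty result suggests that a different offset convention or file layout is in use.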


The test dataset consists of 250 clinical cases.

It is published together with a larger collection of 2750 background clinical cases, so that predictions cannot be corrected manually. You have to make predictions for all 3000 clinical cases (250 test plus 2750 background), but you will be evaluated only on the 250 that belong to the test set.
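The effect of scoring only the test documents can be illustrated with a small Python sketch. The official scorer is not described here, so the strict matching on (filename, off0, off1, label) and the precision, recall and F1 measures are assumptions for illustration only. The function name and the tuple layout are hypothetical.

```python
def score_on_test_subset(predictions, gold, test_docs):
    """Strict-match precision, recall and F1 restricted to test documents.

    predictions and gold are iterables of (filename, off0, off1, label) tuples.
    Predictions for documents outside test_docs (the background set) are
    ignored. Strict matching is an assumption, not the official metric.
    """
    test_docs = set(test_docs)
    pred = {p for p in predictions if p[0] in test_docs}
    ref = {g for g in gold if g[0] in test_docs}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Under this sketch, a prediction made on a background document neither helps nor hurts the score. It is simply dropped before the comparison with the gold annotations.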