Scientific Committee

  • Kirk Roberts, School of Biomedical Informatics, University of Texas Health Science Center, USA
  • Parminder Bhatia, Amazon Health AI, USA
  • Irene Spasic, School of Computer Science & Informatics, co-Director of the Data Innovation Research Institute, Cardiff University, UK
  • Tristan Naumann, Microsoft Research Healthcare NExT, USA
  • Carlos Luis Parra Calderón, Head of Technological Innovation, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville, Spain
  • Ashish Tendulkar, Google Research
  • Alfonso Valencia Herrera, Barcelona Supercomputing Center (BSC-CNS), Spain
  • Hercules Dalianis, Department of Computer and Systems Sciences, Stockholm University, Sweden
  • Kevin Bretonnel Cohen, Colorado School of Medicine, USA; LIMSI, CNRS, Université Paris-Saclay, France
  • Karin Verspoor, School of Computing and Information Systems, Health and Biomedical Informatics Centre, University of Melbourne, Australia
  • Aurélie Névéol, LIMSI-CNRS, Université Paris-Sud, France
  • Goran Nenadic, Department of Computer Science, University of Manchester
  • Antonio Martinez, Head Pathology, Director National EQAS GCP, Spanish Society of Pathology, SEAP-IAP
  • Zhiyong Lu, Deputy Director for Literature Search, National Center for Biotechnology Information (NCBI)
  • Mauro Oruezabal, Head of Medical Oncology Service, Hospital Universitario Rey Juan Carlos, Madrid, Spain

Task Organizers

  • Martin Krallinger, Text Mining Unit, Barcelona Supercomputing Center, Spain
  • Antonio Miranda, Text Mining Unit, Barcelona Supercomputing Center, Spain
  • Eulàlia Farré, Text Mining Unit, Barcelona Supercomputing Center, Spain
  • Jose Antonio Lopez-Martin, Medical Oncology, Hospital Universitario 12 de Octubre; Instituto de Investigación Hospital 12 de Octubre (i+12), Spain

Description of the Corpus

Annotation guidelines can be downloaded from Zenodo.

Cantemist train, development, test and background sets are already available at Zenodo.

Corpus format

                                    Figure 1. Example Brat annotation for Cantemist-Ner.

                                    Figure 2. Example Brat annotation for Cantemist-Norm.

  • Subtask CANTEMIST-CODING: CodiEsp format. We provide a single plain text file per clinical case and a tab-separated file with all the unique codes per clinical case (see Figure 3).

                                    Figure 3. Example tab-separated file for Cantemist-Coding.
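As a rough illustration of the tab-separated layout described above, the following Python sketch writes and reads a file with one (clinical case, code) pair per row and no header row. The two-column layout, the function names and the sample codes are assumptions made for this example; the actual files distributed on Zenodo and the submission instructions remain the reference.

```python
import csv
import io


def write_coding_predictions(predictions, handle):
    """Write one 'clinical case <TAB> code' row per unique code, with no header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for case_id, codes in predictions.items():
        seen = set()
        for code in codes:
            if code not in seen:  # keep only unique codes per case, preserving order
                seen.add(code)
                writer.writerow([case_id, code])


def read_coding_file(handle):
    """Read the same layout back into a {clinical case: [codes]} mapping."""
    codes_by_case = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        codes_by_case.setdefault(row[0], []).append(row[1])
    return codes_by_case


# Hypothetical predictions, used only to exercise the two helpers.
predictions = {
    "cc_onco1": ["8000/6", "8140/3", "8000/6"],
    "cc_onco3": ["8000/1"],
}
buffer = io.StringIO()
write_coding_predictions(predictions, buffer)
coding_tsv = buffer.getvalue()
round_trip = read_coding_file(io.StringIO(coding_tsv))
```

Writing to a file handle rather than a path keeps the helpers easy to test in memory; in practice the handle would come from `open(path, "w", encoding="utf-8", newline="")`.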

General information

For this task, professional clinical coding experts annotated a corpus of clinical cases in Spanish with eCIE-O-3.1 codes using the BRAT annotation tool. They followed well-defined annotation guidelines adapted from the clinical coding recommendations published by the Spanish Ministry of Health. The entire dataset was annotated only after several cycles of quality control and annotation consistency analysis. Figure 2 shows a screenshot of a sample manual annotation generated using the BRAT annotation tool.


                                    Figure 2. Example BRAT annotation with labeled tumor morphology entity mention.

The CANTEMIST corpus consists of a collection of 3000 clinical cases distributed as plain text in UTF-8 encoding, with each clinical case stored in a single file. These clinical case reports were carefully selected to reflect, as closely as possible, the clinical narrative found in electronic clinical reports. Figure 3 illustrates an example text snippet corresponding to a short sample record.

                        Figure 3. Example plain text CANTEMIST corpus document

Additionally, we will also provide the annotation files comprising the character offsets of the tumor morphology entity mentions in TSV (tab-separated values) BRAT format together with their corresponding eCIE-O-3.1 code annotations.
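A minimal sketch of how such annotation files might be read is given below. It relies on the general BRAT standoff conventions (text-bound lines of the form `T<id> <TAB> LABEL START END <TAB> text`, and note lines of the form `#<id> <TAB> AnnotatorNotes T<id> <TAB> note`). That the code is carried in an annotator note, the entity label `MORFOLOGIA_NEOPLASIA` and the sample offsets are assumptions for illustration and should be checked against the released files.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mention:
    ann_id: str
    label: str
    start: int
    end: int
    text: str
    code: Optional[str] = None


def parse_brat(ann_text: str) -> List[Mention]:
    """Parse text-bound annotations and attach codes found in annotator notes."""
    mentions: Dict[str, Mention] = {}
    notes: Dict[str, str] = {}
    for line in ann_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # blank or unsupported line
        if fields[0].startswith("T"):
            label, offsets = fields[1].split(" ", 1)
            fragments = offsets.split(";")  # discontinuous spans use ';'
            start = int(fragments[0].split()[0])
            end = int(fragments[-1].split()[1])
            mentions[fields[0]] = Mention(fields[0], label, start, end, fields[2])
        elif fields[0].startswith("#"):
            target = fields[1].split()[1]  # e.g. "AnnotatorNotes T1" -> "T1"
            notes[target] = fields[2]
    for ann_id, code in notes.items():
        if ann_id in mentions:
            mentions[ann_id].code = code  # assumed: the note text is the code
    return sorted(mentions.values(), key=lambda m: m.start)


# Hypothetical annotation content; label name and offsets are illustrative.
sample_ann = (
    "T1\tMORFOLOGIA_NEOPLASIA 10 19\tneoplasia\n"
    "#1\tAnnotatorNotes T1\t8000/1\n"
)
parsed_mentions = parse_brat(sample_ann)
```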

The final corpus will be randomly split into three subsets: training, development and test. For the training and development sets, a TSV file will be released in addition to the clinical cases. It will contain one row per annotation. Each row will consist of the eCIE-O-3.1 code of the clinical case, a label indicating the category of the annotation, the annotation code and a reference to the text span that motivated the annotation (the evidence).

In addition to the test set, a larger background set of clinical case documents will be released so that participating teams cannot manually correct their predictions. The background set will also become a silver standard of texts coded through the automatic eCIE-O-3.1 code predictions returned by participating teams.

The goal of the CANTEMIST task is to develop automatic eCIE-O-3.1 clinical coding systems for Spanish medical texts. These systems should rely on the use of the CANTEMIST corpus, a high-quality Gold Standard synthetic clinical corpus of 3000 records based on a manual annotation process done by human clinical coding experts together with an inter-annotator agreement consistency analysis.

The CANTEMIST task can be approached as a named entity recognition and normalization task, but also as a multi-class text classification task. Participants are encouraged to either propose solutions in one of these directions or to combine both approaches. Novel approaches are also welcome.


Guide to submit your papers:

Following the setting of successful shared tasks we have organized in the past (e.g. MEDDOCAN/IberLef2020, PharmaCoNER/BioNLP-ST EMNLP or CHEMDNER/BioCreative), we will invite all teams that submit test set predictions to send a paper describing their system (system description paper) for publication in the SEPLN/IberLEF workshop CEUR proceedings.

The proceedings of the previous IberLEF2019 are online at

Working notes format

The working notes style is available via our proceedings volume template at (we will use single-column format as in previous years).

Overleaf users can clone the style from

Offline versions for LaTeX and DOCX are available from

Additionally, we plan to prepare a special issue in a Q1 journal covering the CANTEMIST task overview, corpus and results, together with system descriptions from participating teams.


Cantemist (CANcer TExt Mining Shared Task) will be part of the IberLEF (Iberian Languages Evaluation Forum) 2020 evaluation campaign at SEPLN 2020, the 36th Annual SEPLN Congress (September 23rd to 25th 2020, Málaga).

IberLEF aims to encourage the research community to define new challenges and obtain cutting-edge results for the Natural Language Processing community, involving at least one of the Iberian languages: Spanish, Portuguese, Catalan, Basque or Galician. Accordingly, several shared-task challenges are proposed.

IberLEF 2020:

SEPLN 2020:

Register for IberLEF and SEPLN here:

All Cantemist talks will be available on YouTube:

Local Time (CEST) - September 22nd, 2020 | Title | Presenter | Affiliation | More info
6:40 pm | Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results | Antonio Miranda-Escalada | Barcelona Supercomputing Center, Spain | TBD
7:00 pm | A Joint Model for Medical Named Entity Recognition and Normalization | Ying Xiong | Xili university town, China | TBD
7:05 pm | Vicomtech at CANTEMIST 2020 | Naiara Pérez | Vicomtech, Spain | TBD
7:10 pm | Extracting Neoplasms Morphology Mentions in Spanish Clinical Cases through Word Embeddings | Pilar López-Úbeda | University of Jaén, Spain | video
7:15 pm | NLNDE at CANTEMIST: Neural Sequence Labeling and Parsing Approaches for Clinical Concept Extraction | Lukas Lange | Bosch Center for Artificial Intelligence, Germany | TBD
7:20 pm | Tumor Entity Recognition and Coding for Spanish Electronic Health Records | Fadi Hassan | Universitat Rovira i Virgili, Spain | TBD
7:25 pm | ICB-UMA at CANTEMIST 2020: Automatic ICD-O Coding in Spanish with BERT | Guillermo López-García | Universidad de Málaga, Spain | TBD
7:30 pm | Conclusions and wrap-up | Antonio Miranda-Escalada | Barcelona Supercomputing Center, Spain | -


Email Martin Krallinger to:

  1. Q: What is the goal of the shared task?
    The goal is to predict the annotations (or codes) of the documents in the test and background sets.

  2. Q: How do I register?

  3. Q: How to submit the results?
    We will provide further information in the following days.
    Download the example ZIP file.
    See Submission page for more info.

  4. Q: Can I use additional training data to improve model performance?
    Yes, participants may use any additional training data they have available, as long as they describe it in the working notes. We will ask you to summarize such resources in your participant paper.

  5. Q: The task consists of three sub-tasks. Do I need to complete all of them, or may I take part in only one or two?
    Sub-tasks are independent, and participants may take part in one, two or all three of them.

  6. Q: How can I submit my results? Can I submit several prediction files for each sub-task?
    You will have to create a ZIP file with your prediction files and submit it to EasyChair (further details will be released soon).
    Yes, you can submit up to 5 prediction files, all in the same ZIP.
    Download the example ZIP file.
    See Submission page for more info.

  7. Q: Should prediction files have headings?
    No, prediction files should have no headings.

  8. Q: Are all codes and mentions equally weighted?
    Yes. However, systems will be evaluated including and excluding 8000/6 mentions.

  9. Q: What version of eCIE-O-3.1 is in use?
    We are using the 2018 version. The table you can download from the official Spanish webpage is not complete. CIE-O allows combining the 6th and 7th digits according to the pathological study and the degree of differentiation. That is, not all the combinations of the 6th and 7th characters are shown in the table.
    There is a complete list of the valid codes on our webpage. Codes not present in this list will not be used for the evaluation.

  10. Q: What is meant by the /H appended to various codes?
    Some tumor mentions contain a relevant modifier that is not included in the terminology for that concept. In those cases, we append /H to the code.
    For example, in the file cc_onco158, we have the codes 8000/1 and 8000/1/H.
    8000/1 corresponds to a mention of neoplasm (“neoplasia”, in Spanish).
    In the 8000/1/H case, the mention is (in Spanish) “neoplasia de estirpe epitelial”. The modifier “estirpe epitelial” is present in the ICD-O terminology for many tumors. However, it is not present to modify specifically the code 8000/1. Then, we consider it a relevant modifier and add the /H.
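Questions 8 and 9 above imply two simple filters on predicted codes: discarding codes that are not in the list of valid codes, and optionally leaving out 8000/6. The sketch below shows one way to apply them before scoring. The function name, the sample list of valid codes and the assumption that /H variants appear explicitly in that list are illustrative only; the official evaluation script may handle these cases differently.

```python
def filter_codes(codes, valid_codes, excluded=frozenset()):
    """Keep codes that are in the valid list and not explicitly excluded."""
    return [code for code in codes if code in valid_codes and code not in excluded]


# Hypothetical valid-code list and predictions for a single clinical case.
valid_codes = {"8000/1", "8000/1/H", "8000/6"}
predicted = ["8000/1", "8000/1/H", "8000/6", "9999/9"]

# Setting 1: every valid code counts.
kept = filter_codes(predicted, valid_codes)

# Setting 2: same, but 8000/6 mentions are left out.
without_8000_6 = filter_codes(predicted, valid_codes, excluded={"8000/6"})
```

Applying the same filter to both gold and predicted codes keeps the two evaluation settings comparable.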


Event | Date | Link
Sample Set release | April 28 | Sample set
Train Set Release and guidelines publication | June 5 | Dataset and annotation guidelines
Development Set Release | June 12 | Dataset
Test and Background Set Release | July 3 | Dataset
End of evaluation period. Predictions submission deadline | August 5, 23:59 CEST | Submission tutorial
Evaluation delivery and Test Set with Gold Standard annotations | August 7 | Dataset
Working Notes deadline | August 14, 23:59 CEST | Easychair
Working Notes Corrections deadline | August 25 |
Camera-ready submission deadline | September 1 |
IberLEF @ SEPLN 2020 | September 22, from 16h to 20h | IberLEF


Evaluation Script

  • Official evaluation script: Available on GitHub (beta version).
    This is the official evaluation script of the task.

Word embeddings

  • Spanish Medical Word Embeddings. Word embeddings generated from Spanish medical corpora. Download them from Zenodo.
    They can be used as a building block for clinical NLP systems applied to Spanish texts.


Dictionary lookup based on Levenshtein distance. It looks for train and development annotations in the test set.
Results (precision, recall, f1):

  • NER: 0.181, 0.737, 0.291
  • NORM: 0.18, 0.73, 0.288
  • CODING (MAP): 0.584
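The general idea of a Levenshtein-based dictionary lookup can be sketched as follows. This is not the organizers' baseline implementation: the tokenization, the distance threshold `max_distance`, the function names and the toy dictionary are all assumptions chosen to keep the example short.

```python
import re


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lookup(text, dictionary, max_distance=1):
    """Return (start, end, matched text, code) for fuzzy dictionary hits."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    matches = []
    for term, code in dictionary.items():
        n = len(term.split())  # compare against windows of the same token length
        for i in range(len(tokens) - n + 1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            candidate = text[start:end]
            if levenshtein(candidate.lower(), term.lower()) <= max_distance:
                matches.append((start, end, candidate, code))
    return sorted(matches)


# Toy dictionary standing in for mentions collected from train and development.
dictionary = {"carcinoma": "8010/3", "metástasis": "8000/6"}
text = "Paciente con carcinome y metastasis hepáticas."
hits = lookup(text, dictionary)
```

With a threshold of one edit, the misspelled "carcinome" and the unaccented "metastasis" are both matched. Looser thresholds would be expected to find more mentions at the cost of more false positives.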

Linguistic Resources

  • CUTEXT. See it on GitHub.
    Medical term extraction tool.
    It can be used to extract relevant medical terms from clinical cases.
  • SPACCC POS Tagger. See it on Zenodo.
    Part-of-speech tagger for Spanish medical-domain text.
    It can be used as a component of your system.
  • NegEx-MES. See it on Zenodo.
    A system for negation detection in Spanish clinical texts based on the NegEx algorithm.
    It can be used as a component of your system.
  • Negation corpus. See it on GitHub
    A Corpus of Negation and Uncertainty in Spanish Clinical Texts (and instructions to train the system).
  • AbreMES-X. See it on Zenodo.
    Software used to generate the Spanish Medical Abbreviation DataBase.
  • AbreMES-DB. See it on Zenodo.
    Spanish Medical Abbreviation DataBase.
    It can be used to fine-tune your system.
  • MeSpEn Glossaries. See it on Zenodo.
    Repository of bilingual medical glossaries made by professional translators.
    It can be used to fine-tune your system.

Terminological Resources

Other Relevant Systems

  • Live demo of a NER for drug/chemical/gene in Spanish clinical texts, here.
  • NER for drug/chemical/gene with BERT on Spanish clinical texts, here.
  • Alternative NER for drug/chemical/gene on Spanish clinical documents, here.


Evaluation will be done by comparing the automatically generated results to the results generated by manual annotation of experts.

The primary evaluation metric for all three sub-tracks will consist of micro-averaged precision, recall and F1-scores:
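As a reminder of what micro-averaging means, the Python sketch below pools true positives (TP), false positives (FP) and false negatives (FN) over all documents before computing precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2 · precision · recall / (precision + recall). It is an illustration under assumptions, not the official script: annotations are represented as sets of hypothetical (start, end) offsets per document, and only exact matches count.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over per-document annotation sets."""
    tp = fp = fn = 0
    for doc in set(gold) | set(pred):
        gold_items = gold.get(doc, set())
        pred_items = pred.get(doc, set())
        tp += len(gold_items & pred_items)  # predicted and in the gold standard
        fp += len(pred_items - gold_items)  # predicted but not in the gold standard
        fn += len(gold_items - pred_items)  # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted mention offsets for two documents.
gold = {"doc1": {(0, 9), (20, 31), (40, 52)}, "doc2": {(5, 14)}}
pred = {"doc1": {(0, 9), (20, 31), (40, 50), (60, 70)}, "doc2": {(5, 14)}}
precision, recall, f1 = micro_prf(gold, pred)
```

Because counts are pooled before dividing, documents with many annotations weigh more than documents with few, which is why the micro-average can differ from the mean of the per-document scores.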

The evaluation scripts, together with a Readme file with instructions, will be available on GitHub so that participating teams can systematically fine-tune and improve their results on the provided training/development data.

For the CANTEMIST-CODING sub-track we also apply a standard ranking metric: Mean Average Precision (MAP) for evaluation purposes.

MAP (Mean Average Precision) is an established metric used for ranking problems.
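For readers unfamiliar with the metric, the sketch below computes MAP in its textbook form: for each document, the precision at every rank where a correct code appears is summed and divided by the number of gold codes, and the resulting average precisions are averaged over documents. The function names and sample codes are hypothetical, and the official evaluation script may differ in details such as normalization or the handling of documents without predictions.

```python
def average_precision(ranked_codes, gold_codes):
    """Average precision of one ranked code list against a set of gold codes."""
    if not gold_codes:
        return 0.0
    ranked = list(dict.fromkeys(ranked_codes))  # drop duplicates, keep rank order
    hits = 0
    score = 0.0
    for rank, code in enumerate(ranked, start=1):
        if code in gold_codes:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / len(gold_codes)


def mean_average_precision(predictions, gold):
    """Mean of per-document average precision; missing predictions score 0."""
    if not gold:
        return 0.0
    scores = [average_precision(predictions.get(doc, []), codes)
              for doc, codes in gold.items()]
    return sum(scores) / len(scores)


# Hypothetical gold codes and ranked predictions for two clinical cases.
gold_codes = {"cc_a": {"8000/6", "8140/3"}, "cc_b": {"8000/1"}}
ranked_predictions = {
    "cc_a": ["8140/3", "8010/3", "8000/6"],  # correct codes at ranks 1 and 3
    "cc_b": ["8000/3", "8000/1"],            # correct code at rank 2
}
map_score = mean_average_precision(ranked_predictions, gold_codes)
```

Here the first case scores (1/1 + 2/3) / 2 and the second scores 1/2, so placing correct codes earlier in the ranking directly raises the result.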

All metrics will be computed including and excluding mentions with 8000/6 code.


Examples are taken from the evaluation library toy data, available on GitHub.

Example 1: Evaluate the toy data output for CANTEMIST-NER

$> cd src
$> python -g ../gs-data/ -p ../toy-data/ -s ner

Clinical case name			Precision
cc_onco1.ann		0.5
cc_onco3.ann		1.0

Micro-average precision = 0.846

Clinical case name			Recall
cc_onco1.ann		0.667
cc_onco3.ann		1.0

Micro-average recall = 0.917

Clinical case name			F-score
cc_onco1.ann		0.571
cc_onco3.ann		1.0

Micro-average F-score = 0.88

Example 2: Evaluate the toy data output for CANTEMIST-NORM

$> cd src
$> python -g ../gs-data/ -p ../toy-data/ -s norm

Clinical case name			Precision
cc_onco1.ann		0.25
cc_onco3.ann		1.0

Micro-average precision = 0.769

Clinical case name			Recall
cc_onco1.ann		0.333
cc_onco3.ann		1.0

Micro-average recall = 0.833

Clinical case name			F-score
cc_onco1.ann		0.286
cc_onco3.ann		1.0

Micro-average F-score = 0.8

Example 3: Evaluate the toy data output for CANTEMIST-CODING

$> cd src
$> python -g ../gs-data/gs-coding.tsv -p ../toy-data/pred-coding.tsv -c ../valid-codes.tsv -s coding

MAP estimate: 0.75