


Guide to submit your papers: https://temu.bsc.es/cantemist/wp-content/uploads/2020/08/Step-by-step-guide-to-submit-your-working-notes-Cantemist.pdf

Following the setting of successful shared tasks we have organized in the past (e.g. MEDDOCAN/IberLEF 2020, PharmaCoNER/BioNLP-ST EMNLP or CHEMDNER/BioCreative), we will invite all teams that submit a test set prediction to send a workshop proceedings paper describing their system (systems description paper), to be published in the SEPLN/IberLEF workshop CEUR proceedings.

The proceedings of the previous IberLEF2019 are online at http://ceur-ws.org/Vol-2421/

Working notes format

The working notes style is available via our proceedings volume template at
http://ceur-ws.org/Vol-XXX/ (we will use the single-column format, as in previous years).

Overleaf users can clone the style from

Offline versions for LaTeX and DOCX are available from

Additionally, we plan to prepare a journal special issue in a Q1 journal covering the CANTEMIST task overview, corpus and results, together with the systems descriptions of participating teams.


Cantemist (CANcer TExt Mining Shared Task) will be part of the IberLEF (Iberian Languages Evaluation Forum) 2020 evaluation campaign at SEPLN 2020, the 36th Annual SEPLN Congress (September 23rd to 25th, 2020, Málaga).

IberLEF aims to foster the research community to define new challenges and obtain cutting-edge results for the Natural Language Processing community, involving at least one of the Iberian languages: Spanish, Portuguese, Catalan, Basque or Galician. Accordingly, several shared-task challenges are proposed.

IberLEF 2020: https://sites.google.com/view/iberlef2020/

SEPLN 2020: http://sepln2020.sepln.org/index.php/en/iberlef-en/

Register for IberLEF and SEPLN here: http://sepln2020.sepln.org/index.php/registro/

All Cantemist talks will be available on YouTube: https://www.youtube.com/playlist?list=PL5uSCzf1azhC24g5dsp5eVMp8BZFWCraX.

Local Time (CEST), September 22nd, 2020 | Title | Presenter | Affiliation | More info
6:40 pm | Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results | Antonio Miranda-Escalada | Barcelona Supercomputing Center, Spain | TBD
7:00 pm | A Joint Model for Medical Named Entity Recognition and Normalization | Ying Xiong | Xili University Town, China | TBD
7:05 pm | Vicomtech at CANTEMIST 2020 | Naiara Pérez | Vicomtech, Spain | TBD
7:10 pm | Extracting Neoplasms Morphology Mentions in Spanish Clinical Cases through Word Embeddings | Pilar López-Úbeda | University of Jaén, Spain | video
7:15 pm | NLNDE at CANTEMIST: Neural Sequence Labeling and Parsing Approaches for Clinical Concept Extraction | Lukas Lange | Bosch Center for Artificial Intelligence, Germany | TBD
7:20 pm | Tumor Entity Recognition and Coding for Spanish Electronic Health Records | Fadi Hassan | Universitat Rovira i Virgili, Spain | TBD
7:25 pm | ICB-UMA at CANTEMIST 2020: Automatic ICD-O Coding in Spanish with BERT | Guillermo López-García | Universidad de Málaga, Spain | TBD
7:30 pm | Conclusions and wrap-up | Antonio Miranda-Escalada | Barcelona Supercomputing Center, Spain | -


Email Martin Krallinger at: encargo-pln-life@bsc.es

  1. Q: What is the goal of the shared task?
    The goal is to predict the annotations (or codes) of the documents in the test and background sets.

  2. Q: How do I register?
    Here: https://temu.bsc.es/cantemist/?p=3956

  3. Q: How to submit the results?
    We will provide further information in the coming days.
    Download the example ZIP file.
    See Submission page for more info.

  4. Q: Can I use additional training data to improve model performance?
    Yes, participants may use any additional training data they have available, as long as they describe it in their working notes. We will ask you to summarize such resources in your participant paper.

  5. Q: The task consists of three sub-tasks. Do I need to complete all sub-tasks? In other words, If I only complete a sub-task or two sub-tasks, is it allowed?
    Sub-tasks are independent; participants may take part in one, two, or all three of them.

  6. Q: How can I submit my results? Can I submit several prediction files for each sub-task?
    You will have to create a ZIP file with your prediction files and submit it to EasyChair (further details will be released soon).
    Yes, you can submit up to 5 prediction files, all in the same ZIP.
    Download the example ZIP file.
    See Submission page for more info.
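The packaging step can be sketched with Python's standard zipfile module. The file names (run1.tsv, run2.tsv, predictions.zip) and the file contents are hypothetical; follow the naming convention in the organizers' example ZIP.

```python
import zipfile
from pathlib import Path

# Hypothetical run file names: check the example ZIP on the Submission
# page for the exact naming convention the organizers expect.
runs = ["run1.tsv", "run2.tsv"]
for name in runs:                       # create toy prediction files
    Path(name).write_text("doc1\t8000/3\n")

# Bundle all runs (up to 5) into a single archive for EasyChair.
with zipfile.ZipFile("predictions.zip", "w") as zf:
    for name in runs:
        zf.write(name)                  # one entry per prediction file

print(zipfile.ZipFile("predictions.zip").namelist())  # ['run1.tsv', 'run2.tsv']
```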

  7. Q: Should prediction files have headings?
    No, prediction files should not have headings.

  8. Q: Are all codes and mentions equally weighted?
    Yes. However, systems will be evaluated both including and excluding mentions with the 8000/6 code.

  9. Q: What version of the eCIE-O-3-1 is in use?
    We are using the 2018 version. The table you can download from the official Spanish webpage is not complete: CIE-O allows combining the 6th and 7th digits according to the pathological study and the degree of differentiation, so not all combinations of the 6th and 7th characters are shown in the table.
    There is a complete list of the valid codes on our webpage. Codes not present in this list will not be used in the evaluation.
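Since invalid codes are ignored in the evaluation, it can help to filter predictions against the valid-code list before submitting. A minimal sketch (the code set and prediction tuples below are illustrative stand-ins; use the list published on our webpage):

```python
# Stand-in for the valid-code list published on the task webpage.
valid = {"8000/1", "8000/3", "8070/3"}

# Hypothetical (document, code) predictions.
predictions = [("doc1", "8000/3"), ("doc1", "9999/9"), ("doc2", "8000/1")]

# Keep only predictions whose code appears in the valid list.
filtered = [(doc, code) for doc, code in predictions if code in valid]
# filtered == [("doc1", "8000/3"), ("doc2", "8000/1")]
```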

  10. Q. What is meant by the /H appended to various codes?
    Some tumor mentions contain a relevant modifier that is not included in the terminology for that concept. In such cases, we append /H to the code.
    For example, in the file cc_onco158, we have the codes 8000/1 and 8000/1/H.
    8000/1 corresponds to a mention of neoplasm (“neoplasia”, in Spanish).
    In the 8000/1/H case, the mention is (in Spanish) “neoplasia de estirpe epitelial”. The modifier “estirpe epitelial” is present in the ICD-O terminology for many tumors, but not as a modifier of the code 8000/1 specifically. Therefore, we consider it a relevant modifier and add the /H.
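When post-processing codes, the /H flag can simply be split off the end of the string. A small illustrative helper (not part of any official tooling):

```python
def split_h(code):
    """Separate the /H flag from an eCIE-O code string.

    Returns (base_code, has_h), e.g. '8000/1/H' -> ('8000/1', True).
    """
    if code.endswith("/H"):
        return code[:-2], True
    return code, False

# split_h("8000/1/H") → ("8000/1", True)
# split_h("8000/1")   → ("8000/1", False)
```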


Milestone | Date | Materials
Sample Set release | April 28 | Sample set
Train Set release and guidelines publication | June 5 | Dataset and annotation guidelines
Development Set release | June 12 | Dataset
Test and Background Set release | July 3 | Dataset
End of evaluation period (predictions submission deadline) | August 5, 23:59 CEST | Submission tutorial
Evaluation delivery and Test Set with Gold Standard annotations | August 7 | Dataset
Working Notes deadline | August 14, 23:59 CEST | EasyChair
Working Notes corrections deadline | August 25 | -
Camera-ready submission deadline | September 1 | -
IberLEF @ SEPLN 2020 | September 22, from 16h to 20h | IberLEF


Evaluation will be done by comparing the automatically generated predictions against the Gold Standard manual annotations produced by experts.

The primary evaluation metrics for all three sub-tracks will be micro-averaged precision, recall and F1-score.
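Assuming annotations are compared as exact matches, the micro-averaged metrics can be sketched as follows. The (document, span, code) tuple format and the toy annotation sets are illustrative assumptions, not the official evaluation script:

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over sets of
    annotation tuples, e.g. (doc_id, span, code)."""
    tp = len(gold & pred)                 # exact matches with the gold standard
    fp = len(pred - gold)                 # spurious predictions
    fn = len(gold - pred)                 # missed gold annotations
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy example: 3 gold annotations, 2 predictions, 1 exact match.
gold = {("doc1", "10-18", "8000/3"), ("doc1", "40-52", "8070/3"),
        ("doc2", "5-12", "8000/1")}
pred = {("doc1", "10-18", "8000/3"), ("doc2", "5-12", "8010/3")}
# micro_prf(gold, pred) → (0.5, 0.333..., 0.4)
```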

The evaluation scripts, together with a README file with instructions, will be available on GitHub so that participating teams can systematically fine-tune and improve their results on the provided training/development data.

For the CANTEMIST-CODING sub-track we also apply a standard ranking metric for evaluation purposes: Mean Average Precision (MAP), an established metric for ranking problems.
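A minimal sketch of how MAP can be computed for ranked per-document code predictions. The per-document gold sets and rankings below are illustrative, not taken from the task data:

```python
def average_precision(gold, ranked):
    """AP for one document: mean of precision@k over the ranks k
    at which a gold code appears in the ranked prediction list."""
    hits, score = 0, 0.0
    for k, code in enumerate(ranked, start=1):
        if code in gold:
            hits += 1
            score += hits / k
    return score / len(gold) if gold else 0.0

def mean_average_precision(docs):
    """MAP: mean of per-document AP values.
    docs is a list of (gold_set, ranked_prediction_list) pairs."""
    return sum(average_precision(g, r) for g, r in docs) / len(docs)

docs = [
    ({"8000/3", "8070/3"}, ["8000/3", "8010/3", "8070/3"]),  # AP = (1/1 + 2/3)/2
    ({"8000/1"}, ["8000/1"]),                                 # AP = 1.0
]
# mean_average_precision(docs) → 0.9166...
```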

All metrics will be computed including and excluding mentions with 8000/6 code.


To register, please fill in the registration form:


CANTEMIST: CANcer TExt Mining Shared Task will be part of the IberLEF 2020 evaluation campaign at SEPLN 2020.

The Plan for Promoting Language Technologies (Plan TL) aims to promote the development of natural language processing, machine translation and conversational systems in Spanish. In this line, through its collaboration with the Barcelona Supercomputing Center (BSC) to promote activities specialized in language technologies applied to health and biomedicine, we announce the call for shared task awards detailed below.

Registration: Fill in an online registration form.

Deadline for submission: August 3

Evaluation: The evaluation of the automatic predictions for this task will have three different scenarios or sub-tasks:

  1. CANTEMIST-NER. Main evaluation metric: F-score.
  2. CANTEMIST-NORM. Main evaluation metric: F-score.
  3. CANTEMIST-CODING. Main evaluation metric: Mean Average Precision.

For further details on the evaluation of the sub-tasks, please refer to Evaluation.

Task organizers: This task has been coordinated by the OT de Sanidad of the Plan TL.

Scientific committee evaluator: Please refer to Scientific Committee.

Selection of winners: The top three ranked teams in each task will be selected as finalists to receive prizes. System evaluations will be performed according to the evaluation criteria described in Evaluation.

Budget: The total budget for this call is 5100 euros.

The first-ranked team in each sub-task will receive a prize of 1,000 euros, the second-ranked team 500 euros, and the third-ranked team 200 euros.

Contact: For further details, please refer to encargo-pln-life@bsc.es

Official Cantemist results

Best run per participant team