Submission instructions

Up to 5 submissions per sub-track are allowed.

You must submit ONE SINGLE ZIP file with the following structure:

  • One subdirectory per subtask in which you are participating. 
  • In addition, in the parent directory, you must add a README.txt file with your contact details and a brief description of your system.
  • If you have more than one system, you can include their predictions and we will evaluate them (up to 5 prediction runs).
  • Cantemist-NER and Cantemist-NORM: 
    • You must include the Brat annotation files (.ANN) with your predictions. 
    • One annotation file per document. 
    • If you have more than one system, create sub-directories inside the cantemist-ner and cantemist-norm directories. One subdirectory per system. 
    • If you have more than one system, name the subdirectories with numbers and a recognizable name. For example, 1-systemDL and 2-systemlookup
  • Cantemist-CODING:
    • You must include the tab-separated file with your predictions. 
    • One single file with all the predictions.
    • With a .tsv file extension.  
    • If you have more than one system, include one tab-separated file for each system.
    • If you have more than one system, name the tab-separated files with a number and a recognizable name. For example, 1-systemDL.tsv and 2-systemlookup.tsv
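
The layout described above can be packaged and sanity-checked with a short script. The following is only a sketch under stated assumptions: the helper names build_submission_zip and check_layout are hypothetical, the directory name cantemist-coding is assumed by analogy with cantemist-ner and cantemist-norm, and the checks cover only what this page states (README.txt in the parent directory, .ann files for NER and NORM, .tsv files for CODING). The official submission ZIP example remains the reference.

```python
import zipfile
from pathlib import Path


def build_submission_zip(root: Path, zip_path: Path) -> list[str]:
    """Zip every file under `root`, keeping paths relative to `root`.

    `zip_path` should live outside `root`, otherwise the archive would
    try to include itself.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names


def check_layout(names: list[str]) -> list[str]:
    """Return human-readable problems found in the archive member names."""
    problems = []
    if "README.txt" not in names:
        problems.append("README.txt missing from the parent directory")
    for name in names:
        top = name.split("/")[0]
        lower = name.lower()
        # Directory names for NER/NORM come from the instructions;
        # "cantemist-coding" is an assumed name for the CODING subtask.
        if top in ("cantemist-ner", "cantemist-norm") and not lower.endswith(".ann"):
            problems.append(f"{name}: expected a .ann file")
        if top == "cantemist-coding" and not lower.endswith(".tsv"):
            problems.append(f"{name}: expected a .tsv file")
    return problems
```

A typical call would be check_layout(build_submission_zip(Path("submission"), Path("submission.zip"))), where an empty list means none of these basic checks failed.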

Download here a submission ZIP example.

Submission method

Submissions will be made via SFTP.

Download here the submission tutorial.

Submission format


Cantemist-NER: Brat format, one ANN file per document. ANN files have the following format:

Figure 1. Example of submission file for CANTEMIST-NER

Cantemist-NORM: Brat format, one ANN file per document. ANN files have the following format (with the codes added as Brat comments):

Figure 2. Example of submission file for CANTEMIST-NORM
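
As a rough illustration of the two Brat layouts, the sketch below builds one text-bound annotation line (NER) and one comment line carrying a code (NORM) using the general Brat standoff conventions. Several things here are assumptions rather than facts taken from this page: the helper names are hypothetical, the entity label MORFOLOGIA_NEOPLASIA and the AnnotatorNotes comment type should be checked against the official example files, and the sample sentence and code are purely illustrative.

```python
def ner_line(index: int, label: str, start: int, end: int, text: str) -> str:
    """Brat text-bound annotation: ID <TAB> label start end <TAB> covered text."""
    return f"T{index}\t{label} {start} {end}\t{text}"


def norm_comment_line(index: int, code: str) -> str:
    """Brat comment attached to annotation T<index>, with the code as its text."""
    return f"#{index}\tAnnotatorNotes T{index}\t{code}"


# Illustrative document text and mention (not taken from the corpus).
document = "Diagnosis: adenocarcinoma de colon."
mention = "adenocarcinoma"
start = document.index(mention)
end = start + len(mention)

# Assumed entity label; verify against the official submission example.
LABEL = "MORFOLOGIA_NEOPLASIA"

ner = ner_line(1, LABEL, start, end, mention)
norm = norm_comment_line(1, "8140/3")  # illustrative code only
print(ner)
print(norm)
```

For Cantemist-NORM, each ANN file would then contain both kinds of lines: the text-bound annotation and the comment that points back to it.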

Cantemist-CODING: a tab-separated file with two columns: clinical case and code. Codes must be ordered by rank/confidence, with the most relevant codes first. For example:

Figure 3. Example of submission file for CANTEMIST-CODING
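
One way to produce such a file from per-code confidence scores is sketched below. This is an assumption-laden example, not the organizers' tooling: write_coding_tsv is a hypothetical helper, the case identifier and codes are made up for illustration, and the sketch writes no header row, which should be confirmed against the official example.

```python
import csv
import io


def write_coding_tsv(scores: dict[str, dict[str, float]], out) -> None:
    """Write one 'clinical case <TAB> code' row per predicted code.

    Within each clinical case, codes are written from highest to lowest
    confidence, so the most relevant codes come first.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for case_id in sorted(scores):
        ranked = sorted(scores[case_id].items(), key=lambda kv: kv[1], reverse=True)
        for code, _score in ranked:
            writer.writerow([case_id, code])


# Illustrative predictions: {clinical case: {code: confidence}}.
predictions = {"case-001": {"8140/3": 0.4, "8000/6": 0.9}}
buffer = io.StringIO()
write_coding_tsv(predictions, buffer)
```

To write to disk instead of a buffer, open the target with open("1-systemDL.tsv", "w", newline="", encoding="utf-8") and pass the file object as out.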

Annotation Guidelines

Annotation guidelines can be downloaded from Zenodo.

The Cantemist train, development, test and background sets are already available on Zenodo.

The Cantemist corpus was manually annotated by clinical experts following the Cantemist guidelines. These guidelines contain rules for annotating neoplasm morphology in Spanish oncology clinical cases, as well as for mapping these annotations to CIEO-3 (the Spanish version of ICD-O-3).

Guidelines were created de novo by clinical experts in three phases:

  1. First, a zero version of the guidelines was drafted after the clinical experts reviewed neoplasm morphology annotations in the SPACCC corpus (Codiesp guidelines, for tumor morphology).
  2. Second, a stable version of the guidelines was reached by iteratively annotating sample sets of the Cantemist corpus until quality control was satisfactory.
  3. Third, the guidelines are iteratively refined as manual annotation continues.

Post-annotation review steps:

  • Consistency review: every annotated mention was looked up in all documents, and a clinical expert reviewed whether each occurrence found should be added to the annotations.
  • CIEO-3 code length review: all codes were checked to have between 4 and 7 characters (for example, the CIEO-3 code 8140/32 has 7 characters).
  • Trailing newline: newline characters (\n) are removed from annotations.
  • Internal newline check: annotations with newline characters within them are removed since they span more than one line.
  • Starting and ending annotation characters: check that all annotations start and end with an alphanumeric character or a parenthesis. For example, ",adenocarcinoma," would be a wrong annotation since it is surrounded by commas.
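
The automatic review steps above can be approximated with a few small checks. The sketch below is one reading of those rules, not the organizers' actual scripts; all function names are hypothetical, and the consistency review is omitted because it relies on a clinical expert's judgment.

```python
def code_length_ok(code: str) -> bool:
    """CIEO-3 code length review: between 4 and 7 characters, inclusive."""
    return 4 <= len(code) <= 7


def strip_trailing_newlines(text: str) -> str:
    """Trailing newline step: drop newline characters at the end of the text."""
    return text.rstrip("\n")


def has_internal_newline(text: str) -> bool:
    """Internal newline check: True if the annotation spans more than one line."""
    return "\n" in strip_trailing_newlines(text)


def boundaries_ok(text: str) -> bool:
    """First and last characters must be alphanumeric or a parenthesis."""
    if not text:
        return False
    return all(ch.isalnum() or ch in "()" for ch in (text[0], text[-1]))


def review(text: str, code: str) -> list[str]:
    """Collect the names of the checks that an annotation/code pair fails."""
    failed = []
    cleaned = strip_trailing_newlines(text)
    if not code_length_ok(code):
        failed.append("code length")
    if has_internal_newline(cleaned):
        failed.append("internal newline")
    if not boundaries_ok(cleaned):
        failed.append("start/end character")
    return failed
```

Calling review(",adenocarcinoma,", "8140/3") reports a start/end character failure, matching the example in the last bullet.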