Submission

Submission instructions

Up to 5 submissions (prediction runs) per sub-track will be allowed.

You must submit ONE SINGLE ZIP file with the following structure:

  • One subdirectory per subtask in which you are participating. 
  • In addition, the parent directory must contain a README.txt file with your contact details and a brief description of your system.
  • If you have more than one system, you can include the predictions of each and we will evaluate them (up to 5 prediction runs).
  • Cantemist-NER and Cantemist-NORM: 
    • You must include the Brat annotation files (.ANN) with your predictions. 
    • One annotation file per document. 
    • If you have more than one system, create sub-directories inside the cantemist-ner and cantemist-norm directories. One subdirectory per system. 
    • If you have more than one system, name the subdirectories with numbers and a recognizable name. For example, 1-systemDL and 2-systemlookup.
  • Cantemist-CODING:
    • You must include a single tab-separated file (.tsv extension) containing all your predictions. 
    • If you have more than one system, include one tab-separated file for each system.
    • If you have more than one system, name the tab-separated files with numbers and a recognizable name. For example, 1-systemDL.tsv and 2-systemlookup.tsv.

Download an example submission ZIP here.
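The directory layout described above can be assembled programmatically. The sketch below builds a submission ZIP in memory with Python's standard zipfile module; all file names inside it (cc_onco1.ann, 1-systemDL, the contact details) are illustrative placeholders, not required names.

```python
import io
import zipfile

# Sketch of assembling the submission ZIP in memory.
# Document and system names below are illustrative only.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    # README.txt goes in the parent directory
    zf.writestr("README.txt", "Contact: team@example.com\nSystem: short description.\n")
    # One subdirectory per subtask; one .ann file per document;
    # one subdirectory per system if you submit several runs
    zf.writestr("cantemist-ner/1-systemDL/cc_onco1.ann", "")
    zf.writestr("cantemist-norm/1-systemDL/cc_onco1.ann", "")
    # Coding: one tab-separated file per system
    zf.writestr("cantemist-coding/1-systemDL.tsv", "")

names = zipfile.ZipFile(buf).namelist()
print(names)
```

In practice you would write real prediction files from disk (zipfile.ZipFile.write) instead of empty strings, but the resulting structure is the same.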

Submission method

Submissions will be made via SFTP.

Download the submission tutorial here.

Submission format

  • CANTEMIST-NER

Brat Format: one ANN file per document. ANN files have the following format:

Figure 1. Example of submission file for CANTEMIST-NER
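In the standard Brat stand-off format, each entity is a "T" line: an ID, then the label with the character offsets, then the mention text, separated by tabs. A minimal sketch of emitting such a line, assuming the corpus entity label MORFOLOGIA_NEOPLASIA (the offsets and mention text are illustrative):

```python
def brat_entity(tid, label, start, end, text):
    """Format one Brat .ann entity line: T<id> TAB label start end TAB text."""
    return f"T{tid}\t{label} {start} {end}\t{text}"

# Illustrative mention: a tumour-morphology span at characters 112-121.
line = brat_entity(1, "MORFOLOGIA_NEOPLASIA", 112, 121, "carcinoma")
print(line)
```

One .ann file per document would then contain one such line per predicted mention, with consecutive T identifiers.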
  • CANTEMIST-NORM

Brat Format: one ANN file per document. ANN files have the following format (with the codes added as Brat comments):

Figure 2. Example of submission file for CANTEMIST-NORM
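For NORM, the codes are attached to the entities via Brat's standard comment mechanism: a "#" line of type AnnotatorNotes pointing at the entity ID. A sketch, with an illustrative morphology code (8070/3):

```python
def brat_note(nid, target_tid, code):
    """Format a Brat comment line attaching a code to entity T<target_tid>."""
    return f"#{nid}\tAnnotatorNotes T{target_tid}\t{code}"

# Illustrative: assign code 8070/3 to entity T1.
print(brat_note(1, 1, "8070/3"))
```

Each NORM .ann file therefore pairs every entity line with a note line carrying its code.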
  • CANTEMIST-CODING

A tab-separated file with two columns: clinical case and code. Codes must be ordered by rank/confidence, with more relevant codes first. For example:

Figure 3. Example of submission file for CANTEMIST-CODING
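Writing the two-column file with the csv module preserves the rank order row by row. The case names and codes below are illustrative, not taken from the task data:

```python
import csv
import io

# Illustrative predictions: codes listed best-first for each clinical case.
predictions = {
    "cc_onco1": ["8041/3", "8000/6"],
    "cc_onco3": ["8070/3"],
}

out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
for case, codes in predictions.items():
    for code in codes:          # one row per (case, code); rank preserved
        writer.writerow([case, code])
print(out.getvalue())
```

In a real submission, `out` would be a file opened with a .tsv extension rather than an in-memory buffer.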

Examples

Examples are taken from the evaluation library's toy data, available on GitHub.

Example 1: Evaluate the toy data output for CANTEMIST-NER

$> cd src
$> python main.py -g ../gs-data/ -p ../toy-data/ -s ner

-----------------------------------------------------
Clinical case name			Precision
-----------------------------------------------------
cc_onco1.ann		0.5
-----------------------------------------------------
cc_onco3.ann		1.0
-----------------------------------------------------

Micro-average precision = 0.846


-----------------------------------------------------
Clinical case name			Recall
-----------------------------------------------------
cc_onco1.ann		0.667
-----------------------------------------------------
cc_onco3.ann		1.0
-----------------------------------------------------

Micro-average recall = 0.917


-----------------------------------------------------
Clinical case name			F-score
-----------------------------------------------------
cc_onco1.ann		0.571
-----------------------------------------------------
cc_onco3.ann		1.0
-----------------------------------------------------

Micro-average F-score = 0.88
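Micro-averages pool the true-positive, false-positive and false-negative counts across all documents before computing precision, recall and F-score, which is why they differ from a simple mean of the per-document values. A sketch under that assumption, with hypothetical counts chosen to reproduce the per-document scores of 0.5/0.667 and 1.0 shown above:

```python
# Hypothetical per-document counts: doc 1 has P=2/4, R=2/3; doc 2 is perfect.
docs = [
    {"tp": 2, "fp": 2, "fn": 1},
    {"tp": 9, "fp": 0, "fn": 0},
]
tp = sum(d["tp"] for d in docs)
fp = sum(d["fp"] for d in docs)
fn = sum(d["fn"] for d in docs)
precision = tp / (tp + fp)               # pooled, not averaged per document
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```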

Example 2: Evaluate the toy data output for CANTEMIST-NORM

$> cd src
$> python main.py -g ../gs-data/ -p ../toy-data/ -s norm

-----------------------------------------------------
Clinical case name			Precision
-----------------------------------------------------
cc_onco1.ann		0.25
-----------------------------------------------------
cc_onco3.ann		1.0
-----------------------------------------------------

Micro-average precision = 0.769


-----------------------------------------------------
Clinical case name			Recall
-----------------------------------------------------
cc_onco1.ann		0.333
-----------------------------------------------------
cc_onco3.ann		1.0
-----------------------------------------------------

Micro-average recall = 0.833


-----------------------------------------------------
Clinical case name			F-score
-----------------------------------------------------
cc_onco1.ann		0.286
-----------------------------------------------------
cc_onco3.ann		1.0
-----------------------------------------------------

Micro-average F-score = 0.8

Example 3: Evaluate the toy data output for CANTEMIST-CODING

$> cd src
$> python main.py -g ../gs-data/gs-coding.tsv -p ../toy-data/pred-coding.tsv -c ../valid-codes.tsv -s coding

MAP estimate: 0.75
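Mean Average Precision scores each clinical case by the precision at every rank where a correct code appears, averaged over the correct codes, then takes the mean over cases. A sketch assuming that standard AP definition, with illustrative cases (not the toy data):

```python
def average_precision(predicted, relevant):
    """AP for one case: predicted is rank-ordered, relevant is a set of codes."""
    hits, score = 0, 0.0
    for rank, code in enumerate(predicted, start=1):
        if code in relevant:
            hits += 1
            score += hits / rank      # precision at this rank
    return score / len(relevant) if relevant else 0.0

# Illustrative: one case answered perfectly, one with the right code at rank 2.
cases = [
    (["8041/3", "8000/6"], {"8041/3"}),
    (["8070/3", "8000/3"], {"8000/3"}),
]
map_score = sum(average_precision(p, r) for p, r in cases) / len(cases)
print(map_score)
```

This is why the rank order of the codes in the .tsv file matters: the same set of codes in a worse order yields a lower MAP.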

Evaluation Library

The Cantemist evaluation script is available on GitHub (beta version).

Please make sure you have the latest version.

These scripts are distributed as part of the CANcer TExt Mining Shared Task (Cantemist). They are written in Python3 and intended to be run via command line:

$> python3 main.py -g ../gs-data/ -p ../toy-data/ -s norm
$> python3 main.py -g ../gs-data/ -p ../toy-data/ -s ner 
$> python3 main.py -g ../gs-data/gs-coding.tsv -p ../toy-data/pred-coding.tsv -c ../valid-codes.tsv -s coding

They produce the evaluation metrics for the corresponding sub-tracks: precision, recall and F-score for Cantemist-NER and Cantemist-NORM; and Mean Average Precision for the Cantemist-CODING sub-track.