The CodiEsp evaluation script can be downloaded from GitHub.

Please make sure you have the latest version.
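
For example, cloning the repository with git makes updating straightforward (the URL below is an assumption; verify it against the official CodiEsp task page):

$> git clone https://github.com/TeMU-BSC/codiesp-evaluation-script.git
$> cd codiesp-evaluation-script
$> git pull    # run later to fetch the latest version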


Example 1: CodiEsp-D or CodiEsp-P
Evaluate the system output pred_D.tsv against the gold standard gs_D.tsv (located in the gold/toy_data and system/toy_data subfolders, respectively).

$> python3 codiespD_P_evaluation.py -g gold/toy_data/gs_D.tsv -p system/toy_data/pred_D.tsv -c codiesp_codes/codiesp-D_codes.tsv

MAP estimate: 0.444
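
The reported MAP is the mean average precision over documents: for each clinical case, the predicted codes are treated as a ranked list, and precision is accumulated at every rank where a gold standard code appears. A minimal sketch of the metric (illustrative only, not the official implementation; the data structures are assumed):

from statistics import mean

def average_precision(ranked_preds, gold_codes):
    # Precision@k is accumulated at each rank k that hits a gold code,
    # then normalized by the number of gold codes for the document.
    # Assumes each code appears at most once in the ranked list.
    hits, ap = 0, 0.0
    for k, code in enumerate(ranked_preds, start=1):
        if code in gold_codes:
            hits += 1
            ap += hits / k
    return ap / len(gold_codes) if gold_codes else 0.0

def mean_average_precision(predictions, gold):
    # predictions: dict doc_id -> ranked list of predicted codes
    # gold:        dict doc_id -> set of gold standard codes
    return mean(average_precision(predictions.get(d, []), gold[d]) for d in gold)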

Example 2: CodiEsp-X
Evaluate the system output pred_X.tsv against the gold standard gs_X.tsv (located in the gold/toy_data and system/toy_data subfolders, respectively).

$> python3 codiespX_evaluation.py -g gold/toy_data/gs_X.tsv -p system/toy_data/pred_X.tsv -cD codiesp_codes/codiesp-D_codes.tsv -cP codiesp_codes/codiesp-P_codes.tsv 

-----------------------------------------------------
Clinical case name			Precision
-----------------------------------------------------
S0000-000S0000000000000-00		nan
-----------------------------------------------------
S1889-836X2016000100006-1		0.625
-----------------------------------------------------
codiespX_evaluation.py:248: UserWarning: Some documents do not have predicted codes, document-wise Precision not computed for them.

Micro-average precision = 0.556


-----------------------------------------------------
Clinical case name			Recall
-----------------------------------------------------
S0000-000S0000000000000-00		nan
-----------------------------------------------------
S1889-836X2016000100006-1		0.455
-----------------------------------------------------
codiespX_evaluation.py:260: UserWarning: Some documents do not have Gold Standard codes, document-wise Recall not computed for them.

Micro-average recall = 0.385


-----------------------------------------------------
Clinical case name			F-score
-----------------------------------------------------
S0000-000S0000000000000-00		nan
-----------------------------------------------------
S1889-836X2016000100006-1		0.526
-----------------------------------------------------
codiespX_evaluation.py:271: UserWarning: Some documents do not have predicted codes, document-wise F-score not computed for them.
codiespX_evaluation.py:274: UserWarning: Some documents do not have Gold Standard codes, document-wise F-score not computed for them.

Micro-average F-score = 0.455


__________________________________________________________

MICRO-AVERAGE STATISTICS:

Micro-average precision = 0.556

Micro-average recall = 0.385

Micro-average F-score = 0.455
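
Micro-averaging pools the true positives, predictions, and gold annotations of all documents before computing each metric, so documents with more codes weigh more than in a per-document (macro) average. A minimal sketch, assuming predictions and gold annotations are flattened into sets of (document, code, reference) tuples (the script's actual internal representation may differ):

def micro_prf(pred, gold):
    # pred, gold: sets of tuples pooled over every document.
    tp = len(pred & gold)                      # true positives
    p = tp / len(pred) if pred else 0.0        # micro-average precision
    r = tp / len(gold) if gold else 0.0        # micro-average recall
    f = 2 * p * r / (p + r) if p + r else 0.0  # micro-average F-score
    return p, r, f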

Contact for technical issues

Antonio Miranda-Escalada (antonio.miranda@bsc.es)