Evaluation will be carried out by comparing the automatically generated results against the gold-standard annotations produced manually by experts.
Evaluation metrics
The primary evaluation metrics for sub-track 1 (DisTEMIST – entities) and sub-track 2 (DisTEMIST – linking) will be micro-averaged precision, recall and F1-score:
![](http://temu.bsc.es/cantemist/wp-content/uploads/2020/01/cantemist_figure3.png)
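For reference, the standard definitions behind these metrics are given below (this restates the usual micro-averaged formulation and is not taken from the official task materials). In the micro-averaged setting, the true positive (TP), false positive (FP) and false negative (FN) counts are summed over all documents before computing the scores:

\[
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot P \cdot R}{P + R}
\]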
The evaluation scripts, together with a README file containing instructions, are available on GitHub so that participating teams can systematically fine-tune and improve their results on the provided training data.
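As an illustration of how micro-averaged precision, recall and F1 are computed for span-based entity predictions, the sketch below compares gold and predicted annotations represented as sets of `(document_id, start, end, label)` tuples. This is a simplified assumption about the annotation format, not the official evaluation script, and the helper name `micro_prf` is hypothetical.

```python
# Minimal sketch of micro-averaged precision/recall/F1 for exact-match
# span evaluation. Gold and predicted annotations are assumed to be sets
# of (document_id, start, end, label) tuples; this is an illustration,
# not the official DisTEMIST evaluation script.

def micro_prf(gold, pred):
    tp = len(gold & pred)          # exact-match true positives
    fp = len(pred - gold)          # predicted spans absent from the gold set
    fn = len(gold - pred)          # gold spans the system missed
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: counts are pooled over both documents (micro-averaging).
gold = {("doc1", 0, 8, "ENFERMEDAD"), ("doc1", 15, 22, "ENFERMEDAD"),
        ("doc2", 3, 10, "ENFERMEDAD")}
pred = {("doc1", 0, 8, "ENFERMEDAD"), ("doc2", 3, 10, "ENFERMEDAD"),
        ("doc2", 30, 36, "ENFERMEDAD")}

p, r, f = micro_prf(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))
```

With two correct spans, one spurious prediction and one miss, precision, recall and F1 all come out to 2/3 here; note that micro-averaging weights every mention equally across documents, rather than averaging per-document scores.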