Evaluation – LivingNER Shared Task

Evaluation

Participants’ predictions are compared against the Gold Standard, which was produced through manual annotation by experts.

The primary evaluation metrics for the LivingNER-Species NER and LivingNER-Species Norm sub-tracks are micro-averaged precision, recall, and F1 score.
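As a rough illustration of micro-averaging, the sketch below pools true positives (TP), false positives (FP), and false negatives (FN) over all documents before computing precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The function name `micro_prf`, the `(doc_id, start, end)` tuple layout, and the exact-match criterion are assumptions made for this example; the official evaluation scripts may represent and match annotations differently.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over pooled annotations.

    `gold` and `pred` are iterables of hashable annotation keys.
    The (doc_id, start, end) layout used below is a hypothetical choice
    for illustration, not the official LivingNER file format.
    """
    gold, pred = set(gold), set(pred)

    # Counts are pooled across every document (micro-averaging),
    # rather than computing one score per document and averaging.
    tp = len(gold & pred)   # predicted and present in the Gold Standard
    fp = len(pred - gold)   # predicted but absent from the Gold Standard
    fn = len(gold - pred)   # in the Gold Standard but not predicted

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy data (invented for this example): 3 gold mentions, 4 predictions.
gold = {("doc1", 0, 5), ("doc1", 10, 20), ("doc2", 3, 9)}
pred = {("doc1", 0, 5), ("doc2", 3, 9), ("doc2", 15, 22), ("doc2", 30, 35)}

p, r, f = micro_prf(gold, pred)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
# prints: precision=0.5000 recall=0.6667 f1=0.5714
```

For a normalization sub-track, the same pooling could be applied by adding the assigned code to each tuple, so that a mention counts as a true positive only when both span and code agree; this is again an assumption about the matching rule, not a description of the official scorer.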

The evaluation scripts and their documentation are freely available on GitHub (Evaluation scripts), so that participating teams can run and test the evaluation code locally.

More information on the LivingNER – Clinical Impact track: TBD.