Participants’ predictions are compared against the Gold Standard, which was produced through manual annotation by experts.
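The primary evaluation metrics for the LivingNER-Species NER and LivingNER-Species Norm sub-tracks are micro-averaged precision, recall, and F1 score, computed from the true positives (TP), false positives (FP), and false negatives (FN) pooled over all documents:
Precision (P) = TP / (TP + FP)
Recall (R) = TP / (TP + FN)
F1 = 2 * P * R / (P + R)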
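As a rough illustration of micro-averaging, the sketch below pools all annotations from all documents into one set before counting matches. It is not the official evaluation script: the function name `micro_prf`, the tuple layout (document id, start offset, end offset, label), the exact-match criterion, and the toy data are all assumptions made for this example.

```python
from typing import Hashable, Iterable, Tuple


def micro_prf(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over pooled annotations.

    Hypothetical helper: each annotation is any hashable item, and a
    prediction counts as correct only if it matches a gold item exactly.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)   # predicted and in the gold standard
    fp = len(pred_set - gold_set)   # predicted but not in the gold standard
    fn = len(gold_set - pred_set)   # in the gold standard but not predicted

    # Guard against division by zero when a set is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy data (illustrative only): (document id, start, end, label)
gold = {
    ("doc1", 0, 10, "SPECIES"),
    ("doc1", 20, 28, "HUMAN"),
    ("doc2", 5, 12, "SPECIES"),
}
pred = {
    ("doc1", 0, 10, "SPECIES"),   # exact match: true positive
    ("doc2", 5, 12, "HUMAN"),     # wrong label: false positive (and a missed gold item)
    ("doc2", 30, 35, "SPECIES"),  # spurious mention: false positive
}

precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

For a normalization-style evaluation, the same function could be reused by adding the predicted code to each tuple, so that a mention only counts as correct when its span and its code both match; whether the official scripts score it this way should be checked against their documentation.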
The evaluation scripts, together with their documentation, are freely available on GitHub so that participating teams can run the evaluation locally (see Evaluation scripts).
More information on the LivingNER – Clinical Impact track: TBD.