Metrics and Examples
MedProcNER systems will be evaluated by comparing their automatically generated results against the manual annotations produced by expert annotators. As in the DisTEMIST shared task, the primary evaluation metrics for all three sub-tracks are micro-averaged precision, recall and F1-score:
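Under the standard micro-averaging scheme, true positives (TP), false positives (FP) and false negatives (FN) are aggregated over all documents before computing the scores:

\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot P \cdot R}{P + R}
\]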
The evaluation library is available on GitHub so that participating teams can systematically fine-tune and improve their results on the provided training data.
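As a minimal sketch of the micro-averaged computation, the snippet below scores predicted annotations against gold annotations under an exact-match criterion. The tuple format (document id, start offset, end offset, label) and the function name are illustrative assumptions; the official evaluation library may use a different input format and matching criteria.

```python
from typing import Set, Tuple

# Hypothetical annotation representation: (doc_id, start, end, label)
Annotation = Tuple[str, int, int, str]

def micro_prf(gold: Set[Annotation], pred: Set[Annotation]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match annotations."""
    tp = len(gold & pred)   # predictions that exactly match a gold annotation
    fp = len(pred - gold)   # spurious predictions
    fn = len(gold - pred)   # missed gold annotations
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy example with two gold and two predicted mentions (one correct)
gold = {("doc1", 0, 10, "PROCEDURE"), ("doc1", 15, 25, "PROCEDURE")}
pred = {("doc1", 0, 10, "PROCEDURE"), ("doc1", 30, 40, "PROCEDURE")}
print(micro_prf(gold, pred))  # (0.5, 0.5, 0.5)
```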