Evaluation – MEDDOPROF

The sub-tracks of the MEDDOPROF Shared Task will be evaluated as follows:

Track A – MEDDOPROF-NER

Submissions will be ranked by Precision, Recall and F1-score, computed over the extracted PROFESION [profession] and SITUACION_LABORAL [working status] mentions using exact span matching. F1-score is the primary metric.

A correct prediction must have the same beginning and ending offsets as the Gold Standard annotation, as well as the same label (PROFESION or SITUACION_LABORAL).

Prediction format: brat annotation files (.ANN) with your predictions.
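
The exact-match criterion above can be sketched in a few lines of Python. This is only an illustration, not the official evaluation script: it assumes the standard brat standoff layout for text-bound annotations (ID, then label and offsets, then text, separated by tabs), and the function names `parse_ann` and `strict_counts` as well as the sample annotations are hypothetical.

```python
def parse_ann(text):
    """Return a set of (label, offsets) tuples from brat .ann content."""
    entities = set()
    for line in text.splitlines():
        # Text-bound annotations have IDs starting with "T"; skip everything else.
        if not line.startswith("T"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        label, _, offsets = fields[1].partition(" ")
        # Discontinuous spans look like "10 15;20 25"; keep every offset.
        parts = offsets.replace(";", " ").split()
        entities.add((label, tuple(int(p) for p in parts)))
    return entities


def strict_counts(gold, pred):
    """Count true positives, false positives and false negatives."""
    tp = len(gold & pred)   # same label and same offsets
    fp = len(pred - gold)   # predicted but not in the Gold Standard
    fn = len(gold - pred)   # in the Gold Standard but not predicted
    return tp, fp, fn


# Made-up example: one exact match, one wrong label, one spurious mention.
gold_ann = "T1\tPROFESION 10 19\tenfermera\nT2\tSITUACION_LABORAL 40 48\tjubilado\n"
pred_ann = (
    "T1\tPROFESION 10 19\tenfermera\n"
    "T2\tPROFESION 40 48\tjubilado\n"
    "T3\tPROFESION 60 66\tmédico\n"
)

tp, fp, fn = strict_counts(parse_ann(gold_ann), parse_ann(pred_ann))
print(tp, fp, fn)  # 1 2 1
```

Note that the mention with correct offsets but the wrong label counts both as a false positive and as a false negative, because the label is part of the match.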

Track B – MEDDOPROF-CLASS

Submissions will be ranked by Precision, Recall and F1-score, computed over the extracted PACIENTE [patient], FAMILIAR [family member], SANITARIO [health professional] and OTROS [others] mentions using exact span matching. F1-score is the primary metric.

A correct prediction must have the same beginning and ending offsets as the Gold Standard annotation, as well as the same label.

Prediction format: brat annotation files (.ANN) with your predictions.

Track C – MEDDOPROF-NORM

For this track, participants will be provided with a list of unique concept identifiers from the European Skills, Competences, Qualifications and Occupations (ESCO) classification and relevant SNOMED-CT terms. Participants will have to detect PROFESION and SITUACION_LABORAL mentions and map each of them to one of the terms in the list. The predicted mappings will then be compared with the manually annotated concept IDs and evaluated using F1-score.
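
As a rough illustration of how such a comparison might work, the sketch below treats every mapping as a (document, start, end, concept ID) tuple and counts a prediction as correct only when all four elements agree with the manual annotation. The tuple layout, the function name `norm_counts` and the placeholder codes are assumptions made for this example; they are not taken from the official evaluation.

```python
def norm_counts(gold, pred):
    """Compare predicted mappings with gold mappings, both given as sets of tuples."""
    tp = len(gold & pred)   # same document, same span, same concept ID
    fp = len(pred - gold)   # predicted mapping not found in the gold set
    fn = len(gold - pred)   # gold mapping that was not predicted
    return tp, fp, fn


# Hypothetical data: CODE_A, CODE_B and CODE_C are placeholders, not real ESCO or SNOMED-CT codes.
gold = {
    ("doc1", 10, 19, "CODE_A"),
    ("doc1", 40, 48, "CODE_B"),
}
pred = {
    ("doc1", 10, 19, "CODE_A"),   # correct span and correct code
    ("doc1", 40, 48, "CODE_C"),   # correct span, wrong code
}

tp, fp, fn = norm_counts(gold, pred)
print(tp, fp, fn)  # 1 1 1
```

Under this assumption, a correctly detected mention with the wrong code lowers both precision and recall.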

Precision, Recall and F1-score will be calculated using the following formulas:

Precision (P) = true positives/(true positives + false positives)
Recall (R) = true positives/(true positives + false negatives)
F-score (F1) = 2*((P*R)/(P+R))
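
The formulas above translate directly into code. The following is a minimal sketch; the function name `precision_recall_f1` is hypothetical, and returning 0.0 when a denominator is zero is a common convention assumed here, not something the task description specifies.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall and F1-score from raw counts."""
    # Assumption: a zero denominator yields 0.0 rather than an error.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Worked example: 8 true positives, 2 false positives, 4 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.800 R=0.667 F1=0.727
```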
