In this section, we present worked examples to make it clearer how the evaluation will be carried out.

Sub-track 1: NER offset

In this sub-track, systems must match the start and end offsets of each entity tag exactly, as well as correctly detect the annotation type.

Gold Standard example

For the following examples, we will consider that this set of PharmaCoNER tags is our Gold Standard (GS):

Example of tags in a Gold Standard file for the NER sub-track

This GS file is in brat format. It contains six NORMALIZABLES entities and three PROTEINAS entities. Each entity tag is composed of the ID, the entity type, the START and END offsets, and the TEXT snippet covered by those offsets.

NOTE: The ID and TEXT fields are not used for any of the evaluation metrics. The number in the ID field is arbitrary, and the evaluation of the TEXT field is implicit in the offset evaluation, as the underlying text is the same for the GS and the system submissions.
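
As a minimal sketch (not the official evaluation script; the brat line below and the helper names are hypothetical, only the entity type is taken from the example above), an entity annotation can be reduced to the (type, start, end) triple on which exact-offset matching relies:

    def parse_brat_entity(line):
        """Return the (type, start, end) triple used for exact-offset matching."""
        _entity_id, annotation, _text = line.rstrip("\n").split("\t")
        entity_type, start, end = annotation.split()
        return entity_type, int(start), int(end)

    def is_exact_match(system_line, gold_lines):
        """A system annotation counts as a true positive only if type, start and
        end all coincide with some GS annotation."""
        return parse_brat_entity(system_line) in {parse_brat_entity(g) for g in gold_lines}

    # Hypothetical brat line, for illustration only.
    print(parse_brat_entity("T1\tNORMALIZABLES 10 21\tamoxicilina"))
    # -> ('NORMALIZABLES', 10, 21)

Two annotations match for this sub-track only when all three elements of the triple are identical.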

System submission example

The following system annotations will be accepted by the evaluation script even if the ID numbers do not match.

Example of tags in a system submission file for the NER sub-track

For this example the scores obtained by this system are the following:

Precision has been computed by dividing true positives (6) by the sum of true positives and false positives (0), scoring 6/(6+0) = 1.0. Recall has been computed by dividing true positives by the sum of true positives and false negatives (3: the tags with IDs T1, T8, and T9 in the GS), scoring 6/(6+3) = 0.6667. Finally, F1 is computed from precision and recall, scoring 2*((1*0.6667)/(1+0.6667)) = 2*(0.6667/1.6667) = 0.8000.
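
For reference, the same arithmetic can be sketched in a few lines of Python (the function name is ours; the counts are those of this example):

    def precision_recall_f1(tp, fp, fn):
        """Standard precision, recall and F1 computed from raw counts."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Sub-track 1 example: 6 true positives, 0 false positives, 3 false negatives.
    precision, recall, f1 = precision_recall_f1(6, 0, 3)
    print(precision, round(recall, 4), round(f1, 4))  # 1.0 0.6667 0.8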

NOTE: This is just an example. We are aware that achieving a precision score of 1.0 is quite a difficult task.

Sub-track 2: Concept Indexing

The second evaluation scenario will consist of a concept indexing task in which, for each document, participating teams have to generate the list of unique SNOMED concept identifiers; this list will be compared against the manually annotated concept identifiers corresponding to chemical compounds and pharmacological substances.

Gold Standard example

For the following examples, we will consider that this set of PharmaCoNER tags is our Gold Standard (GS):

Example of tags in a Gold Standard file for the Concept Indexing sub-track

This GS file is in TSV format: the first column is the document ID, which is used only to check that it matches the filename, and the second column is the identifier of a concept found in the text. In this example, we have three different identifiers.
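
As a minimal sketch (the function and parameter names are ours, not part of the official evaluation script), such a TSV file could be read into the set of unique identifiers for a document:

    def read_concept_ids(tsv_path, expected_doc_id):
        """Read a GS or submission TSV file into the set of unique concept identifiers."""
        concept_ids = set()
        with open(tsv_path, encoding="utf-8") as handle:
            for line in handle:
                doc_id, concept_id = line.rstrip("\n").split("\t")
                # The first column is only checked against the document ID / filename.
                if doc_id != expected_doc_id:
                    raise ValueError("document ID does not match the filename")
                concept_ids.add(concept_id)
        return concept_ids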

System submission example

For the following system, only two annotations (the second and third lines) will be accepted by the evaluation script. The other three annotations are not in the GS and will be classified as false positives:

Example of tags in a system submission file for the Concept Indexing sub-track

For this example the scores obtained by this system are the following:

Precision has been computed by dividing true positives (2) by the sum of true positives and false positives (3), scoring 2/(2+3) = 0.4. Recall has been computed by dividing true positives by the sum of true positives and false negatives (1: identifier 372817009), scoring 2/(2+1) = 0.6667. Finally, F1 is computed from precision and recall, scoring 2*((0.4*0.6667)/(0.4+0.6667)) = 2*(0.2667/1.0667) = 0.5000.
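
The same result can be reproduced with a minimal set-based sketch; the identifiers below are placeholders chosen only so that the counts match this example (372817009 is the one GS identifier the system misses):

    # Illustrative sketch only; identifiers other than 372817009 are placeholders.
    gold_ids = {"372817009", "GS_ID_2", "GS_ID_3"}
    system_ids = {"GS_ID_2", "GS_ID_3", "SYS_ID_A", "SYS_ID_B", "SYS_ID_C"}

    tp = len(system_ids & gold_ids)   # 2 accepted annotations
    fp = len(system_ids - gold_ids)   # 3 annotations not present in the GS
    fn = len(gold_ids - system_ids)   # 1 GS identifier never predicted

    precision = tp / (tp + fp)                           # 2/(2+3) = 0.4
    recall = tp / (tp + fn)                              # 2/(2+1) = 0.6667
    f1 = 2 * precision * recall / (precision + recall)   # = 0.5000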