Evaluation – MESINESP2: MEDICAL SEMANTIC INDEXING IN SPANISH

For organizational reasons, the evaluation system has been modified: instead of following the traditional BioASQ method of successive cycles, the competition will be evaluated against a manually annotated data set created specifically for this task.

Procedure

To submit your results for each subtrack, please register on the BioASQ website and follow the instructions on the submission page.

Follow the registration instructions to register both on the BioASQ website and for the CLEF Labs competition (both are required to participate in the task).

We have prepared a YouTube video that explains the process you should follow inside the BioASQ platform.

Evaluation dates

This year we have scheduled a specific submission window for the results of each subtrack:

Subtrack 1: 7 May 09:00 GMT – 13 May 09:00 GMT
Subtrack 2: 13 May 10:00 GMT – 17 May 10:00 GMT
Subtrack 3: 17 May 11:00 GMT – 19 May 11:00 GMT

Systems evaluation

For each document in the test set, participating teams must generate a list of unique DeCS codes, which will be compared with the manually annotated DeCS codes. The list must be ordered by confidence, with the most confident codes first. The confidence values themselves are not required, but the ordering is. The ordering will only be used for a deeper analysis of the systems: the correctness of the order will not be evaluated.

The JSON file containing a team's predictions must follow a structure similar to the one shown below:

{
  "documents": [
    {
      "id": "id_test_article_1",
      "labels": [
        "code1",
        "code2",
        "code3"
      ]
    },
    {
      "id": "id_test_article_2",
      "labels": [
        "code5",
        "code2",
        "code21"
      ]
    }
  ]
}
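As an illustration, the structure above could be produced from scored predictions along the following lines. This is a minimal sketch, not an official tool: the function name `build_submission`, the input layout (a mapping from document id to `(code, confidence)` pairs), and the example scores are assumptions made for this example only.

```python
import json


def build_submission(scored_predictions):
    """Turn {doc_id: [(code, confidence), ...]} into the submission structure.

    Hypothetical helper: the input layout is an assumption for this sketch.
    """
    documents = []
    for doc_id, scored_codes in scored_predictions.items():
        # Sort by confidence, highest first; the scores themselves are dropped.
        ranked = sorted(scored_codes, key=lambda pair: pair[1], reverse=True)
        labels = []
        for code, _confidence in ranked:
            if code not in labels:  # keep each code once, at its best rank
                labels.append(code)
        documents.append({"id": doc_id, "labels": labels})
    return {"documents": documents}


# Made-up scores, used only to show the ordering and de-duplication.
example = {
    "id_test_article_1": [("code2", 0.61), ("code1", 0.93), ("code3", 0.40)],
    "id_test_article_2": [("code5", 0.88), ("code2", 0.75), ("code5", 0.10)],
}
submission = build_submission(example)
print(json.dumps(submission, indent=2, ensure_ascii=False))
```

The keys are written in lowercase ("documents", "id", "labels") to match the structure shown above.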

Please take the following considerations into account when generating the file:

All the documents need to have at least one DeCS descriptor.
In the JSON, "code1", ..., "codeN" are DeCS identifiers (e.g. "D005260"), not the human-readable term (e.g. "Femenino").
Users must upload DeCS descriptors for every article in the test set.
The format of the JSON string is case-sensitive. Thus, trying to upload a JSON with different key names (e.g. "ID" instead of "id") will result in a 500 error.
Users must upload their results before the expiration of the test set.
Users can upload results multiple times for the same system before the expiration of the test set. Each time that a user uploads new results the old ones are erased.
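
Several of the considerations above can be checked locally before uploading. The sketch below is an assumption-laden example rather than the platform's own validation: the function name `validate_submission` and the idea of passing in the list of test-set ids are hypothetical, and it only covers key spelling, non-empty and unique label lists, and coverage of every test article.

```python
def validate_submission(submission, test_ids):
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical pre-upload check, not the BioASQ server-side validation.
    """
    documents = submission.get("documents")
    if not isinstance(documents, list):
        return ['top-level key "documents" (lowercase) holding a list is missing']

    problems = []
    seen_ids = set()
    for position, doc in enumerate(documents):
        # Keys are case-sensitive: "ID" or "Labels" would not match.
        missing_keys = {"id", "labels"} - set(doc)
        if missing_keys:
            problems.append(f"document {position}: missing keys {sorted(missing_keys)}")
            continue
        labels = doc["labels"]
        if not labels:
            problems.append(f"{doc['id']}: needs at least one DeCS code")
        if len(labels) != len(set(labels)):
            problems.append(f"{doc['id']}: DeCS codes must be unique")
        seen_ids.add(doc["id"])

    # Every article in the test set must appear in the upload.
    not_covered = sorted(set(test_ids) - seen_ids)
    if not_covered:
        problems.append(f"test documents without predictions: {not_covered}")
    return problems


good = {"documents": [{"id": "id_test_article_1", "labels": ["D005260"]}]}
bad = {"documents": [{"ID": "id_test_article_1", "labels": ["D005260"]}]}
print(validate_submission(good, ["id_test_article_1"]))
print(validate_submission(bad, ["id_test_article_1"]))
```

In the second call the uppercase "ID" key is reported as a missing "id", and the article is also reported as having no predictions.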

The participating systems will be assessed on a flat measure: the label-based micro F-measure, which is the official evaluation metric of the task.
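
For local experiments, a micro-averaged F-measure can be approximated as sketched below. This follows the standard definition (true positives, false positives, and false negatives are pooled over all documents before computing precision and recall) and is not the official evaluation script. The function name `micro_f_measure` and the toy data are assumptions for illustration.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure over {doc_id: [codes]} mappings.

    Sketch of the standard definition, not the official scorer.
    """
    tp = fp = fn = 0
    for doc_id, gold_codes in gold.items():
        gold_set = set(gold_codes)
        pred_set = set(predicted.get(doc_id, []))
        tp += len(gold_set & pred_set)   # codes predicted and annotated
        fp += len(pred_set - gold_set)   # codes predicted but not annotated
        fn += len(gold_set - pred_set)   # codes annotated but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: 4 true positives, 1 false positive, 1 false negative.
gold = {"d1": ["code1", "code2", "code3"], "d2": ["code5", "code2"]}
predicted = {"d1": ["code1", "code2"], "d2": ["code5", "code21", "code2"]}
print(round(micro_f_measure(gold, predicted), 4))
```

With the toy data, precision and recall are both 4/5, so the score is about 0.8. The order of the codes plays no role in this computation.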