MEDDOPLACE Sub-tasks

The MEDDOPLACE Shared Task is made up of four different sub-tasks, each with its own aim:

Sub-task 1: Location Entity Recognition

In this sub-task, participants are challenged to automatically detect mentions of locations and location-related entities in published clinical reports in Spanish. Using the MEDDOPLACE corpus as training data, they must build systems that read the text and return the start and end positions of each entity mention.
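To make the expected output concrete, the following is a minimal sketch of how start and end positions can be computed for mentions in a report. It uses plain string lookup against a small term list, which is a stand-in for a trained recognizer and not the method expected of participants. The names `Span` and `find_mentions`, the example sentence, and the convention of half-open character offsets (end is exclusive) are assumptions made for illustration; the official submission format may differ.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # character offset of the first character of the mention
    end: int    # character offset one past the last character (exclusive)
    text: str


def find_mentions(text: str, terms: list[str]) -> list[Span]:
    """Return every occurrence of each term in text, sorted by position."""
    spans = []
    for term in terms:
        pos = text.find(term)
        while pos != -1:
            spans.append(Span(pos, pos + len(term), term))
            pos = text.find(term, pos + 1)
    return sorted(spans)


# Made-up sentence in the style of a clinical report (not from the corpus).
report = "Paciente natural de Bolivia, residente en Madrid desde 2015."
spans = find_mentions(report, ["Bolivia", "Madrid"])
for span in spans:
    print(span.start, span.end, span.text)
```

Slicing the report with the returned offsets (`report[span.start:span.end]`) gives back the mention text, which is a useful sanity check before submitting predictions.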

Sub-task 2: Geographic Normalization

In this sub-task, participants are challenged to automatically normalize mentions of locations in published clinical reports in Spanish. Using the MEDDOPLACE normalized corpus as training data, they must create systems that assign GeoNames, PlusCodes or SNOMED CT codes (depending on the entity type) to the mentions retrieved in Sub-task 1.

Thus, this sub-task has three different sub-tracks:

  • Sub-task 2.1: Geocoding to GeoNames (Toponym Resolution)

For this sub-track, participants are provided with a collection of named places (GPE and GEO) and their corresponding GeoNames identifiers. The allCountries.zip gazetteer provided by GeoNames must be used as the reference file.

In their predictions, participants must provide a GeoNames identifier for each appropriate entity. The results will be evaluated using spatial metrics such as Area Under the Curve (AUC).
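As a rough sketch of how the reference file can be used, the code below reads tab-separated rows in the column layout documented for the GeoNames dump and builds a lookup from identifier to name and coordinates. The function name `load_geonames` and the choice of fields to keep are assumptions for illustration. The sample row is hand-written in the shape of the dump, with several fields left empty; it is not copied from allCountries.zip.

```python
import io

# Column order of the GeoNames dump, as documented in its readme.
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def load_geonames(handle):
    """Map each GeoNames identifier to its name, coordinates and type."""
    index = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(GEONAMES_COLUMNS):
            continue  # skip malformed lines
        record = dict(zip(GEONAMES_COLUMNS, fields))
        index[record["geonameid"]] = {
            "name": record["name"],
            "latitude": float(record["latitude"]),
            "longitude": float(record["longitude"]),
            "feature_class": record["feature_class"],
            "country_code": record["country_code"],
        }
    return index


# Hand-written sample row (illustrative values, most fields left empty).
sample_row = "\t".join([
    "3117735", "Madrid", "Madrid", "", "40.4165", "-3.70256", "P", "PPLC",
    "ES", "", "", "", "", "", "", "", "", "Europe/Madrid", "",
]) + "\n"

gazetteer = load_geonames(io.StringIO(sample_row))
print(gazetteer["3117735"])
```

For the real file, the same function could be given a handle opened on the extracted text file with UTF-8 encoding. The full dump is large, so keeping only the needed fields, as done here, helps limit memory use.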

  • Sub-task 2.2: Geocoding to PlusCodes (POIs Toponym Resolution)

For this sub-track, participants are provided with a collection of named places (FAC) and their corresponding long-form PlusCodes. The OpenLocationCode library must be used to encode and decode PlusCodes to and from coordinates.

In their predictions, participants must provide a pair of coordinates, which will be compared against the coordinates corresponding to the PlusCode to calculate spatial metrics such as Area Under the Curve (AUC).
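The comparison between predicted and gold coordinates can be sketched as follows. The code assumes the gold coordinates have already been obtained by decoding each PlusCode, computes the great-circle error in kilometres with the haversine formula, and then summarizes the errors with one formulation of AUC found in the geocoding literature: the area under the sorted curve of log errors, normalized by the log of the largest possible error. This formulation, the constant `MAX_ERROR_KM` and the function names are assumptions; the official evaluation script may define the metric differently.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_ERROR_KM = 20039.0  # assumed upper bound, about half the Earth's circumference


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def error_auc(errors_km):
    """Normalized area under the sorted log-error curve (lower is better)."""
    if not errors_km:
        raise ValueError("at least one error distance is required")
    logs = sorted(math.log(error + 1.0) for error in errors_km)
    if len(logs) == 1:
        return logs[0] / math.log(MAX_ERROR_KM)
    # Trapezoidal rule with unit spacing between consecutive sorted errors.
    area = sum((a + b) / 2.0 for a, b in zip(logs, logs[1:]))
    return area / (math.log(MAX_ERROR_KM) * (len(logs) - 1))


# Illustrative (gold, predicted) coordinate pairs, not taken from the corpus.
pairs = [
    ((40.4165, -3.70256), (40.4165, -3.70256)),  # exact match
    ((40.4165, -3.70256), (41.38879, 2.15899)),  # wrong city
]
errors = [haversine_km(*gold, *pred) for gold, pred in pairs]
print(errors, error_auc(errors))
```

Under this formulation a score of 0 means every prediction is exact, and values closer to 1 indicate larger errors. Taking the logarithm keeps a few very distant predictions from dominating the score.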

  • Sub-task 2.3: Normalization to SNOMED CT (Entity Linking)

For this sub-track, participants are provided with a list of entities and their corresponding SNOMED CT codes. A SNOMED CT gazetteer will be made available as a reference dictionary.

In their predictions, participants must provide a single SNOMED CT code for each predicted entity. Predictions will be evaluated against the Gold Standard using F-score metrics.
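As an illustration of F-score evaluation, the sketch below computes micro-averaged precision, recall and F1 over sets of annotations. It assumes that a prediction counts as correct only when the document, offsets and code all match the Gold Standard exactly; the matching criteria used by the official evaluation are not stated here. The document identifier and the codes `CODE_A`, `CODE_B` and `CODE_C` are placeholders, not real SNOMED CT codes.

```python
def precision_recall_f1(gold: set, predicted: set):
    """Micro-averaged precision, recall and F1 over exact-match annotations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Each annotation is (document id, start, end, code); all values are placeholders.
gold = {
    ("doc1", 20, 27, "CODE_A"),
    ("doc1", 42, 48, "CODE_B"),
    ("doc1", 60, 70, "CODE_C"),
}
predicted = {
    ("doc1", 20, 27, "CODE_A"),  # correct span and code
    ("doc1", 42, 48, "CODE_C"),  # correct span, wrong code
}
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(precision, recall, f1)
```

In this example one of two predictions is correct and one of three gold annotations is recovered, so precision is 0.5, recall is about 0.33 and F1 is 0.4.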

Sub-task 3: Entity Classification

In this sub-task, participants are challenged to classify the location entities (i.e. GPE, GEO and FAC entities) into four different classes of clinical relevance. Using the MEDDOPLACE corpus as training data, they must create systems that consider the entities in context and determine whether each one is: (a) the patient’s place of origin; (b) the patient’s place of residence; (c) a place the patient has travelled to or from; (d) a place where the patient has received medical attention. Only one label is possible for each annotation.

Sub-task 4: End-to-End Evaluation

In this sub-task, participant systems are evaluated on all three sub-tasks above in sequence rather than on each one in isolation. Using the MEDDOPLACE corpus as training data, participants must create systems (or a combination of systems) that detect entities, normalize them and, finally, classify them.
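The sequential setup can be sketched as a simple chain of three components, where each step consumes the output of the previous one. Everything below is hypothetical: the `Mention` record, the `run_pipeline` function, the stub components and the label name `RESIDENCE` are placeholders for illustration, not part of the task definition. The point is only that mentions missed at the detection step can never be normalized or classified later.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Mention:
    start: int
    end: int
    text: str
    code: Optional[str] = None   # filled in by the normalization step
    label: Optional[str] = None  # filled in by the classification step


def run_pipeline(
    text: str,
    detect: Callable[[str], List[Mention]],
    normalize: Callable[[str, Mention], Optional[str]],
    classify: Callable[[str, Mention], Optional[str]],
) -> List[Mention]:
    """Detect mentions, then normalize and classify each detected mention."""
    mentions = detect(text)
    for mention in mentions:
        mention.code = normalize(text, mention)
        mention.label = classify(text, mention)
    return mentions


# Stub components standing in for real systems (placeholders only).
def detect_stub(text):
    start = text.find("Madrid")
    return [Mention(start, start + 6, "Madrid")] if start != -1 else []


def normalize_stub(text, mention):
    return "PLACEHOLDER_CODE"


def classify_stub(text, mention):
    return "RESIDENCE"


results = run_pipeline(
    "Residente en Madrid.", detect_stub, normalize_stub, classify_stub
)
print(results)
```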