MultiCardioNER Subtasks

The MultiCardioNER Shared Task consists of two subtasks, each with its own aim:

Subtask 1 (CardioDis): Spanish adaptation of disease recognition systems to the cardiology domain

In this subtask, participants are challenged to create systems for disease recognition in Spanish text using the entire DisTEMIST corpus as training data. They are then provided with a development set of annotated cardiology clinical case reports to further fine-tune their systems, and are evaluated on a test set of cardiology reports with similar characteristics. In short, participants must create systems that read the text and return the start and end positions of the diseases mentioned in it.
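To make the expected output concrete, the following is a minimal sketch of what "start and end positions" can look like in practice. It assumes character offsets with an exclusive end, and it uses a naive term lookup in place of a trained model; the function name `find_mentions`, the example sentence and the term list are hypothetical and are not part of the task materials.

```python
import re
from typing import List, Tuple


def find_mentions(text: str, terms: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, surface form) for every occurrence of each term.

    Offsets are character positions in `text`; `end` is exclusive, so
    text[start:end] reproduces the mention exactly.
    """
    spans = []
    for term in terms:
        # re.escape makes the term a literal pattern; matching ignores case.
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), match.group(0)))
    return sorted(spans)


# Hypothetical example sentence and term list (illustration only).
example_text = "Paciente con insuficiencia cardiaca e hipertensión arterial."
example_terms = ["insuficiencia cardiaca", "hipertensión arterial"]

for start, end, surface in find_mentions(example_text, example_terms):
    print(start, end, surface)
```

A real submission would replace the dictionary lookup with a recognition model, but the output shape (one start/end pair per mention) stays the same.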

Subtask 2 (MultiDrug): Multilingual (Spanish, English and Italian) adaptation of medication recognition systems to the cardiology domain

In this subtask, participants are challenged to create systems for medication recognition in three languages: Spanish, English and Italian. They are provided with the newly released DrugTEMIST corpus in all three languages to use as training data.

As in Subtask 1, participants are then provided with a development set of annotated cardiology clinical case reports to further fine-tune their systems. Evaluation is done independently for each language, using a test set of cardiology reports with similar characteristics. In short, participants must create systems that read the text and return the start and end positions of the medications mentioned in it. A larger collection of multilingual documents will also be provided as a background set.
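Independent evaluation per language can be sketched as follows. This is an illustration under stated assumptions, not the official scorer: the text above does not name the metric, so the sketch assumes strict span matching (a prediction counts only if the document, start and end all match a gold annotation) with precision, recall and F1. The record layout and the function name `score_per_language` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (language, document id, start, end).
Span = Tuple[str, str, int, int]


def score_per_language(gold: Iterable[Span],
                       pred: Iterable[Span]) -> Dict[str, Dict[str, float]]:
    """Score predictions separately for each language found in the gold data.

    ASSUMPTION: strict span matching with precision/recall/F1; the official
    evaluation may differ.
    """
    gold_by_lang = defaultdict(set)
    pred_by_lang = defaultdict(set)
    for lang, doc_id, start, end in gold:
        gold_by_lang[lang].add((doc_id, start, end))
    for lang, doc_id, start, end in pred:
        pred_by_lang[lang].add((doc_id, start, end))

    results = {}
    for lang, gold_spans in gold_by_lang.items():
        pred_spans = pred_by_lang.get(lang, set())
        true_positives = len(gold_spans & pred_spans)
        precision = true_positives / len(pred_spans) if pred_spans else 0.0
        recall = true_positives / len(gold_spans) if gold_spans else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        results[lang] = {"precision": precision, "recall": recall, "f1": f1}
    return results
```

Because each language is scored on its own, a team that submits predictions for only one language simply receives a zero score entry (or no entry, depending on the scorer) for the others, without affecting the language it did submit.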

The multilingual datasets were originally created in Spanish and then translated into English and Italian, with the annotations carried over using annotation projection strategies. The annotations of all datasets were revised by clinical experts who are native speakers of each of the three languages.

Note that teams can submit predictions for any of the three languages. We encourage teams to approach the task as they see fit, whether by building a monolingual model for each language, a truly multilingual model that handles all three at once, or a completely different approach.