Sub-tasks
The task will explore automatic annotation projection strategies for multilingual clinical corpus creation, as well as comparable multilingual concept extraction solutions, through two sub-tasks:
Sub-task 1: MultiClinNER (Multilingual Comparable Clinical Entity Recognition)
This sub-task focuses on the implementation and comparative evaluation of multilingual clinical entity recognition systems for seven languages (English, Spanish, Dutch, Italian, Romanian, Swedish, and Czech), comparing automatically extracted entity mentions against those manually validated by domain experts. Given a collection of clinical case reports in these seven languages, participating teams must return, for each report, the character offsets of all entity mentions of three types: diseases, symptoms and signs, and clinical procedures. The test set collections will include both native and automatically translated clinical case reports. Manually validated entity mentions for each language and entity type will be provided as training data. Teams may submit results for any subset of the target languages; submitting for all languages is not mandatory.
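Since systems are compared by the character offsets they return, the following minimal sketch shows one way such a comparison could work. It assumes half-open `[start, end)` offsets, exact span-and-label matching, and micro-averaged precision, recall and F1. The `Mention` record, the label names and `strict_prf` are hypothetical illustrations, not the official submission format or evaluation script.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical record for one entity mention (not the official format)."""
    doc_id: str
    start: int   # offset of the first character (assumed half-open span)
    end: int     # offset one past the last character
    label: str   # assumed label names, e.g. "DISEASE", "SYMPTOM", "PROCEDURE"


def strict_prf(predicted, gold):
    """Micro-averaged precision, recall and F1 under exact span+label matching."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)  # mentions matching document, offsets and label
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Usage: one correct mention and one mention with the wrong label.
text = "Patient presented with fever and pneumonia."
gold = [Mention("case1", 23, 28, "SYMPTOM"), Mention("case1", 33, 42, "DISEASE")]
pred = [Mention("case1", 23, 28, "SYMPTOM"), Mention("case1", 33, 42, "SYMPTOM")]
print(strict_prf(pred, gold))
```

Under this strict scheme a mention with correct offsets but the wrong entity type counts as both a false positive and a false negative; a lenient variant that accepts overlapping spans would be an equally plausible design choice.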
Sub-task 2: MultiClinCorpus (Multilingual Comparable Clinical Corpus Generation)
This sub-task will cover the automatic generation of comparable multilingual corpora. Given a collection of plain-text documents with manually annotated entity mentions in a seed language (Spanish), together with translated versions of those texts in several target languages (English, Italian, Dutch, Swedish, Romanian and Czech), participating teams must return the exact character offsets of all equivalent entity mentions in the target languages. Manually mapped entity mentions in the target languages, revised and validated by experts, will be released as training data. Teams may submit results for any subset of the target languages; submitting for all languages is not mandatory.
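To make the input/output contract of annotation projection concrete, here is a deliberately naive sketch. It assumes half-open `[start, end)` offsets and a hypothetical bilingual `lexicon` mapping each lowercased Spanish mention to candidate target-language surface forms; both `project_mentions` and the lexicon are illustrative assumptions, not a method prescribed by the task. Real systems would more likely rely on word alignment or translation-aware matching.

```python
import re


def project_mentions(source_text, source_spans, target_text, lexicon):
    """Project (start, end, label) spans from a source text onto its translation.

    lexicon: hypothetical dict mapping a lowercased source surface form to a list
    of candidate target surface forms. Mentions with no match are skipped.
    """
    used = set()        # exact target spans already assigned to a source mention
    projected = []
    for start, end, label in sorted(source_spans):
        surface = source_text[start:end].lower()
        for candidate in lexicon.get(surface, []):
            # First case-insensitive occurrence of the candidate not yet assigned.
            match = next(
                (m for m in re.finditer(re.escape(candidate), target_text, re.IGNORECASE)
                 if (m.start(), m.end()) not in used),
                None,
            )
            if match is not None:
                used.add((match.start(), match.end()))
                projected.append((match.start(), match.end(), label))
                break   # stop at the first candidate that is found
    return projected


# Usage with a toy Spanish source and English translation.
source = "Paciente con fiebre y neumonía."
spans = [(13, 19, "SYMPTOM"), (22, 30, "DISEASE")]
target = "Patient with fever and pneumonia."
lexicon = {"fiebre": ["fever"], "neumonía": ["pneumonia"]}
print(project_mentions(source, spans, target, lexicon))
```

Because it matches raw substrings, this sketch would also fire inside longer words (e.g. "fever" in "feverish") and cannot handle mentions whose translation is reordered or split; it only illustrates the expected offsets-in, offsets-out shape of the sub-task.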