ClinSpEn

NEWS (August 2022): Submissions via CodaLab now available! More information and submission instructions on the Evaluation tab.

This website contains the data for the ClinSpEn sub-tracks, which focus on clinical EN-ES machine translation and are part of the biomedical task of WMT 2022.

Motivation

Machine translation applied to the clinical domain is an especially challenging task due to the complexity of medical language and the heavy use of health-related technical terms and medical expressions. For this reason, there is a large community of specialized medical translators who are able to deal with medical narratives, terminologies, and ambiguous abbreviations and acronyms.

Given the relevance, impact and diversity of health-related content, as well as the rapidly growing number of publications, electronic health records (EHRs), clinical trials, informed consent documents and medical terminologies, there is a pressing need to generate more robust medical machine translation resources together with independent quality evaluation scenarios.

Recent advances in machine translation technologies, together with the use of other NLP components, are showing promising results. Domain adaptation of MT approaches can therefore have a significant impact in unlocking key information from medical content.

To this end, the ClinSpEn data covers three types of data that are highly relevant to the biomedical domain: clinical cases, clinical terminology and ontology concepts.

Sub-tracks

ClinSpEn comprises three sub-tracks:

  • ClinSpEn-CC (clinical cases): EN>ES translation of clinical cases using a collection of 202 parallel COVID-19 clinical case reports.
  • ClinSpEn-CT (clinical terms): ES>EN translation of clinical terminology using a collection of over 19 000 parallel terms obtained from biomedical literature and electronic health records.
  • ClinSpEn-OC (ontology concepts): EN>ES translation of ontology concepts using a collection of over 2 000 parallel concepts obtained from different biomedical ontologies.

All documents and terms in the ClinSpEn collection have been manually translated and revised by professional medical translators in order to ensure the quality and validity of the data.

Schedule

Event | Date (all deadlines are 23:59 CEST) | Link
Release of ClinSpEn Clinical Cases sample data | 27/04/2022 | Zenodo
Release of ClinSpEn Clinical Terms sample data | 27/04/2022 | Zenodo
Release of ClinSpEn Ontology Concepts sample data | 27/04/2022 | Zenodo
Release of ClinSpEn Clinical Cases test data | 21/07/2022 | Zenodo
Release of ClinSpEn Clinical Terms test data | 21/07/2022 | Zenodo
Release of ClinSpEn Ontology Concepts test data | 21/07/2022 | Zenodo
Predictions due | 01/09/2022 (EXTENDED) | Codalab
Paper submission deadline | 07/09/2022 |
Paper notification | 09/10/2022 |
Camera-ready version | 16/10/2022 |
Conference - EMNLP 2022 | 07/12/2022-08/12/2022 |