Datasets

ClinSpEn includes three datasets, each originally used for a different sub-track in WMT 22:

  • ClinSpEn-CC (clinical cases): EN>ES translation of clinical cases using a collection of 202 parallel COVID-19 clinical case reports.
  • ClinSpEn-CT (clinical terms): ES>EN translation of clinical terminology using a collection of over 19 000 parallel terms obtained from biomedical literature and electronic health records.
  • ClinSpEn-OC (ontology concepts): EN>ES translation of ontology concepts using a collection of over 2 000 parallel concepts obtained from different biomedical ontologies.
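
Because the translation direction differs between subsets (CT runs ES>EN, the other two run EN>ES), it can help to keep the direction next to the subset name when preparing data. The following is a minimal sketch of such a lookup table; the `Subset` class, the `SUBSETS` dictionary and the `direction` helper are hypothetical names chosen for illustration and are not part of any ClinSpEn release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    """Descriptive record for one ClinSpEn subset (illustrative only)."""
    content: str      # what the subset contains
    source_lang: str  # language translated from
    target_lang: str  # language translated into
    size_note: str    # size as stated in the dataset description


# Values are copied from the list above; the structure itself is an assumption.
SUBSETS = {
    "ClinSpEn-CC": Subset("clinical cases", "en", "es", "202 case reports"),
    "ClinSpEn-CT": Subset("clinical terms", "es", "en", "over 19 000 terms"),
    "ClinSpEn-OC": Subset("ontology concepts", "en", "es", "over 2 000 concepts"),
}


def direction(subset_name: str) -> str:
    """Return the translation direction in the 'SRC>TGT' notation used above."""
    subset = SUBSETS[subset_name]
    return f"{subset.source_lang}>{subset.target_lang}".upper()


if __name__ == "__main__":
    for name, subset in SUBSETS.items():
        print(f"{name}: {direction(name)}, {subset.content} ({subset.size_note})")
```

Looking up the direction by subset name avoids accidentally feeding ClinSpEn-CT to a system in the EN>ES direction used by the other two subsets.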