Evaluation
The evaluation will be done via CodaLab: participants upload their predictions and receive the scores automatically. Five metrics are reported: METEOR, COMET, SacreBLEU, BLEU, and ROUGE, with SacreBLEU being the main metric.
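To illustrate what the main metric measures, the sketch below is a simplified corpus-level BLEU: clipped n-gram precisions (n = 1 to 4) combined geometrically and multiplied by a brevity penalty. It is only an approximation of SacreBLEU, which additionally standardizes tokenization and applies smoothing; for official-style scores, use the `sacrebleu` package itself.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100). Assumes one reference per
    hypothesis and whitespace tokenization; no smoothing is applied."""
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # total hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_grams = ngram_counts(h, n)
            r_grams = ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its reference count.
            matches[n - 1] += sum(min(c, r_grams[g]) for g, c in h_grams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # real SacreBLEU smooths this case instead
    # Geometric mean of the n-gram precisions, in log space.
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_prec)
```

For example, a hypothesis identical to its reference scores 100.0, while a correct but truncated hypothesis is penalized by the brevity penalty even though all of its n-grams match.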
The leaderboard will be released at the EMNLP 2022 conference.
If you have problems with, or questions about, the submission process, we have prepared a submission guide that you can access here: “A step-by-step guide to submitting your ClinSpEn predictions”.
In addition to the test set, which was manually translated by professional medical translators, participants will have access to a larger background collection for each of the three subtracks. These collections serve as additional resources and support assessing the scalability and robustness of machine translation technology.
After the evaluation period has ended, CodaLab submissions will remain open so that anyone can benchmark their systems. However, these later submissions will not be part of the WMT workshop and will not appear in the overview paper.
Special thanks to the organizers of MedMTEval, a competition focused on the automatic translation of medical texts from Russian to English held as part of the AINL 2022 conference, for sharing some of their evaluation scripts. For more information, see their paper: “E. Ezhergina, M. Fedorova, V. Malykh, and D. Petrova. Findings of Biomedical Russian-English MT Competition. To appear in AINL 2022 Proceedings.” We also want to thank Tom Kocmi, one of the developers of the OCELOT evaluation tool, for his support.