Background and Motivation

Clinical content, in formats such as patient medical records, detailed clinical case reports, and electronic health records (EHRs), continues to grow rapidly across healthcare systems worldwide. Much of it is written in languages other than English, reflecting the global nature of medical practice. The length of many clinical reports is a significant barrier for healthcare professionals, who must extract the essential clinical information from these documents efficiently. Recent advances in generative and encoder-based large language models (LLMs) have shown substantial potential for automated summarization: they can distill comprehensive clinical narratives into concise summaries that retain critical diagnostic information, relevant medical concepts, and clinical details while significantly reducing document length. This makes it important to rigorously assess and compare the performance of clinical summarization systems across multiple languages.

We present MultiClinSum-2, the second edition of our shared task on the automatic summarization of lengthy clinical case reports in 15 languages: English, French, Spanish, Portuguese, Italian, Russian, Catalan, Norwegian, Danish, Romanian, German, Greek, Dutch, Czech, and Swedish. The task uses a corpus of full clinical cases paired with reference summaries derived from the biomedical literature. Automatically generated summaries will be evaluated against gold-standard summaries using ROUGE-2 and BERTScore, complemented by LLM-as-a-judge approaches for qualitative assessment.
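To make the ROUGE-2 criterion concrete, the following is a minimal sketch of how bigram overlap between a generated summary and a reference summary can be scored. It is an illustration only, not the official evaluation script: the function names `bigram_counts` and `rouge2` are hypothetical, and lowercased whitespace tokenization is an assumption (an actual scorer may apply language-specific tokenization or stemming, which would change the values).

```python
from collections import Counter


def bigram_counts(text):
    """Count bigrams using lowercased whitespace tokens (a simplifying assumption)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(candidate, reference):
    """Return ROUGE-2 precision, recall, and F1 for one candidate/reference pair."""
    cand = bigram_counts(candidate)
    ref = bigram_counts(reference)
    # Clipped overlap: each bigram counts at most as often as it occurs in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented example sentences, not taken from the task corpus.
    reference = "the patient was admitted with acute chest pain"
    candidate = "the patient was admitted with chest pain"
    print(rouge2(candidate, reference))
```

Here five of the candidate's six bigrams also occur among the reference's seven, so precision is 5/6 and recall is 5/7. Because this measure rewards only exact surface overlap, it is complemented in the task by BERTScore and LLM-as-a-judge assessment.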