{"id":46,"date":"2025-11-07T11:12:30","date_gmt":"2025-11-07T10:12:30","guid":{"rendered":"https:\/\/temu.bsc.es\/MultiClinAI\/?page_id=46"},"modified":"2026-04-08T11:40:19","modified_gmt":"2026-04-08T09:40:19","slug":"home","status":"publish","type":"page","link":"https:\/\/temu.bsc.es\/MultiClinAI\/","title":{"rendered":"Home"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\"><strong>MultiClinAI Shared Task Homepage<\/strong><\/h1>\n\n\n<p><meta name=\"google-site-verification\" content=\"iOe2W15flMaL1ig4AeCUV2pd6UyCIBsGkDLLobM-rt8\" \/><\/p>\n\n\n<p><em>The MultiClinAI Track is organized by the Barcelona Supercomputing Center\u2019s NLP for Biomedical Information Analysis group and promoted by European projects such as DataTools4Heart and AI4HF.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is MultiClinAI?<\/h2>\n\n\n\n<p><strong>MultiClinAI<\/strong> is a shared task focused on the creation of comparable multilingual corpora via annotation projection, as well as the multilingual extraction of clinical concepts.<\/p>\n\n\n\n<p>For more information about the <strong>MultiClinAI<\/strong> task, check the <a href=\"https:\/\/temu.bsc.es\/MultiClinAI\/task-info\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Task Info<\/strong><\/a> tab, which includes the <a href=\"https:\/\/temu.bsc.es\/MultiClinAI\/motivation\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Motivation<\/strong><\/a>, <strong><a href=\"https:\/\/temu.bsc.es\/MultiClinAI\/subtasks\/\" target=\"_blank\" rel=\"noreferrer noopener\">Subtasks<\/a><\/strong>, <strong><a href=\"https:\/\/temu.bsc.es\/MultiClinAI\/schedule\/\" target=\"_blank\" rel=\"noreferrer noopener\">Schedule<\/a><\/strong> and <strong><a href=\"https:\/\/temu.bsc.es\/MultiClinAI\/registration\/\" target=\"_blank\" rel=\"noreferrer noopener\">Registration<\/a><\/strong>, as well as the <strong>Evaluation &amp; Submission<\/strong> tab.<\/p>\n\n\n\n<p>To learn more about 
the MultiClinAI corpora and how they were annotated, check the <strong><a href=\"https:\/\/temu.bsc.es\/MultiClinAI\/data\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data<\/a><\/strong> tab.<\/p>\n\n\n\n<p>MultiClinAI will be held as part of the <strong>#SMM4H-HeaRD Workshop<\/strong> at the <strong>ACL 2026<\/strong> conference. For more information about both, check the <a href=\"https:\/\/temu.bsc.es\/MultiClinAI\/workshop\/\" data-type=\"link\" data-id=\"https:\/\/temu.bsc.es\/multicardioner\/workshop\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Workshop<\/strong><\/a> tab.<\/p>\n\n\n\n<p>MultiClinAI is organized by the Barcelona Supercomputing Center&#8217;s NLP for Biomedical Information Analysis group (formerly Text Mining Unit).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\" style=\"width:100%; margin:0;\">\n  <img decoding=\"async\" src=\"https:\/\/temu.bsc.es\/MultiClinAI\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-13-at-16.18.59.png\" \n       alt=\"\" \n       class=\"wp-image-167\" \n       style=\"width:100%; height:auto; display:block;\" \/>\n<\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Motivation<\/h2>\n\n\n\n<p>Named entity recognition (NER) systems are fundamental for clinical natural language processing (NLP), enabling the identification of key clinical concepts, such as diseases, symptoms, medications, and procedures, in medical documents and electronic health records (EHRs). These systems support clinical workflows and decision-making, as well as large-scale health data analysis. However, their development relies on high-quality expert-annotated corpora, which are costly, time-consuming to produce, and typically language-specific. 
This poses significant challenges in multilingual settings, particularly for low-resource languages (LRLs).<\/p>\n\n\n\n<p>Multinational clinical studies, rare disease research, and multicentric trials require comparable annotation criteria and interoperable information extraction systems across languages. Yet, multilingual clinical corpora annotated under consistent guidelines remain scarce. Recent advances in machine translation, large language models (LLMs), and generative AI provide new opportunities to translate annotated datasets and create comparable multilingual corpora through annotation projection and entity alignment strategies.<\/p>\n\n\n\n<p>In this context, the <strong>MultiClinAI<\/strong> (Multilingual Clinical Entity Annotation Projection and Extraction) shared task addresses the creation and evaluation of comparable multilingual clinical resources across seven languages (<strong>Czech<\/strong>, <strong>English<\/strong>, <strong>Spanish<\/strong>, <strong>Dutch<\/strong>, <strong>Italian<\/strong>, <strong>Romanian<\/strong> and <strong>Swedish<\/strong>), focusing on three key entity types: diseases, symptoms, and procedures.<\/p>\n\n\n\n<p>By jointly evaluating multilingual extraction performance and annotation projection strategies across these languages, <strong>MultiClinAI<\/strong> establishes a robust benchmarking scenario for multilingual clinical NLP. 
The shared task encourages the development of generalizable, transferable, and scalable approaches capable of supporting cross-lingual healthcare applications.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Task Overview<\/h2>\n\n\n\n<p><strong>MultiClinAI<\/strong> is divided into two independent but complementary subtasks, as described below:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>MultiClinNER<\/strong>: This subtask focuses on multilingual clinical named entity recognition across seven languages (<strong>Czech<\/strong>, <strong>English<\/strong>, <strong>Spanish<\/strong>, <strong>Dutch<\/strong>, <strong>Italian<\/strong>, <strong>Romanian<\/strong> and <strong>Swedish<\/strong>). Participants must identify and classify mentions of DISEASE, SYMPTOM, and PROCEDURE entities by predicting their exact spans and types. The task follows a standard entity-level evaluation framework and provides a unified benchmark for comparing monolingual, multilingual, and cross-lingual extraction approaches.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>MultiClinCorpus<\/strong>: This subtask focuses on the automatic construction of comparable multilingual clinical corpora by projecting a <strong>Spanish gold-standard<\/strong> dataset into six target languages (<strong>Czech<\/strong>, <strong>English<\/strong>, <strong>Dutch<\/strong>, <strong>Italian<\/strong>, <strong>Romanian<\/strong> and <strong>Swedish<\/strong>). Participants must automatically generate comparable annotated corpora through cross-lingual transfer methods. The task evaluates how effectively systems can project and align annotations across languages to produce consistent multilingual clinical resources.<\/li>\n<\/ul>\n\n\n\n<p>Both subtasks are evaluated using standard classification metrics, including <strong>precision<\/strong>, <strong>recall<\/strong>, and <strong>F1-score<\/strong>. 
An official evaluation script will be provided to ensure transparency and comparability of results.<\/p>\n\n\n\n<p>Participation in MultiClinAI is <strong>flexible<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Teams may participate in <strong>one or both subtasks<\/strong>.<\/li>\n\n\n\n<li>Teams may submit results for <strong>one, several, or all languages<\/strong>.<\/li>\n\n\n\n<li>Covering all languages is <strong>not mandatory<\/strong>.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Data<\/h2>\n\n\n\n<p>The <strong>training data<\/strong> for the subtasks of <strong>MultiClinAI<\/strong> is composed of several well-established clinical corpora, namely DisTEMIST, SympTEMIST, MedProcNER, and the extended version of the CardioCCC corpus. It encompasses three clinical entity types: <strong>diseases<\/strong> (DISEASE), <strong>symptoms &amp; signs<\/strong> (SYMPTOM) and <strong>clinical procedures<\/strong> (PROCEDURE), and is available in seven languages: <strong>Spanish<\/strong> (es), <strong>Czech<\/strong> (cz), <strong>Dutch<\/strong> (nl), <strong>English<\/strong> (en), <strong>Italian<\/strong> (it), <strong>Romanian<\/strong> (ro), and <strong>Swedish<\/strong> (sv).<\/p>\n\n\n\n<p>Both subtasks rely on the same underlying textual resources. 
However, the problem definition, modeling objectives, and evaluation procedures differ between <strong>MultiClinNER<\/strong> (Multilingual Comparable Clinical Entity Recognition) and <strong>MultiClinCorpus<\/strong> (Multilingual Comparable Clinical Corpus Generation).<\/p>\n\n\n\n<p>The training data for each subtask follows a standardised folder structure for each language and entity type, ensuring consistency across tasks and facilitating system development, which is shown below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><strong>MultiClinAI-training_data_v1.1-260225\/<\/strong>\n \u251c\u2500\u2500 <strong>MultiClinNER\/<\/strong>\n \u2502    \u251c\u2500\u2500 MultiClinNER-es\/\n \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-es-train\/\n \u2502    \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-es-train-disease\/\n \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ann\/\n \u2502    \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-es-train-disease-0001.ann\n \u2502    \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-es-train-disease-0002.ann\n \u2502    \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ...\n \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 txt\/\n \u2502    \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-es-train-disease-0001.txt\n \u2502    \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-es-train-disease-0002.txt\n \u2502    \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ...\n \u2502    \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-es-train-symptom\/\n \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ...\n \u2502    \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-es-train-procedure\/\n \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ...\n \u2502    \u251c\u2500\u2500 MultiClinNER-cz\/\n \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-cz-train\/\n \u2502    
\u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-cz-train-disease\/\n \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ann\/\n \u2502    \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ...\n \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 txt\/\n \u2502    \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ...\n \u2502    \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-cz-train-symptom\/\n \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ...\n \u2502    \u2502    \u2502    \u251c\u2500\u2500 MultiClinNER-cz-train-procedure\/\n \u2502    \u2502    \u2502    \u2502    \u251c\u2500\u2500 ...\n \u2502    \u251c\u2500\u2500 MultiClinNER-{nl,en,it,ro,sv}\/ (same as es and cz)\n \u251c\u2500\u2500 <strong>MultiClinCorpus\/<\/strong> (same as MultiClinNER folder)<\/code><\/pre>\n\n\n\n<p>Data access is granted upon registration and, where applicable, agreement to the data usage terms. For additional details, please consult the <a href=\"https:\/\/temu.bsc.es\/MultiClinAI\/data\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data<\/a> tab.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Registration<\/h2>\n\n\n\n<p><a href=\"https:\/\/forms.gle\/oE9gfaNxFw2f6gyX6\">https:\/\/forms.gle\/oE9gfaNxFw2f6gyX6<\/a><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Schedule<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Event<\/th><th>Date (Midnight CET)<\/th><\/tr><\/thead><tbody><tr><td><a href=\"https:\/\/doi.org\/10.5281\/zenodo.18508037\">MultiClinNER subtask training set release<\/a><\/td><td>February 6, 2026<\/td><\/tr><tr><td><a href=\"https:\/\/doi.org\/10.5281\/zenodo.18508037\">MultiClinCorpus subtask training set release<\/a><\/td><td>February 6, 2026<\/td><\/tr><tr><td><a href=\"https:\/\/zenodo.org\/records\/19098018\">MultiClinNER test 
set release<\/a> (only texts)<\/td><td>March 18, 2026<\/td><\/tr><tr><td>MultiClinNER test set prediction submissions<\/td><td><s>March 25, 2026<\/s> <span style=\"color: red;\">New: March 30, 2026<\/span><\/td><\/tr><tr><td><a href=\"https:\/\/zenodo.org\/records\/19334278\">MultiClinCorpus test set release<\/a> (only texts)<\/td><td>March 27, 2026<\/td><\/tr><tr><td>MultiClinCorpus test set prediction submissions<\/td><td>April 9, 2026<\/td><\/tr><tr><td>Result \/ evaluation returned to teams<\/td><td>April 14, 2026<\/td><\/tr><tr><td>Participant proceedings due<\/td><td>April 24, 2026<\/td><\/tr><tr><td>Notification of acceptance and participant proceedings reviews<\/td><td>May 15, 2026<\/td><\/tr><tr><td>Camera-ready papers due<\/td><td>May 25, 2026<\/td><\/tr><tr><td>ACL Proceedings due (hard deadline)<\/td><td>June 1, 2026<\/td><\/tr><tr><td>Workshop<\/td><td>July 2\u20133, 2026<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Related resources<\/h2>\n\n\n\n<p>At the NLP for Biomedical Information Analysis group (formerly Text Mining Unit), one of our missions is the open publication of datasets to train and benchmark biomedical information extraction, normalization and indexing systems. For that reason, we have released multiple datasets as part of shared tasks over the years. 
If you are interested in MultiClinAI, you may also want to look at our related resources and competitions on:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Clinical content extraction: <a href=\"https:\/\/temu.bsc.es\/distemist\/\" target=\"_blank\" rel=\"noreferrer noopener\">DisTEMIST<\/a> (diseases), <a href=\"https:\/\/temu.bsc.es\/medprocner\/\">MedProcNER\/ProcTEMIST<\/a> (clinical procedures), <a href=\"https:\/\/temu.bsc.es\/symptemist\/\">SympTEMIST<\/a> (signs and findings), <a href=\"https:\/\/temu.bsc.es\/cantemist\/\" target=\"_blank\" rel=\"noreferrer noopener\">CANTEMIST<\/a> (tumour morphology), <a href=\"https:\/\/temu.bsc.es\/codiesp\/\" target=\"_blank\" rel=\"noreferrer noopener\">CodiEsp<\/a> (coding to ICD), <a href=\"https:\/\/temu.bsc.es\/pharmaconer\/\" target=\"_blank\" rel=\"noreferrer noopener\">PharmaCoNER<\/a> (chemicals and proteins), <a href=\"https:\/\/temu.bsc.es\/livingner\">LivingNER<\/a> (species and humans), <a href=\"https:\/\/temu.bsc.es\/multicardioner\/\">MultiCardioNER<\/a> (diseases and medications, includes the DrugTEMIST corpus as well as cardiology-specific data)<\/li>\n\n\n\n<li>Socio-demographic \/ Social Determinants of Health content extraction: <a href=\"https:\/\/temu.bsc.es\/meddoplace\/\" target=\"_blank\" rel=\"noreferrer noopener\">MEDDOPLACE<\/a> (locations and more), <a href=\"https:\/\/temu.bsc.es\/meddocan\/\" target=\"_blank\" rel=\"noreferrer noopener\">MEDDOCAN<\/a> (sensitive data), <a href=\"https:\/\/temu.bsc.es\/meddoprof\/\" target=\"_blank\" rel=\"noreferrer noopener\">MEDDOPROF<\/a> (occupations), <a href=\"https:\/\/temu.bsc.es\/toxhabits\">ToxHabits<\/a> (extraction of substance use-related content)<\/li>\n\n\n\n<li>Information extraction in social media: <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/socialdisner\/\" target=\"_blank\">SocialDisNER<\/a> (diseases), <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/smm4h-spanish\/\" target=\"_blank\">ProfNER<\/a> 
(occupations)<\/li>\n\n\n\n<li>Linguistic aspects: <a href=\"https:\/\/temu.bsc.es\/BARR\/\" target=\"_blank\" rel=\"noreferrer noopener\">BARR1<\/a> and <a href=\"https:\/\/temu.bsc.es\/BARR2\/\" target=\"_blank\" rel=\"noreferrer noopener\">BARR2<\/a> (abbreviation resolution)<\/li>\n\n\n\n<li>Machine Translation: <a href=\"https:\/\/temu.bsc.es\/clinspen\/\" data-type=\"URL\" data-id=\"https:\/\/temu.bsc.es\/clinspen\/\" target=\"_blank\" rel=\"noreferrer noopener\">ClinSpEn<\/a> (EN&lt;-&gt;ES clinical content translation)<\/li>\n\n\n\n<li>Summarization: <a href=\"https:\/\/temu.bsc.es\/multiclinsum\">MultiClinSUM<\/a> (multilingual summarization of clinical content)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Contact<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Salvador Lima-L\u00f3pez<\/strong>, Barcelona Supercomputing Center (BSC), Spain: salvador.limalopez@gmail.com<\/li>\n\n\n\n<li><strong>Fernando Gallego-Donoso<\/strong>, Barcelona Supercomputing Center (BSC), Spain: fgallegodonoso@gmail.com<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>MultiClinAI Shared Task Homepage The MultiClinAI Track is organized by the Barcelona Supercomputing Center\u2019s NLP for Biomedical Information Analysis group and promoted by European projects such as DataTools4Heart and AI4HF. What is MultiClinAI? 
MultiClinAI is a shared task focused on the creation of comparable multilingual corpora via annotation projection, as well as the multilingual extraction [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-46","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/temu.bsc.es\/MultiClinAI\/wp-json\/wp\/v2\/pages\/46","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/temu.bsc.es\/MultiClinAI\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/temu.bsc.es\/MultiClinAI\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/MultiClinAI\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/MultiClinAI\/wp-json\/wp\/v2\/comments?post=46"}],"version-history":[{"count":51,"href":"https:\/\/temu.bsc.es\/MultiClinAI\/wp-json\/wp\/v2\/pages\/46\/revisions"}],"predecessor-version":[{"id":270,"href":"https:\/\/temu.bsc.es\/MultiClinAI\/wp-json\/wp\/v2\/pages\/46\/revisions\/270"}],"wp:attachment":[{"href":"https:\/\/temu.bsc.es\/MultiClinAI\/wp-json\/wp\/v2\/media?parent=46"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}