{"id":43,"date":"2022-04-01T09:17:47","date_gmt":"2022-04-01T09:17:47","guid":{"rendered":"https:\/\/temu.bsc.es\/clinspen\/?page_id=43"},"modified":"2023-03-09T12:14:40","modified_gmt":"2023-03-09T12:14:40","slug":"clinspen","status":"publish","type":"page","link":"https:\/\/temu.bsc.es\/clinspen\/","title":{"rendered":"ClinSpEn"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\"><strong>ClinSpEn<\/strong><\/h1>\n\n\n\n<p><\/p>\n\n\n\n<p>This website contains the data for the <strong>ClinSpEn<\/strong> shared task, focused on <strong>clinical EN-ES machine translation<\/strong> and part of the <strong>biomedical task of<\/strong> <strong><a rel=\"noreferrer noopener\" href=\"https:\/\/statmt.org\/wmt22\/\" target=\"_blank\">WMT 2022<\/a><\/strong>.<\/p>\n\n\n\n<p class=\"has-pale-cyan-blue-background-color has-background\">The Gold Standard test sets for every dataset have been released. They are available on <a href=\"https:\/\/doi.org\/10.5281\/zenodo.6497350\" target=\"_blank\" rel=\"noreferrer noopener\">Zenodo<\/a>.<\/p>\n\n\n\n<p class=\"has-luminous-vivid-amber-background-color has-background\">The BioWMT 2022 overview paper is now available! You can find it <a rel=\"noreferrer noopener\" href=\"https:\/\/statmt.org\/wmt22\/pdf\/2022.wmt-1.69.pdf\" data-type=\"URL\" data-id=\"https:\/\/statmt.org\/wmt22\/pdf\/2022.wmt-1.69.pdf\" target=\"_blank\">here<\/a>. If you use any of the ClinSpEn data, please remember to cite it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Motivation<\/h3>\n\n\n\n<p>Machine translation applied to the clinical domain is an especially challenging task due to the complexity of medical language and the heavy use of health-related technical terms and medical expressions. 
For this reason, there is a large community of specialized medical translators who can deal with medical narratives, terminologies, and ambiguous abbreviations and acronyms.<\/p>\n\n\n\n<p>Given the relevance, impact and diversity of health-related content, as well as the rapidly growing number of publications, EHRs, clinical trials, informed consent documents and medical terminologies, there is a pressing need for more robust medical machine translation resources together with independent quality evaluation scenarios.<\/p>\n\n\n\n<p>Recent advances in machine translation technologies, together with the use of other NLP components, are showing promising results, so domain adaptation of MT approaches can have a significant impact in unlocking key information from medical content.<\/p>\n\n\n\n<p>Accordingly, the ClinSpEn data covers three types of data that are highly relevant to the biomedical domain: clinical cases, clinical terminology and ontology concepts.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Sub-tracks<\/h3>\n\n\n\n<p>ClinSpEn comprises three sub-tracks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>ClinSpEn-CC <em>(clinical cases)<\/em><\/strong>: EN&gt;ES translation of clinical cases using a collection of 202 parallel COVID-19 clinical case reports.<\/li>\n\n\n\n<li><strong>ClinSpEn-CT <em>(clinical terms)<\/em><\/strong>: ES&gt;EN translation of clinical terminology using a collection of over 19 000 parallel terms obtained from biomedical literature and electronic health records.<\/li>\n\n\n\n<li><strong>ClinSpEn-OC <em>(ontology concepts)<\/em><\/strong>: EN&gt;ES translation of a collection of over 2 000 parallel concepts obtained from different biomedical ontologies. 
<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"599\" src=\"https:\/\/temu.bsc.es\/clinspen\/wp-content\/uploads\/2022\/08\/clinspen-overview-2-1024x599.png\" alt=\"\" class=\"wp-image-220\" srcset=\"https:\/\/temu.bsc.es\/clinspen\/wp-content\/uploads\/2022\/08\/clinspen-overview-2-1024x599.png 1024w, https:\/\/temu.bsc.es\/clinspen\/wp-content\/uploads\/2022\/08\/clinspen-overview-2-300x176.png 300w, https:\/\/temu.bsc.es\/clinspen\/wp-content\/uploads\/2022\/08\/clinspen-overview-2-768x450.png 768w, https:\/\/temu.bsc.es\/clinspen\/wp-content\/uploads\/2022\/08\/clinspen-overview-2.png 1488w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>All documents and terms in the ClinSpEn collection have been manually translated and revised by professional medical translators in order to ensure the quality and validity of the data. <\/p>\n\n\n\n<p>ClinSpEn is organized by the Barcelona Supercomputing Center&#8217;s NLP for Biomedical Information Analysis group (formerly Text Mining Unit). <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Related resources<\/h2>\n\n\n\n<p>At the NLP for Biomedical Information Analysis group, one of our missions is the open publication of datasets to train and benchmark biomedical information extraction, normalization and indexing systems. For that reason, we have released multiple datasets as part of shared tasks over the years. 
If you are interested in ClinSpEn, you might want to take a look at some of our resources and competitions about:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Clinical content extraction: <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/medprocner\" target=\"_blank\">MedProcNER<\/a> (clinical procedures &#8212; new this year!), <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/distemist\/\" target=\"_blank\">DisTEMIST<\/a> (diseases), <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/cantemist\/\" target=\"_blank\">CANTEMIST<\/a> (tumour morphology), <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/codiesp\/\" target=\"_blank\">CodiEsp<\/a> (coding to ICD), <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/pharmaconer\/\" target=\"_blank\">PharmaCoNER<\/a> (chemicals and proteins)<\/li>\n\n\n\n<li>Sociodemographic content extraction: <a href=\"https:\/\/temu.bsc.es\/meddoplace\" target=\"_blank\" rel=\"noreferrer noopener\">MEDDOPLACE<\/a> (locations, clinical departments and related info &#8212; new this year!), <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/meddocan\/\" target=\"_blank\">MEDDOCAN<\/a> (sensitive data), <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/meddoprof\/\" target=\"_blank\">MEDDOPROF<\/a> (occupations)<\/li>\n\n\n\n<li>Information extraction in social media: <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/socialdisner\/\" target=\"_blank\">SocialDisNER<\/a> (diseases), <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/smm4h-spanish\/\" target=\"_blank\">ProfNER<\/a> (occupations)<\/li>\n\n\n\n<li>Linguistic aspects: <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/BARR\/\" target=\"_blank\">BARR1<\/a> and <a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/BARR2\/\" target=\"_blank\">BARR2<\/a> (abbreviation resolution)<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>ClinSpEn This website contains 
the data for the ClinSpEn shared task, focused on clinical EN-ES machine translation and part of the biomedical task of WMT 2022. The Gold Standard test sets for every dataset have been released. They are available on Zenodo. The BioWMT 2022 overview paper is now available! You can find it here. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-43","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/temu.bsc.es\/clinspen\/wp-json\/wp\/v2\/pages\/43","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/temu.bsc.es\/clinspen\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/temu.bsc.es\/clinspen\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/clinspen\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/clinspen\/wp-json\/wp\/v2\/comments?post=43"}],"version-history":[{"count":28,"href":"https:\/\/temu.bsc.es\/clinspen\/wp-json\/wp\/v2\/pages\/43\/revisions"}],"predecessor-version":[{"id":275,"href":"https:\/\/temu.bsc.es\/clinspen\/wp-json\/wp\/v2\/pages\/43\/revisions\/275"}],"wp:attachment":[{"href":"https:\/\/temu.bsc.es\/clinspen\/wp-json\/wp\/v2\/media?parent=43"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}