{"id":4891,"date":"2022-05-03T14:05:59","date_gmt":"2022-05-03T13:05:59","guid":{"rendered":"https:\/\/temu.bsc.es\/distemist\/?p=4891"},"modified":"2022-05-09T16:26:33","modified_gmt":"2022-05-09T15:26:33","slug":"multilingual-corpus-cross-mappings","status":"publish","type":"post","link":"https:\/\/temu.bsc.es\/distemist\/multilingual-corpus-cross-mappings\/","title":{"rendered":"Multilingual corpus &#038; cross-mappings"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Annotation guidelines will be available in\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/doi.org\/10.5281\/zenodo.6458078\" target=\"_blank\">Zenodo<\/a>.<\/p><p>DISTEMIST training, test and background sets (including the multilingual corpus and cross-mappings) are available at\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/doi.org\/10.5281\/zenodo.6408476\" target=\"_blank\">Zenodo<\/a>.<\/p><\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">DISTEMIST Multilingual corpus<\/h2>\n\n\n\n<p>We have generated the annotated (and normalized to Snomed-CT) training and validation sets in 6 languages: English, Portuguese, Catalan, Italian, French, and Romanian.&nbsp;The process was:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>The&nbsp;text files were translated with a neural machine translation system.<\/li><li>The annotations were translated with the same&nbsp;neural machine translation system.<\/li><li>The translated annotations were transferred to the translated&nbsp;text files using an annotation transfer technology.<\/li><\/ol>\n\n\n\n<p>If you want to visualize the multilingual resources, check out this Brat server:&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/mDistemist\/#\/translations\/\" target=\"_blank\">https:\/\/temu.bsc.es\/mDistemist\/#\/translations\/<\/a><br>For instance, you can see the parallel annotations&nbsp;in&nbsp;<a rel=\"noreferrer noopener\" 
href=\"http:\/\/temu.bsc.es\/mDistemist\/diff.xhtml#\/translations\/fr\/train\/es-S0004-06142008000100011-1?diff=\/translations\/en\/train\/\" target=\"_blank\">English vs.&nbsp;French<\/a>, or in&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/temu.bsc.es\/mDistemist\/diff.xhtml#\/translations\/it\/train\/S0004-06142005000500011-1?diff=\/gold-standard\/train\/\" target=\"_blank\">Spanish (the gold standard) vs. Italian<\/a>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Distemist_multilingual-1-1024x576.png\" alt=\"\" class=\"wp-image-4892\" srcset=\"https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Distemist_multilingual-1-1024x576.png 1024w, https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Distemist_multilingual-1-300x169.png 300w, https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Distemist_multilingual-1-768x432.png 768w, https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Distemist_multilingual-1-1536x864.png 1536w, https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Distemist_multilingual-1-2048x1152.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Multilingual annotated and normalized corpus process overview<\/figcaption><\/figure><\/div>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"942\" height=\"792\" src=\"https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Screenshot-from-2022-05-02-17-24-42.png\" alt=\"\" class=\"wp-image-4877\" srcset=\"https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Screenshot-from-2022-05-02-17-24-42.png 942w, https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Screenshot-from-2022-05-02-17-24-42-300x252.png 300w, 
https:\/\/temu.bsc.es\/distemist\/wp-content\/uploads\/2022\/05\/Screenshot-from-2022-05-02-17-24-42-768x646.png 768w\" sizes=\"auto, (max-width: 942px) 100vw, 942px\" \/><figcaption>Gold Standard (Spanish) vs English annotations visualized with Brat.<\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>DISTEMIST cross-mappings<\/strong><\/h2>\n\n\n\n<p>The DISTEMIST Gold Standard contains the mentions mapped to Snomed-CT.<\/p>\n\n\n\n<p>In the DISTEMIST cross-mappings files we include the same entities as in DISTEMIST-linking but mapped to Snomed-CT,&nbsp;<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/mesh\/\">MeSH<\/a>,&nbsp;<a href=\"https:\/\/icd.who.int\/browse10\/2019\/en#\/\">ICD-10<\/a>,&nbsp;<a href=\"https:\/\/hpo.jax.org\/\">HPO<\/a>, and&nbsp;<a href=\"https:\/\/www.omim.org\/\">OMIM<\/a>. The original mappings are manual and to Snomed-CT. The mappings to the other terminologies&nbsp;were done through the UMLS Metathesaurus.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Annotation guidelines will be available in\u00a0Zenodo. DISTEMIST training, test and background sets (including the multilingual corpus and cross-mappings) are available at\u00a0Zenodo. 
DISTEMIST Multilingual corpus We have generated the annotated (and normalized to Snomed-CT) training and validation sets in 6 languages: English, Portuguese, Catalan, Italian, French, and Romanian.&nbsp;The process was: The&nbsp;text files were translated with a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-4891","post","type-post","status-publish","format-standard","hentry","category-data"],"_links":{"self":[{"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/posts\/4891","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/comments?post=4891"}],"version-history":[{"count":5,"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/posts\/4891\/revisions"}],"predecessor-version":[{"id":4916,"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/posts\/4891\/revisions\/4916"}],"wp:attachment":[{"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/media?parent=4891"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/categories?post=4891"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/temu.bsc.es\/distemist\/wp-json\/wp\/v2\/tags?post=4891"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}