{"id":66,"date":"2020-12-28T10:37:52","date_gmt":"2020-12-28T10:37:52","guid":{"rendered":"https:\/\/temu.bsc.es\/meddoprof\/?p=66"},"modified":"2021-11-22T15:21:17","modified_gmt":"2021-11-22T14:21:17","slug":"datasets","status":"publish","type":"post","link":"https:\/\/temu.bsc.es\/meddoprof\/datasets\/","title":{"rendered":"Datasets"},"content":{"rendered":"\n<p>The <strong>MEDDOPROF<\/strong> corpus has been randomly split into two subsets: a training set and a test set.<\/p>\n\n\n\n<p>The complete dataset is available on <a href=\"https:\/\/doi.org\/10.5281\/zenodo.5070540\">Zenodo<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Sample set<\/h3>\n\n\n\n<p>The sample set is composed of <strong>15 clinical cases<\/strong> extracted from the training set. To make the sample set somewhat representative of the corpus, we included cases from four different specialties: radiology, oncology, psychiatry and occupational health.<\/p>\n\n\n\n<p>Download the sample set from <a href=\"https:\/\/zenodo.org\/record\/4518733\" data-type=\"URL\" data-id=\"https:\/\/zenodo.org\/record\/4518733\">Zenodo<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Training set<\/h3>\n\n\n\n<p>The training set is composed of 1500 clinical cases (~80% of the corpus).<\/p>\n\n\n\n<p>Download the training set from <a href=\"https:\/\/doi.org\/10.5281\/zenodo.4694768\">Zenodo<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Codes Reference List<\/h3>\n\n\n\n<p>For task 3 (MEDDOPROF-NORM), a reference list with all valid codes is provided. It is a .tsv file with three columns: code, label and alternative label. Codes from two sources are listed: ESCO and SNOMED-CT (SNOMED-CT codes are preceded by the string &#8216;SCTID:&#8217; in the list). 
With a few exceptions, professions are mapped to ESCO, while working statuses and activities are mapped to SNOMED-CT.<\/p>\n\n\n\n<p>Download the codes reference list from <a href=\"https:\/\/zenodo.org\/record\/4722741\">Zenodo<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Test set<\/h3>\n\n\n\n<p>The test set is composed of 344 clinical cases (~20% of the corpus).<\/p>\n\n\n\n<p>Download the test set from <a href=\"https:\/\/doi.org\/10.5281\/zenodo.4889776\">Zenodo<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The MEDDOPROF corpus has been randomly split into two subsets: a training set and a test set. The complete dataset is available on Zenodo. Sample set The sample set is composed of 15 clinical cases extracted from the training set. To make the sample set somewhat representative of the corpus, we included cases from four different [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-66","post","type-post","status-publish","format-standard","hentry","category-data"],"_links":{"self":[{"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/posts\/66","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/comments?post=66"}],"version-history":[{"count":9,"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/posts\/66\/revisions"}],"predecessor-version":[{"id":387,"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/posts\/66\/revisions\/387"}],"wp:attachment":[{"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/media
?parent=66"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/categories?post=66"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/temu.bsc.es\/meddoprof\/wp-json\/wp\/v2\/tags?post=66"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}