{"id":3999,"date":"2019-09-19T12:53:22","date_gmt":"2019-09-19T12:53:22","guid":{"rendered":"http:\/\/temu.bsc.es\/meddocan\/?p=3999"},"modified":"2022-06-29T13:14:14","modified_gmt":"2022-06-29T13:14:14","slug":"datasets","status":"publish","type":"post","link":"https:\/\/temu.bsc.es\/socialdisner\/datasets\/","title":{"rendered":"Datasets"},"content":{"rendered":"\n<h4 class=\"wp-block-heading\">Train set<\/h4>\n\n\n\n<p>The train set contains 5,000 annotated tweets. <a href=\"https:\/\/doi.org\/10.5281\/zenodo.6359365\">Will be published on Zenodo<\/a>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Validation set<\/h4>\n\n\n\n<p>The validation set contains 2,500 annotated tweets. <a href=\"https:\/\/doi.org\/10.5281\/zenodo.6359365\">Will be published on Zenodo<\/a>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Test and background sets<\/h4>\n\n\n\n<p>The test set contains 2,500 tweets. The background set contains 50K tweets. <a href=\"https:\/\/doi.org\/10.5281\/zenodo.6359365\">Will be published on Zenodo<\/a>.<\/p>\n\n\n\n<p>The test and background sets will be published together. You must submit predictions for the whole set, but only the test set predictions will be evaluated.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Test set&nbsp;with&nbsp;Gold&nbsp;Standard&nbsp;annotations<\/h4>\n\n\n\n<p>The Gold Standard annotations of the test set will be released <strong>after<\/strong> the submission deadline.<\/p>\n\n\n\n<p><strong>Corpora Stats.<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>&nbsp;<\/td><td><strong>Training<\/strong><\/td><td><strong>Development<\/strong><\/td><\/tr><tr><td># Tweets<\/td><td>5000<\/td><td>2500<\/td><\/tr><tr><td># characters<\/td><td>1253431<\/td><td>516768<\/td><\/tr><tr><td># tokens<\/td><td>211555<\/td><td>84478<\/td><\/tr><tr><td>Avg. char. \/tweet<\/td><td>250.69<\/td><td>206.71<\/td><\/tr><tr><td>Avg. Tok. 
\/tweet<\/td><td>42.31<\/td><td>33.79<\/td><\/tr><tr><td># disease mentions<\/td><td>15173<\/td><td>4252<\/td><\/tr><tr><td># unique disease mentions<\/td><td>4407<\/td><td>1413<\/td><\/tr><\/tbody><\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Train set The train set contains 5,000 annotated tweets. Will be published on zenodo. Validation set The validation set contains 2500 annotated tweets. Will be published on zenodo. Test and background sets The test set contains 2500 tweets. The background set contains 50K tweets. Will be published on zenodo. The test and background set will [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-3999","post","type-post","status-publish","format-standard","hentry","category-data"],"_links":{"self":[{"href":"https:\/\/temu.bsc.es\/socialdisner\/wp-json\/wp\/v2\/posts\/3999","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/temu.bsc.es\/socialdisner\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/temu.bsc.es\/socialdisner\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/socialdisner\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/socialdisner\/wp-json\/wp\/v2\/comments?post=3999"}],"version-history":[{"count":6,"href":"https:\/\/temu.bsc.es\/socialdisner\/wp-json\/wp\/v2\/posts\/3999\/revisions"}],"predecessor-version":[{"id":4847,"href":"https:\/\/temu.bsc.es\/socialdisner\/wp-json\/wp\/v2\/posts\/3999\/revisions\/4847"}],"wp:attachment":[{"href":"https:\/\/temu.bsc.es\/socialdisner\/wp-json\/wp\/v2\/media?parent=3999"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/temu.bsc.es\/socialdisner\/wp-json\/wp\/v2\/categories?post=3999"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/temu.bs
c.es\/socialdisner\/wp-json\/wp\/v2\/tags?post=3999"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}