{"id":19,"date":"2025-01-29T11:00:55","date_gmt":"2025-01-29T11:00:55","guid":{"rendered":"https:\/\/temu.bsc.es\/multiclinsum\/?page_id=19"},"modified":"2025-06-04T08:52:13","modified_gmt":"2025-06-04T08:52:13","slug":"evaluation","status":"publish","type":"page","link":"https:\/\/temu.bsc.es\/multiclinsum\/evaluation\/","title":{"rendered":"Evaluation &amp; Submission"},"content":{"rendered":"\n<p><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\"><strong>Important Update:<\/strong> The deadline for result submission has been extended to <strong>4 June at 22:00 CET<\/strong>.<\/mark><\/p>\n\n\n\n<p><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">If you had issues uploading your submission, please contact the organizers (<a href=\"miguel.rod.bsc@gmail.com\" data-type=\"link\" data-id=\"miguel.rod.bsc@gmail.com\">Miguel Rodr\u00edguez<\/a>, <a href=\"edu4bsc@gmail.com\" data-type=\"link\" data-id=\"edu4bsc@gmail.com\">Eduard Rodr\u00edguez<\/a>, and <a href=\"krallinger.martin@gmail.com\" data-type=\"link\" data-id=\"krallinger.martin@gmail.com\">Martin Krallinger<\/a>) immediately. It seems the submission platform experienced some issues during the final submission hour.<\/mark><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Evaluation<\/h1>\n\n\n\n<p>Submissions will be evaluated against a set of full-text clinical case reports paired with their human-generated summaries, which have the same characteristics as the provided training documents. Two main metrics will be used: <strong><a href=\"https:\/\/pypi.org\/project\/rouge-score\/\" data-type=\"link\" data-id=\"https:\/\/pypi.org\/project\/rouge-score\/\">Rouge-L-Sum<\/a><\/strong> and <strong><a href=\"https:\/\/github.com\/Tiiiger\/bert_score\/blob\/master\/README.md\" data-type=\"link\" data-id=\"https:\/\/github.com\/Tiiiger\/bert_score\/blob\/master\/README.md\">BERTScore<\/a><\/strong>. 
Additional metrics that are not part of the official evaluation might also be used.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Submission Instructions<\/strong><\/h2>\n\n\n\n<p><strong>Important information<\/strong> for MultiClinSum participants regarding submission instructions. Please read carefully.<\/p>\n\n\n\n<p><strong>1. MultiClinSum Sub-tracks<\/strong><\/p>\n\n\n\n<p>The sub-tracks of the MultiClinSum task are independent: submissions can be made separately for any of the four sub-tracks (languages). It is <strong>NOT mandatory to generate predictions or submissions for all languages<\/strong>, so teams may generate predictions for a single language only. However, for a given language, it is <strong>mandatory to generate predictions for all cases in the test set<\/strong>, rather than just a subset.<\/p>\n\n\n\n<p>Specifically, the four sub-tracks are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>MultiClinSum-en<\/strong>: Clinical case summarization for content in <strong>English<\/strong>.<\/li>\n\n\n\n<li><strong>MultiClinSum-es<\/strong>: Clinical case summarization for content in <strong>Spanish<\/strong>.<\/li>\n\n\n\n<li><strong>MultiClinSum-fr<\/strong>: Clinical case summarization for content in <strong>French<\/strong>.<\/li>\n\n\n\n<li><strong>MultiClinSum-pt<\/strong>: Clinical case summarization for content in <strong>Portuguese<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p>When submitting your predictions or runs, make sure you have correctly specified the corresponding target language. To do so, follow the predefined naming convention for your submissions.<\/p>\n\n\n\n<p><strong>2. 
MultiClinSum file submission naming convention<\/strong><\/p>\n\n\n\n<p>To keep the correspondence between generated summaries and full case reports clear and unambiguous, the file containing the summary of a given full-text clinical case report in the test set must be identified with the suffix \u201c<strong>_sum.txt<\/strong>\u201d, following the naming convention already used in the training examples.<\/p>\n\n\n\n<p>For instance, for an English case report named <em><strong>multiclinsum_test_1_en.txt<\/strong><\/em> in the test set, the corresponding summary file name should be <em><strong>multiclinsum_test_1_en_sum.txt<\/strong><\/em>.<\/p>\n\n\n\n<p>So, for a case report with index <strong>i<\/strong> and language <strong>lang<\/strong>:<br>multiclinsum_test_<strong>{i}<\/strong>_<strong>{lang}<\/strong>.txt<\/p>\n\n\n\n<p>The generated txt file should be:<br>multiclinsum_test_<strong>{i}<\/strong>_<strong>{lang}_sum<\/strong>.txt<\/p>\n\n\n\n<p><strong>3. Number of allowed runs per sub-track<\/strong><\/p>\n\n\n\n<p>For each sub-track (i.e. language), up to 5 versions or runs are allowed. For instance, for the MultiClinSum-en sub-track, up to 5 different sets of predictions for the entire test set can be submitted to the submission page. They will be evaluated independently and only the best will be selected for the leaderboard.<\/p>\n\n\n\n<p>You may submit anywhere from 1 to 5 runs. Submitting all 5 is not required; the limit exists so that participating teams can try out different approaches, methods or settings.&nbsp;<\/p>\n\n\n\n<p>To send the predictions for a given sub-track and run, place the generated summary files into a single directory (following the naming conventions for generated summaries specified above). 
For us to identify each sub-track\/run combination correctly, add the language and the run number to the directory name. <\/p>\n\n\n\n<p>An example submission of 5 runs for the English MultiClinSum-en sub-track should contain the following folders, each containing the generated summaries:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>multiclinsum_en_run_1<\/strong> (corresponding to run 1)<\/li>\n\n\n\n<li><strong>multiclinsum_en_run_2<\/strong> (corresponding to run 2)<\/li>\n\n\n\n<li><strong>multiclinsum_en_run_3<\/strong> (corresponding to run 3)<\/li>\n\n\n\n<li><strong>multiclinsum_en_run_4<\/strong> (corresponding to run 4)<\/li>\n\n\n\n<li><strong>multiclinsum_en_run_5<\/strong> (corresponding to run 5)<\/li>\n<\/ul>\n\n\n\n<p>So, for run <strong>r<\/strong> and language <strong>lang<\/strong>, the directory naming convention is:<\/p>\n\n\n\n<p><em>multiclinsum<\/em>_<strong>{lang}<\/strong>_<em>run<\/em>_<strong>{r}<\/strong>. <\/p>\n\n\n\n<p>Hence, you can submit up to 20 directories (assuming you use 5 runs for each of the 4 sub-tracks). Place all of your final submission directories into one folder that identifies your team. Namely, the parent folder for a team named <strong>{team_name}<\/strong> should be:<\/p>\n\n\n\n<p><strong>{team_name}<\/strong><em>_multiclinsum<\/em><\/p>\n\n\n\n<p>This parent directory should in turn be compressed into a zip file, which is the submission you will deliver. 
Please note that, since the BioASQ submission system allows only a single .zip file to be submitted, <strong>all the selected sub-tracks with their respective runs must be compressed into the same file<\/strong>.<\/p>\n\n\n\n<p>For the sake of clarity, this is what your submission would look like if you submitted 1 run for the Spanish sub-track, 2 runs each for the English and French sub-tracks, and 3 runs for the Portuguese sub-track:<\/p>\n\n\n\n<p><strong>&#8211; Files (for the first run of the English submission):<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"225\" src=\"https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-1024x225.png\" alt=\"\" class=\"wp-image-252\" srcset=\"https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-1024x225.png 1024w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-300x66.png 300w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-768x169.png 768w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image.png 1039w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>&#8211; Folders:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"816\" height=\"352\" src=\"https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-1.png\" alt=\"\" class=\"wp-image-253\" srcset=\"https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-1.png 816w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-1-300x129.png 300w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-1-768x331.png 768w\" sizes=\"auto, (max-width: 816px) 100vw, 816px\" \/><\/figure>\n\n\n\n<p><strong>&#8211; Zip file:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1019\" 
height=\"91\" src=\"https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-2.png\" alt=\"\" class=\"wp-image-254\" srcset=\"https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-2.png 1019w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-2-300x27.png 300w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/06\/image-2-768x69.png 768w\" sizes=\"auto, (max-width: 1019px) 100vw, 1019px\" \/><\/figure>\n\n\n\n<p><strong>4. Test set summary prediction file format<\/strong><\/p>\n\n\n\n<p>Each test set prediction is a plain text file containing the generated summary, as in the training set examples. You are responsible for making sure that the file can be read correctly using <em>UTF-8<\/em> decoding.<\/p>\n\n\n\n<p><strong>5. Size of test set summary prediction texts&nbsp;<\/strong><\/p>\n\n\n\n<p>Because the files you upload are summaries, they should not be longer than the corresponding full-text case reports. As a sanity check, we recommend verifying that each summary you return is not longer than the full clinical case report it summarizes.<\/p>\n\n\n\n<p><strong>6. Upload submissions to the BioASQ platform<\/strong><\/p>\n\n\n\n<p>IMPORTANT: to upload your submissions or runs for the task, you need to be registered at BioASQ. Once you are registered, you can go to the submission page of BioASQ and upload your predictions. 
Remember that they must be packaged in a single zip file and follow the naming conventions and details provided above.<\/p>\n\n\n\n<p><em>Use this <a href=\"https:\/\/participants-area.bioasq.org\/Tasks\/multiclinsum\">BioASQ Submission link<\/a> to submit or upload your predictions.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"879\" src=\"https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/05\/submission_webpage-1024x879.png\" alt=\"\" class=\"wp-image-164\" srcset=\"https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/05\/submission_webpage-1024x879.png 1024w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/05\/submission_webpage-300x258.png 300w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/05\/submission_webpage-768x659.png 768w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/05\/submission_webpage-1536x1319.png 1536w, https:\/\/temu.bsc.es\/multiclinsum\/wp-content\/uploads\/2025\/05\/submission_webpage-2048x1759.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\">Screenshot of submission webpage at BioASQ platform.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Important Update: The deadline for result submission has been extended to 4 June at 22:00 CET. If you had issues uploading your submission, please contact the organizers (Miguel Rodr\u00edguez, Eduard Rodr\u00edguez, and Martin Krallinger) immediately. It seems the submission platform experienced some issues during the final submission hour. 
Evaluation The evaluation will be done against [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-19","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/temu.bsc.es\/multiclinsum\/wp-json\/wp\/v2\/pages\/19","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/temu.bsc.es\/multiclinsum\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/temu.bsc.es\/multiclinsum\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/multiclinsum\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/temu.bsc.es\/multiclinsum\/wp-json\/wp\/v2\/comments?post=19"}],"version-history":[{"count":61,"href":"https:\/\/temu.bsc.es\/multiclinsum\/wp-json\/wp\/v2\/pages\/19\/revisions"}],"predecessor-version":[{"id":268,"href":"https:\/\/temu.bsc.es\/multiclinsum\/wp-json\/wp\/v2\/pages\/19\/revisions\/268"}],"wp:attachment":[{"href":"https:\/\/temu.bsc.es\/multiclinsum\/wp-json\/wp\/v2\/media?parent=19"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}