Biomedical Abbreviation Recognition and Resolution 2nd Edition (BARR2)

The testing set consists of 220 clinical cases. The following files are available for download:

· Clinical cases raw text (testing set): This file contains the raw text of each clinical case in the testing set, in TXT format.

· Clinical cases metadata (testing set): This file contains basic information about each record, such as the publication date and journal information. We also provide the link to the complete clinical case. The complete text of the record may contain several clinical cases, so we have split each clinical case into a separate file; Case_ID indicates the position of the clinical case in the original document.

· Sub-track 1 (Short Form-Long Form track, testing set): This file contains the annotations for the first sub-track. The first column gives the clinical case identifier. Columns 2-5 contain information (mention, mention type and offsets) about the first argument of the relation (the abbreviation mentioned in the text). Column 6 indicates the relation type between the two arguments. Columns 7-10 contain information about the second argument of the relation (the definition of the abbreviation, explicitly mentioned in the text).

· Sub-track 2 (Abbreviation resolution track, testing set): This file contains the annotations for the second sub-track. The first column gives the clinical case identifier. Columns 2-3 indicate the position (offsets) of the abbreviation in the text. Column 4 shows the exact mention of the abbreviation in the text. Finally, columns 5 and 6 show the definition of the abbreviation.
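
As an illustration of the Sub-track 1 layout described above, the following sketch splits one annotation line into its four column groups. It is not an official BARR2 tool. It assumes the file is tab-separated with exactly 10 columns per line, which this page does not state. The names Relation and parse_subtrack1_line are hypothetical, and the example line uses placeholder values only. Because the order of mention, mention type and offsets within columns 2-5 and 7-10 is not specified here, each group is kept as raw strings.

```python
from typing import NamedTuple, Tuple


class Relation(NamedTuple):
    case_id: str                 # column 1: clinical case identifier
    short_form: Tuple[str, ...]  # columns 2-5: first argument, raw strings
    relation_type: str           # column 6: relation type
    long_form: Tuple[str, ...]   # columns 7-10: second argument, raw strings


def parse_subtrack1_line(line: str, sep: str = "\t") -> Relation:
    """Split one annotation line into the column groups described above.

    Assumption: columns are separated by `sep` (a tab by default).
    """
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return Relation(fields[0], tuple(fields[1:5]), fields[5], tuple(fields[6:10]))


# Invented example line: every value below is a placeholder, not real data.
example = "\t".join([
    "case-001",
    "a1", "a2", "a3", "a4",
    "relation",
    "b1", "b2", "b3", "b4",
])
relation = parse_subtrack1_line(example)
print(relation.case_id, relation.relation_type)
```

Keeping the two argument groups as untyped tuples avoids guessing which position holds the offsets; once the real column order has been checked against a downloaded file, the tuples can be replaced by named fields.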

The background set consists of 2879 clinical cases. Participants must run their systems on this set for both sub-tracks and send prediction files covering all of its clinical cases. Of these cases, 220 will be used for the final evaluation. The following files are available for download:

· Clinical cases raw text (background and test set): This file contains the raw text of each clinical case in the background set, in TXT format.

· Clinical cases metadata (background and test set): This file contains basic information about each record, such as the publication date and journal information. We also provide the link to the complete clinical case. The complete text of the record may contain several clinical cases, so we have split each clinical case into a separate file; Case_ID indicates the position of the clinical case in the original document.

From all received runs, we will use for evaluation only the subset of predictions corresponding to the 220 test set Gold Standard clinical cases. We ask participants to run their systems against the entire test + background set of clinical case documents. A total of 5 runs are allowed per team for each of the two BARR2 sub-tracks. Please send your prediction files to ander.intxaurrondo@bsc.es before June 10, 2018, at 23:59 CET.
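
The selection step described above can be sketched as a simple filter on the case identifier. This is not the organizers' evaluation script. It assumes that prediction files follow the annotation layout, with the clinical case identifier in the first column, and that columns are tab-separated. The function name keep_evaluated_cases and all identifiers in the example are hypothetical.

```python
from typing import Iterable, List, Set


def keep_evaluated_cases(prediction_lines: Iterable[str],
                         evaluated_ids: Set[str],
                         sep: str = "\t") -> List[str]:
    """Keep only prediction lines whose first column is an evaluated case.

    Assumption: the first column holds the clinical case identifier.
    """
    kept = []
    for line in prediction_lines:
        case_id = line.rstrip("\r\n").split(sep, 1)[0]
        if case_id in evaluated_ids:
            kept.append(line)
    return kept


# Invented identifiers and payloads, for illustration only.
predictions = ["case-001\tx", "case-002\ty", "case-003\tz"]
evaluated = {"case-001", "case-003"}
print(keep_evaluated_cases(predictions, evaluated))
```

Predictions for the remaining background cases are simply ignored by such a filter, which is why a run has to cover the whole set: the evaluated cases are not known to participants in advance.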

We will make the testing set public the day after the evaluation results have been sent to participants. We will list the clinical cases chosen from the background set, together with their metadata and the annotation files for both sub-tracks.

The development set consists of 146 clinical cases. The following files are available for download:

· Clinical cases raw text (development set): This file contains the raw text of each clinical case in the development set, in TXT format.

· Clinical cases metadata (development set): This file contains basic information about each record, such as the publication date and journal information. We also provide the link to the complete clinical case. The complete text of the record may contain several clinical cases, so we have split each clinical case into a separate file; Case_ID indicates the position of the clinical case in the original document.

· Sub-track 1 (Short Form-Long Form track, development set): This file contains the annotations for the first sub-track. The first column gives the clinical case identifier. Columns 2-5 contain information (mention, mention type and offsets) about the first argument of the relation (the abbreviation mentioned in the text). Column 6 indicates the relation type between the two arguments. Columns 7-10 contain information about the second argument of the relation (the definition of the abbreviation, explicitly mentioned in the text).

· Sub-track 2 (Abbreviation resolution track, development set): This file contains the annotations for the second sub-track. The first column gives the clinical case identifier. Columns 2-3 indicate the position (offsets) of the abbreviation in the text. Column 4 shows the exact mention of the abbreviation in the text. Finally, columns 5 and 6 show the definition of the abbreviation.
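
When working with the Sub-track 2 annotations, it can help to check that the offsets in columns 2-3 actually select the mention in column 4 from the raw text of the clinical case. The sketch below shows one way to do this. It assumes tab-separated columns and character offsets with an inclusive start and an exclusive end, neither of which is stated on this page. The names Resolution, parse_subtrack2_line and offsets_match are hypothetical, and the example text and annotation line are invented.

```python
from typing import NamedTuple, Tuple


class Resolution(NamedTuple):
    case_id: str                 # column 1: clinical case identifier
    start: int                   # column 2: start offset (assumed inclusive)
    end: int                     # column 3: end offset (assumed exclusive)
    mention: str                 # column 4: abbreviation as written in the text
    definition: Tuple[str, str]  # columns 5 and 6, kept as-is


def parse_subtrack2_line(line: str, sep: str = "\t") -> Resolution:
    """Split one annotation line into the six columns described above."""
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    return Resolution(fields[0], int(fields[1]), int(fields[2]),
                      fields[3], (fields[4], fields[5]))


def offsets_match(text: str, record: Resolution) -> bool:
    """Return True if the offsets select exactly the annotated mention."""
    return text[record.start:record.end] == record.mention


# Invented clinical case text and annotation line, for illustration only.
text = "A CT scan was performed."
line = "case-001\t2\t4\tCT\tcomputed tomography\tcomputed tomography"
record = parse_subtrack2_line(line)
print(offsets_match(text, record))
```

If such a check fails on every line of a real file, the offset convention assumed here (for example the exclusive end) is probably wrong for that file and should be adjusted before training or scoring.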

The training set consists of 318 clinical cases. The following files are available for download:

· Clinical cases raw text (training set): This file contains the raw text of each clinical case in the training set, in TXT format.

· Clinical cases metadata (training set): This file contains basic information about each record, such as the publication date and journal information. We also provide the link to the complete clinical case. The complete text of the record may contain several clinical cases, so we have split each clinical case into a separate file; Case_ID indicates the position of the clinical case in the original document.

· Sub-track 1 (Short Form-Long Form track, training set): This file contains the annotations for the first sub-track. The first column gives the clinical case identifier. Columns 2-5 contain information (mention, mention type and offsets) about the first argument of the relation (the abbreviation mentioned in the text). Column 6 indicates the relation type between the two arguments. Columns 7-10 contain information about the second argument of the relation (the definition of the abbreviation, explicitly mentioned in the text).

· Sub-track 2 (Abbreviation resolution track, training set): This file contains the annotations for the second sub-track. The first column gives the clinical case identifier. Columns 2-3 indicate the position (offsets) of the abbreviation in the text. Column 4 shows the exact mention of the abbreviation in the text. Finally, columns 5 and 6 show the definition of the abbreviation. Updated 2018/05/24.

The sample set consists of 15 clinical cases. The following files are available for download:

· Clinical cases raw text (sample set): This file contains the raw text of each clinical case in the sample set, in TXT format.

· Clinical cases metadata (sample set): This file contains basic information about each record, such as the publication date and journal information. We also provide the link to the complete clinical case. The complete text of the record may contain several clinical cases, so we have split each clinical case into a separate file; Case_ID indicates the position of the clinical case in the original document.

· Sub-track 1 (Short Form-Long Form track, sample set): This file contains the annotations for the first sub-track. The first column gives the clinical case identifier. Columns 2-5 contain information (mention, mention type and offsets) about the first argument of the relation (the abbreviation mentioned in the text). Column 6 indicates the relation type between the two arguments. Columns 7-10 contain information about the second argument of the relation (the definition of the abbreviation, explicitly mentioned in the text).

· Sub-track 2 (Abbreviation resolution track, sample set): This file contains the annotations for the second sub-track. The first column gives the clinical case identifier. Columns 2-3 indicate the position (offsets) of the abbreviation in the text. Column 4 shows the exact mention of the abbreviation in the text. Finally, columns 5 and 6 show the definition of the abbreviation.

BARR Training Subset 1

BARR Training Subset 2

BARR Background Subset

BARR Test Set

Ander Intxaurrondo
ander.intxaurrondo[AT]bsc.es