Downloadable datasets

Important: Teams interested in the BARR task should register here to access datasets, submit predictions, get track updates and evaluate their systems.

Background sets and test set

There are two background sets available to participants. The first background set consists of 112728 publications from different sources. The second background set consists of 41760 Elsevier documents together with the whole first background set.

Participants are free to use the first background set for their own purposes.

Participants using the second set will be required to accept specific use terms and conditions.

From the first background set, a total of 600 documents are randomly selected to build the test set. The documents that belong to the test set will be unknown to the participants until all participants have submitted their runs.

Test set

The test set contains 20000 abstracts from the background set, including the 600 abstracts that will be used for evaluation, plus some new abstracts that are absent from the background set. All participants must submit predictions for this set.

Background set predictions

Here are our predictions from 3 different submissions. Each submission is explained later on this page under the name of the corresponding tool. You can use these submissions as additional resources:

  • BARR Background Subset 1 Predictions: Zip file
  • BARR Background Subset 2 Predictions: Zip file

Evaluation will be available soon.

Sample predictions

PDF link

This section describes the different open-source tools we used to detect abbreviations in the corpus in order to create the baselines of the sample set. These 3 tools work using simple regular expression rules to detect abbreviations and their long forms in the text. None of them specifies the offsets of the long and short forms.

These tools were previously tested on English corpora, but they also work well with Spanish biomedical publications. None of them uses internal abbreviation dictionaries.

To get the results, it is recommended to use a sentence splitter beforehand, and apply the following tools sentence by sentence. We used IXA pipes to split sentences.
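To give an idea of what a "simple regular expression rule" applied sentence by sentence can look like, here is a toy sketch. It is not the algorithm of Ab3P, ADRS or BADREX; the function name `toy_detect` and the matching rule (an uppercase short form in parentheses whose letters match the initials of the preceding words) are assumptions made only for illustration.

```python
import re

# Hypothetical rule: an uppercase short form of 2 to 6 letters inside parentheses.
PAREN_SF = re.compile(r"\(([A-ZÁÉÍÓÚÑ]{2,6})\)")


def toy_detect(sentence):
    """Return (short_form, long_form) pairs found in a single sentence.

    The long form candidate is the run of words right before the
    parenthesis, one word per letter of the short form. The pair is kept
    only if the initials of those words spell the short form.
    """
    pairs = []
    for match in PAREN_SF.finditer(sentence):
        short_form = match.group(1)
        words = sentence[:match.start()].split()
        candidate = words[-len(short_form):]
        initials = "".join(word[0] for word in candidate).upper()
        if len(candidate) == len(short_form) and initials == short_form:
            pairs.append((short_form, " ".join(candidate)))
    return pairs


if __name__ == "__main__":
    print(toy_detect("La resonancia magnética (RM) es útil."))
```

Like the tools described here, this sketch returns the forms only, without offsets.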

  • Ab3P:
    Ab3P (Abbreviation Plus Pseudo-Precision) is a simple tool developed in C++, and the compilation process is quite simple.
    The software outputs the short forms and long forms detected in the sentence, together with the estimated precision.
  • ADRS:
    ADRS (Abbreviation Definition Recognition Software) is another simple tool, developed in Java.
    To use this software, you just need to pass the path of the file you want to analyze. The system will return short and long form pairs.
  • BADREX:
    BADREX (Biomedical Abbreviation Expander) is a GATE plugin developed in Java.
    Although you may need GATE to run the plugin, it is possible to use it through the API. The software extracts both short forms and their corresponding long forms.

To get these baselines, we just executed the tools and adapted the outputs to the track’s evaluation format.

The evaluation of these baselines was done with the sample set. We will follow the same process for the final test set.

You can download all sample set prediction files from this link. Please check the documents above for entity and relation evaluation instructions at Markyt.

Entity evaluation

For the entity evaluation track, we extracted the long and short forms detected by each tool. These tools return long and short form pairs, so we simply took the entities found in these pairs. We then located each entity in the titles and abstracts and extracted its offsets. Finally, we assigned the LONG or SHORT category given in the tool outputs. Entities that are not part of any SF-LF relationship are labeled as MULTIPLE.
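The offset extraction step can be sketched as follows. This is a minimal illustration, not the script we actually ran: the function names, the tuple layout and the choice of exact string matching are all assumptions, and the track's real evaluation format is not reproduced here.

```python
import re


def find_offsets(text, form):
    """Return (start, end) character offsets of every exact occurrence of form."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(form), text)]


def entities_from_pair(text, short_form, long_form):
    """Build (start, end, category, form) tuples for one SF-LF pair.

    The category comes from the role the form had in the tool output:
    LONG for the long form, SHORT for the short form.
    """
    entities = []
    for start, end in find_offsets(text, long_form):
        entities.append((start, end, "LONG", long_form))
    for start, end in find_offsets(text, short_form):
        entities.append((start, end, "SHORT", short_form))
    return sorted(entities)


if __name__ == "__main__":
    abstract = "La resonancia magnética (RM) es útil. La RM no es invasiva."
    for entity in entities_from_pair(abstract, "RM", "resonancia magnética"):
        print(entity)
```

Exact matching will also hit a short form that happens to occur inside a longer word, so a real pipeline would probably add word-boundary checks.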

Tool Precision Recall F-Measure
Ab3P 78.20 39.87 52.81
ADRS 70.75 49.02 57.91
BADREX 72.50 37.91 49.78
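The F-Measure column is consistent with the balanced F-measure (the harmonic mean of precision and recall). That reading is an assumption, since this page does not state the formula. Recomputing from the rounded precision and recall values gives results within about 0.01 of the listed ones. The function name below is ours.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall.

    Both inputs and the result are on the same scale (here, percentages).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall taken from the entity evaluation table.
    for tool, p, r in [("Ab3P", 78.20, 39.87),
                       ("ADRS", 70.75, 49.02),
                       ("BADREX", 72.50, 37.91)]:
        print(tool, round(f_measure(p, r), 2))
```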

Relation evaluation

For the relation evaluation track, we extracted the long and short form pairs detected by each tool. Once we had them, we analyzed the titles and abstracts to get the offsets of each entity, and finally created the file to evaluate.

We consider a short form and a long form to be an SF-LF pair only when they appear very close to each other; in other words, both forms should occur in the same context. If we find the long and short form together at the beginning of the document, we pair them. If the short form appears again later in the document, in another sentence, we do not pair this second short form with the long form.
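The pairing rule can be sketched as below, under the assumption that "same context" means "same sentence". The function names and the representation of mentions and sentences as (start, end) character spans are hypothetical.

```python
def sentence_index(offset, sentence_spans):
    """Return the index of the sentence span containing offset, or None."""
    for i, (start, end) in enumerate(sentence_spans):
        if start <= offset < end:
            return i
    return None


def pair_in_same_sentence(short_mentions, long_mentions, sentence_spans):
    """Pair each short form mention with long form mentions in its sentence.

    Mentions are (start, end) spans. A short form that reappears in a later
    sentence gets no pair, because no long form mention shares that sentence.
    """
    pairs = []
    for sf in short_mentions:
        sf_sentence = sentence_index(sf[0], sentence_spans)
        if sf_sentence is None:
            continue
        for lf in long_mentions:
            if sentence_index(lf[0], sentence_spans) == sf_sentence:
                pairs.append((sf, lf))
    return pairs
```

For a text such as "La resonancia magnética (RM) es útil. La RM no es invasiva.", only the first "RM" would be paired with "resonancia magnética"; the second occurrence sits in another sentence and is left unpaired.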

None of these tools detects NESTED relations.

Tool Precision Recall F-Measure
Ab3P 71.79 34.14 46.28
ADRS 62.26 40.24 48.89
BADREX 52.38 26.83 35.48