Train set
The train set contains 5,000 annotated tweets. It will be published on Zenodo.
Validation set
The validation set contains 2,500 annotated tweets. It will be published on Zenodo.
Test and background sets
The test set contains 2,500 tweets. The background set contains 50,000 tweets. Both will be published on Zenodo.
The test and background sets will be published together. You must submit predictions for the whole combined set, but only your predictions for the test set tweets will be evaluated.
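The following is a minimal sketch of what "predict on everything, score on the test set only" amounts to. It is not the official evaluation script. The record layout (a `tweet_id` key and a `span` tuple), the example ids, and the function name `keep_test_predictions` are all assumptions made for illustration, since the submission format is not specified here.

```python
def keep_test_predictions(predictions, test_ids):
    """Return only the predictions whose tweet id belongs to the test set."""
    test_ids = set(test_ids)  # set lookup keeps the filter fast for ~50K tweets
    return [p for p in predictions if p["tweet_id"] in test_ids]


# Hypothetical predictions over the combined test + background set.
predictions = [
    {"tweet_id": "t1", "span": (0, 6)},    # test tweet
    {"tweet_id": "b1", "span": (10, 18)},  # background tweet, ignored in scoring
    {"tweet_id": "t2", "span": (4, 12)},   # test tweet
]
test_ids = ["t1", "t2"]  # hypothetical ids of the test tweets

scored = keep_test_predictions(predictions, test_ids)
```

Predictions on background tweets are simply dropped before scoring, so they neither help nor hurt the final result under this assumption.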
Test set with Gold Standard annotations
The Gold Standard annotations of the test set will be released after the submission deadline.
Corpus statistics
Statistic | Training | Development |
# Tweets | 5000 | 2500 |
# characters | 1253431 | 516768 |
# tokens | 211555 | 84478 |
Avg. characters/tweet | 250.69 | 206.71 |
Avg. tokens/tweet | 42.31 | 33.79 |
# disease mentions | 15173 | 4252 |
# unique disease mentions | 4407 | 1413 |
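The per-tweet averages in the table follow directly from the totals. As a quick sanity check, the sketch below recomputes them from the character, token, and tweet counts given above. The dictionary layout and the function name `per_tweet_averages` are illustrative choices, not part of the corpus distribution.

```python
# Totals copied from the corpus statistics table.
corpus = {
    "Training": {"tweets": 5000, "characters": 1253431, "tokens": 211555},
    "Development": {"tweets": 2500, "characters": 516768, "tokens": 84478},
}


def per_tweet_averages(stats):
    """Return (avg characters per tweet, avg tokens per tweet), rounded to 2 places."""
    avg_chars = round(stats["characters"] / stats["tweets"], 2)
    avg_tokens = round(stats["tokens"] / stats["tweets"], 2)
    return avg_chars, avg_tokens


averages = {name: per_tweet_averages(stats) for name, stats in corpus.items()}
for name, (avg_chars, avg_tokens) in averages.items():
    print(f"{name}: {avg_chars} characters/tweet, {avg_tokens} tokens/tweet")
```

The recomputed values match the table: 250.69 characters and 42.31 tokens per tweet for Training, 206.71 and 33.79 for Development.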