Train set

The train set contains 5,000 annotated tweets. It will be published on Zenodo.

Validation set

The validation set contains 2,500 annotated tweets. It will be published on Zenodo.

Test and background sets

The test set contains 2,500 tweets. The background set contains 50,000 tweets. Both will be published on Zenodo.

The test and background sets will be published together. You must submit predictions for the whole combined set, but you will be evaluated only on your predictions for the test set.
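As an illustration of how evaluating only on the test set might work, the sketch below filters a full submission down to the test tweets before any scoring. The function name, the tweet IDs, and the prediction format (a mapping from tweet ID to a list of predicted mentions) are all hypothetical and are not taken from the task's actual submission format or evaluation script.

```python
def restrict_to_test(predictions, test_ids):
    """Keep only the predictions whose tweet ID belongs to the test set.

    predictions: dict mapping tweet ID -> list of predicted mentions
                 (hypothetical format, for illustration only)
    test_ids:    set of tweet IDs that make up the test set
    """
    return {tid: preds for tid, preds in predictions.items() if tid in test_ids}


# Hypothetical submission covering both test tweets (t1, t2)
# and background tweets (b1, b2).
predictions = {
    "t1": ["asthma"],
    "t2": [],
    "b1": ["flu"],
    "b2": ["migraine"],
}
test_ids = {"t1", "t2"}

# Background predictions are dropped; only test tweets would be scored.
scored = restrict_to_test(predictions, test_ids)
```

Under this assumption, predictions for background tweets have no effect on the score, but they still need to be present in the submission.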

Test set with Gold Standard annotations

The Gold Standard annotations of the test set will be released after the submission deadline.

Corpora statistics

                              Train        Validation
# Tweets                      5,000        2,500
# Characters                  1,253,431    516,768
# Tokens                      211,555      84,478
Avg. characters / tweet       250.69       206.71
Avg. tokens / tweet           42.31        33.79
# Disease mentions            15,173       4,252
# Unique disease mentions     4,407        1,413
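The per-tweet averages follow directly from the totals. The short check below recomputes them, assuming the first column of figures refers to the train set and the second to the validation set (an inference from the tweet counts, not something the table states). The dictionary layout and function name are illustrative only.

```python
# Totals copied from the statistics above. The split names are an
# assumption based on the tweet counts (5,000 and 2,500).
stats = {
    "train": {"tweets": 5000, "characters": 1253431, "tokens": 211555},
    "validation": {"tweets": 2500, "characters": 516768, "tokens": 84478},
}


def per_tweet_averages(split):
    """Return (avg characters per tweet, avg tokens per tweet), rounded to 2 decimals."""
    s = stats[split]
    avg_chars = round(s["characters"] / s["tweets"], 2)
    avg_tokens = round(s["tokens"] / s["tweets"], 2)
    return avg_chars, avg_tokens


train_avgs = per_tweet_averages("train")            # (250.69, 42.31)
validation_avgs = per_tweet_averages("validation")  # (206.71, 33.79)
```

Both results agree with the averages reported in the statistics.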