Datasets


Download the training, validation, test and background sets from Zenodo.

The LivingNER corpus has been randomly split into three subsets: a training, a development, and a test set.

Sample set

The sample set is composed of 5 clinical cases extracted from the training set, covering four different specialties: COVID, oncology, infectious diseases, and tropical medicine.

Download the sample set from Zenodo.

Training set

The training set is composed of 1000 clinical cases from many different specialties: COVID, oncology, infectious diseases, tropical medicine, etc.

Download the training set from Zenodo.

Development set

The development set is composed of 500 clinical cases from many different specialties: COVID, oncology, infectious diseases, tropical medicine, urology, allergology, etc.

Download the validation set from Zenodo.

Test set

The test set is composed of clinical cases from many different specialties. It is released WITHOUT ANNOTATIONS. The goal of the task is to generate automatic annotations for the test set documents.

The test set is released together with a large collection of clinical case reports (the background set) to prevent manual annotation of the test set documents.

Download the test+background set from Zenodo.

Publications

Participant papers at CEUR: http://ceur-ws.org/Vol-3202/

Instructions for the working notes papers:

  • System description papers should be formatted according to the uniform 1-column CEURART style. LaTeX and Word templates can be found at: http://ceur-ws.org/HOWTOSUBMIT.html#PREPARE
  • The minimum length of a regular paper should be 5 pages. There is no maximum page limit.
  • Papers must be written in English.
  • Each paper must include a copyright footnote on the first page of each paper: {\let\thefootnote\relax\footnotetext{Copyright \textcopyright\ 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IberLEF 2022, September 2022, A Coruña, Spain.}} 
  • Remove any page numbering and make sure that there are no headers or footers, except for the mandatory copyright footnote on the first page.
  • Authors should be described with their name and their full affiliation (university and country). Names must be complete (no initials), e.g. “Soto Pérez” instead of “S. Pérez”.
  • Titles of papers should be capitalized in English title case, e.g., “Filling an Author Agreement by Autocompletion” rather than “Filling an author agreement by autocompletion”.
  • At least one author of each paper must sign the CEUR copyright agreement. Instructions and templates can be found at http://ceur-ws.org/HOWTOSUBMIT.html. The signed form must be sent along with the paper to the task organizers. Important: it must be physically signed with a pen on paper.

Papers without the copyright footnote, with page numbers, or without a properly signed CEUR copyright agreement will not be considered.

Submit your paper at EasyChair. See the submission procedure here.

Have a look at the 2021 MEDDOPROF proceedings (here) and the 2020 Cantemist proceedings (here).

Relevant publications

  • Gerner, Martin, Goran Nenadic, and Casey M. Bergman. “LINNAEUS: a species name identification system for biomedical literature.” BMC bioinformatics 11.1 (2010): 1-17.
  • Federhen, Scott. “The NCBI taxonomy database.” Nucleic acids research 40.D1 (2012): D136-D143.
  • Evangelos Pafilis, Sune P. Frankild, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Aikaterini Vasileiadou, Christos Arvanitidis, and Lars Juhl Jensen. 2013. The species and organisms resources for fast and accurate identification of taxonomic names in text. PLOS ONE, 8(6):1–6
  • Schoch, Conrad L et al. “NCBI Taxonomy: a comprehensive update on curation, resources and tools.” Database : the journal of biological databases and curation vol. 2020 (2020): baaa062. doi:10.1093/database/baaa062
  • Antonio Miranda-Escalada, Eulàlia Farré-Maduell, Martin Krallinger. Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), CEUR Workshop Proceedings. 303-323 (2020).
  • Lima-López, Salvador, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Brivá-Iglesias, & Martin Krallinger. “NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts.” Procesamiento del Lenguaje Natural [Online], 67 (2021): 243-256.
  • Antonio Jimeno Yepes, Ameer Albahem, and Karin Verspoor. 2021. Using Discourse Structure to Differentiate Focus Entities from Background Entities in Scientific Literature. In Proceedings of the The 19th Annual Workshop of the Australasian Language Technology Association, pages 174–178, Online. Australasian Language Technology Association.
  • Pyysalo, Sampo, et al. “Overview of the infectious diseases (ID) task of BioNLP shared task 2011.” Proceedings of BioNLP Shared Task 2011 Workshop. 2011.

Example

Evaluation process: (1) you submit your results, (2) we perform the evaluation off-line and (3) return the final scores.

  • LivingNER-Species NER
$ cd src
$ python main.py -g ../gs-data/sample_entities_subtask1.tsv -p ../toy-data/sample_entities_subtask1_MISSING_ONE_FILE.tsv -s ner
According to file headers, you are on subtask ner
According to file headers, you are on subtask ner

-----------------------------------------------------
Clinical case name			Precision
-----------------------------------------------------
32032497_ES		nan
-----------------------------------------------------
caso_clinico_medtropical54		1.0
-----------------------------------------------------
casos_clinicos_infecciosas1		1.0
-----------------------------------------------------
casos_clinicos_infecciosas141		1.0
-----------------------------------------------------
cc_onco908		1.0
-----------------------------------------------------

-----------------------------------------------------
Clinical case name			Recall
-----------------------------------------------------
32032497_ES		0.0
-----------------------------------------------------
caso_clinico_medtropical54		1.0
-----------------------------------------------------
casos_clinicos_infecciosas1		1.0
-----------------------------------------------------
casos_clinicos_infecciosas141		1.0
-----------------------------------------------------
cc_onco908		1.0
-----------------------------------------------------

-----------------------------------------------------
Clinical case name			F-score
-----------------------------------------------------
32032497_ES		nan
-----------------------------------------------------
caso_clinico_medtropical54		1.0
-----------------------------------------------------
casos_clinicos_infecciosas1		1.0
-----------------------------------------------------
casos_clinicos_infecciosas141		1.0
-----------------------------------------------------
cc_onco908		1.0
-----------------------------------------------------

-----------------------------------------------------
Micro-average metrics
-----------------------------------------------------

Micro-average precision = 1.0


Micro-average recall = 0.9568


Micro-average F-score = 0.9779

../toy-data/sample_entities_subtask1_MISSING_ONE_FILE.tsv|1.0|0.9568|0.9779
  • LivingNER-Species Norm
$ cd src
$ python main.py -g ../gs-data/sample_entities_subtask2.tsv -p ../toy-data/sample_entities_subtask2_predictions.tsv -s norm
According to file headers, you are on subtask norm, GS file
According to file headers, you are on subtask norm, predictions file
/home/antonio/Documents/Work/BSC/Projects/micro/scripts/livingner-evaluation-library/src/ann_parsing.py:46: UserWarning: There are duplicated entries in ../toy-data/sample_entities_subtask2_predictions.tsv. Keeping just the first one...
/home/antonio/Documents/Work/BSC/Projects/micro/scripts/livingner-evaluation-library/src/ann_parsing.py:59: UserWarning: Lines 1 in ../toy-data/sample_entities_subtask2_predictions.tsv contain unvalid codes. Valid codes are those that appear in ../ncbi_codes_unique.tsv. Ignoring lines with valid codes...

-----------------------------------------------------
Clinical case name			Precision
-----------------------------------------------------
32032497_ES		0.5
-----------------------------------------------------
caso_clinico_medtropical54		1.0
-----------------------------------------------------
casos_clinicos_infecciosas1		1.0
-----------------------------------------------------
casos_clinicos_infecciosas141		1.0
-----------------------------------------------------
cc_onco908		1.0
-----------------------------------------------------

-----------------------------------------------------
Clinical case name			Recall
-----------------------------------------------------
32032497_ES		0.3333
-----------------------------------------------------
caso_clinico_medtropical54		1.0
-----------------------------------------------------
casos_clinicos_infecciosas1		1.0
-----------------------------------------------------
casos_clinicos_infecciosas141		1.0
-----------------------------------------------------
cc_onco908		1.0
-----------------------------------------------------

-----------------------------------------------------
Clinical case name			F-score
-----------------------------------------------------
32032497_ES		0.4
-----------------------------------------------------
caso_clinico_medtropical54		1.0
-----------------------------------------------------
casos_clinicos_infecciosas1		1.0
-----------------------------------------------------
casos_clinicos_infecciosas141		1.0
-----------------------------------------------------
cc_onco908		1.0
-----------------------------------------------------

-----------------------------------------------------
Micro-average metrics
-----------------------------------------------------

Micro-average precision = 0.9854


Micro-average recall = 0.9712


Micro-average F-score = 0.9783

../toy-data/sample_entities_subtask2_predictions.tsv|0.9854|0.9712|0.9783
  • LivingNER-Clinical IMPACT
$ cd src
$ python main.py -g ../gs-data/sample_subtask3.tsv -p ../toy-data/sample_subtask3_predictions.tsv -s app
/home/antonio/Documents/Work/BSC/Projects/micro/scripts/livingner-evaluation-library/src/ann_parsing.py:99: UserWarning: Lines 5 in ../toy-data/sample_subtask3_predictions.tsv contain unvalid codes. Valid codes are those that appear in ../ncbi_codes_unique.tsv. Ignoring lines with valid codes...
Basic metrics (not taking into account NCBI codes, just Y/N assignment)
-----------------------------------------------------
Pet
Precision = 1.0
Recall = 1.0
F1score = 1.0
-----------------------------------------------------
AnimalInjury
Precision = 1.0
Recall = 1.0
F1score = 1.0
-----------------------------------------------------
Food
Precision = 1.0
Recall = 1.0
F1score = 1.0
-----------------------------------------------------
Nosocomial
/home/antonio/Documents/Work/BSC/Projects/micro/scripts/livingner-evaluation-library/src/livingner_app.py:90: UserWarning: Precision score automatically set to zero because there are no predicted positives
/home/antonio/Documents/Work/BSC/Projects/micro/scripts/livingner-evaluation-library/src/livingner_app.py:104: UserWarning: Global F1 score automatically set to zero for simple metrics to avoid division by zero
/home/antonio/Documents/Work/BSC/Projects/micro/scripts/livingner-evaluation-library/src/livingner_app.py:110: UserWarning: Global F1 score automatically set to zero for complex metrics to avoid division by zero
Precision = 0
Recall = 0.0
F1score = 0
-----------------------------------------------------



Complex metrics (taking into account NCBI codes)
-----------------------------------------------------
Pet
Precision = 1.0
Recall = 1.0
F1score = 1.0
-----------------------------------------------------
AnimalInjury
Precision = 1.0
Recall = 1.0
F1score = 1.0
-----------------------------------------------------
Food
Precision = 1.0
Recall = 1.0
F1score = 1.0
-----------------------------------------------------
Nosocomial
Precision = 0
Recall = 0.0
F1score = 0
-----------------------------------------------------

Evaluation Library

The LivingNER evaluation library is available on GitHub.

Please make sure you have the latest version.


These scripts are distributed as part of the LivingNER shared task. They are written in Python 3 and are intended to be run from the command line:

$> python main.py -g ../gs-data/sample_entities_subtask1.tsv -p ../toy-data/sample_entities_subtask1_MISSING_ONE_FILE.tsv -s ner
$> python main.py -g ../gs-data/sample_entities_subtask2.tsv -p ../toy-data/sample_entities_subtask2_MISSING_ONE_FILE.tsv -s norm 
$> python main.py -g ../gs-data/sample_entities_subtask3.tsv -p ../toy-data/pred_sample_entities_subtask3.tsv -s app

They produce the evaluation metrics for the corresponding sub-tracks: precision, recall and F-score for LivingNER Species NER and LivingNER Species NORM.
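As an intuition for what these micro-averaged scores mean, the following is a simplified sketch, not the official evaluation code: it pools true positives over all documents and computes precision, recall, and F-score from two mention-level TSV files with the columns described in the Submission section, assuming exact offset matching only.

# Simplified sketch of micro-averaged scores (NOT the official evaluation code):
# mentions are matched on exact (filename, off0, off1) tuples, pooled over all documents.
import csv

def load_mentions(path):
    # Read a mention-level TSV (columns include filename, off0, off1)
    with open(path, newline="", encoding="utf-8") as f:
        return {(r["filename"], r["off0"], r["off1"])
                for r in csv.DictReader(f, delimiter="\t")}

def micro_scores(gold_path, pred_path):
    gold, pred = load_mentions(gold_path), load_mentions(pred_path)
    tp = len(gold & pred)                               # mentions predicted correctly
    p = tp / len(pred) if pred else 0.0                 # micro-average precision
    r = tp / len(gold) if gold else 0.0                 # micro-average recall
    f = 2 * p * r / (p + r) if p + r else 0.0           # micro-average F-score
    return p, r, f

# Example call, using the sample files shipped with the evaluation library:
# print(micro_scores("../gs-data/sample_entities_subtask1.tsv",
#                    "../toy-data/sample_entities_subtask1_MISSING_ONE_FILE.tsv"))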

Submission


Submission format

LivingNER – Species NER track. A TSV file with one row per mention, a header row, and the following columns (a hypothetical example is sketched after Figure 1):

  • filename
  • mark
  • label (mention label)
  • off0 (starting character offset)
  • off1 (ending character offset)
  • span (mention span)
Figure 1. Example of submission file for LivingNER-Species NER track
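The figure itself is not reproduced here. As a purely illustrative sketch (the file name, marks, labels, offsets and spans are invented; only the header matches the required columns), such a file could be written as follows:

# Hedged sketch of a LivingNER-Species NER submission file; all row values are invented.
import csv

rows = [
    # filename, mark, label, off0, off1, span
    ("caso_clinico_example1", "T1", "HUMAN",   "10",  "18",  "paciente"),
    ("caso_clinico_example1", "T2", "SPECIES", "120", "136", "Escherichia coli"),
]

with open("predictions_subtask1.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["filename", "mark", "label", "off0", "off1", "span"])
    writer.writerows(rows)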

LivingNER – Species Norm track. A TSV file with one row per mention, a header row, and the following columns (a sketch for checking the NCBITax codes follows Figure 2):

  • filename
  • mark
  • label (mention label)
  • off0 (starting character offset)
  • off1 (ending character offset)
  • span (mention span)
  • NCBITax (mention code in NCBITaxonomy)
Figure 2. Example of submission file for LivingNER-Species Norm track
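A practical check before submitting: the evaluation library warns about, and ignores, rows whose NCBITax code is not listed in ncbi_codes_unique.tsv (see the warnings in the example above). The sketch below is a hypothetical validation helper; it assumes that file holds one valid code per row in its first column and that each mention row carries a single code.

# Hypothetical helper: flag Norm submission rows whose NCBITax code is not among
# the valid codes. Assumes ../ncbi_codes_unique.tsv has one code per row (first
# column) and that each mention row carries a single code.
import csv

def load_valid_codes(path="../ncbi_codes_unique.tsv"):
    with open(path, newline="", encoding="utf-8") as f:
        return {row[0].strip() for row in csv.reader(f, delimiter="\t") if row}

def invalid_rows(pred_path, valid_codes):
    with open(pred_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # line numbers start at 2 because line 1 is the header
        return [i for i, r in enumerate(reader, start=2)
                if r["NCBITax"] not in valid_codes]

# print(invalid_rows("predictions_subtask2.tsv", load_valid_codes()))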

LivingNER – Clinical IMPACT track. A TSV file with one row per document, a header row, and the following columns (a hypothetical example is sketched after Figure 3):

  • filename
  • isPet (Yes/No)
  • PetIDs (NCBITaxonomy codes of pet & farm animals present in document)
  • isAnimalInjury (Yes/No)
  • AnimalInjuryIDs (NCBITaxonomy codes of animals causing injuries present in document)
  • isFood (Yes/No)
  • FoodIDs (NCBITaxonomy codes of food mentions present in document)
  • isNosocomial (Yes/No)
  • NosocomialIDs (NCBITaxonomy codes of nosocomial species mentions present in document)
Figure 3. Example of submission file for LivingNER-Clinical Impact track
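As a rough, hypothetical sketch of producing such a per-document file: the document name, the category-to-code mapping, the NCBI Taxonomy IDs and the "+" separator used to join multiple codes are all invented for illustration and are not taken from the official specification.

# Hedged sketch of one Clinical IMPACT row per document; all values and the code
# separator are illustrative assumptions only.
import csv

doc_codes = {
    "caso_clinico_example1": {"Pet": ["9615"], "AnimalInjury": [],
                              "Food": ["9031"], "Nosocomial": []},
}

with open("predictions_subtask3.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["filename", "isPet", "PetIDs", "isAnimalInjury", "AnimalInjuryIDs",
                     "isFood", "FoodIDs", "isNosocomial", "NosocomialIDs"])
    for doc, cats in doc_codes.items():
        row = [doc]
        for cat in ("Pet", "AnimalInjury", "Food", "Nosocomial"):
            codes = cats[cat]
            row.append("Yes" if codes else "No")
            row.append("+".join(codes))   # assumed separator when a document has several codes
        writer.writerow(row)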

To evaluate your systems AFTER June 2022, use CodaLab:

The following sections are kept here for historical reasons. They are not relevant AFTER June 2022.


Submission method

Submissions will be made via SFTP.

Submission tutorial

Submission instructions

5 submissions per sub-track will be allowed.

You must submit ONE SINGLE ZIP file with the following structure (a sketch follows the list):

  • One subdirectory per subtask in which you are participating. 
  • In each subdirectory, you must include the results in a TSV file with the format defined in the “Submission format” section.
  • The TSV file must have the .tsv file extension and include ALL your predictions.
  • If you have more than one system, you can include their predictions and we will evaluate them (up to 5 prediction runs). Use separate files for each run, with numbers and a recognizable name. For example, 1-systemDL and 2-systemlookup.
  • In addition, in the parent directory, you must add a README.txt file with your contact details (team name, affiliation, and authors) and a brief description of your system.
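A minimal sketch of assembling such a ZIP file, assuming hypothetical team, subdirectory and run names (the actual subdirectory names should reflect the sub-tracks you participate in):

# Hypothetical example of packing the submission ZIP; all file and directory names are illustrative.
import zipfile

with zipfile.ZipFile("teamname_livingner.zip", "w") as zf:
    # README.txt in the parent directory: team name, affiliation, authors, short system description
    zf.write("README.txt", arcname="README.txt")
    # One subdirectory per sub-track, one TSV per run (up to 5 runs)
    zf.write("1-systemDL.tsv", arcname="species-ner/1-systemDL.tsv")
    zf.write("2-systemlookup.tsv", arcname="species-ner/2-systemlookup.tsv")
    zf.write("1-systemDL-norm.tsv", arcname="species-norm/1-systemDL.tsv")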

Example submission ZIP file

Evaluation

Participants’ predictions are compared against the Gold Standard, which was generated through manual annotation by experts.

The primary evaluation metric for the LivingNER-Species NER and LivingNER-Species Norm sub-tracks consists of micro-averaged precision, recall, and F1 scores:
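Micro-averaging pools the counts of true positives (TP), false positives (FP) and false negatives (FN) over all documents of the test set before computing the scores:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 * Precision * Recall / (Precision + Recall)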

The evaluation scripts, together with documentation, are freely available on GitHub so that participating teams can test the evaluation tools locally. Evaluation scripts.

More information on the LivingNER – Clinical Impact track evaluation: TBD

Schedule

UPDATED ON MAY, 3

Event | Date (all deadlines are 23:59 CEST) | Link
Sample Set release | March 11 | Sample set
Train Set Release and guidelines publication | March 18 | Training set & Guidelines
Evaluation Library Release | March 25 | Evaluation Library
Development Set Release | March 31 | Validation set
Test and Background Set Release | April 22 | Test+Background set
Multilingual LivingNER corpus | May 3 | Multilingual LivingNER corpus
End of evaluation period / predictions submission deadline | June 12 (UPDATED) | Submission instructions
Evaluation delivery | June 17 (UPDATED) | TBD
Working Notes deadline | June 24 (UPDATED) | EasyChair
Working Notes Corrections deadline | July 4 (UPDATED) | TBD
Camera-ready submission deadline | July 11 (UPDATED) | EasyChair
Workshop IberLEF @ SEPLN 2022 | September 20 | IberLEF