Supplementary resources for training
- PubMed abstracts translated into Spanish (to be published). We are translating all English PubMed abstracts into Spanish. The Spanish version will include the associated DeCS codes, generated from the original MeSH descriptors.
Linguistic Resources
- AbreMES-DB: The Spanish Medical Abbreviation DataBase. Abbreviations are extracted from the metadata (titles and abstracts) of various biomedical publications written in Spanish. Download from Zenodo.
- MEDDOCAN-Gazetteer: Gazetteer of MEDDOCAN-related entities. Includes names, surnames, addresses, hospitals, professions, and different types of locations (provinces, cities, towns, etc.). Download it from here.
- Sentence-split test set: The test set (including the background set) split into sentences using SPACCC_POS-TAGGER (see below). These annotations are required to compute the leak score of subtrack 1. Download it from here.
- SPACCC_POS-TAGGER: Part-of-Speech Tagger for Spanish medical-domain corpora, based on FreeLing. Download it from GitHub.
DeCS Resources
- DeCS descriptors 2019 (table with DeCS codes plus the descriptors and synonyms from both the European and Latin American Spanish DeCS data sets, separated by pipes).
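
As a rough illustration of how a table like this might be read, the Python sketch below maps each code to its descriptor and synonym list. The exact file layout is not specified here, so the sketch assumes tab-separated columns (code, descriptor, synonyms) with the synonyms joined by pipes. The function name `load_decs_table` and the sample rows are hypothetical and are not taken from the actual DeCS file.

```python
import csv
import io


def load_decs_table(handle, column_sep="\t", synonym_sep="|"):
    """Map each code to its descriptor and list of synonyms.

    Assumed layout (not confirmed by the resource description):
    code <TAB> descriptor <TAB> synonym1|synonym2|...
    """
    table = {}
    for row in csv.reader(handle, delimiter=column_sep):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        code, descriptor = row[0].strip(), row[1].strip()
        raw_synonyms = row[2] if len(row) > 2 else ""
        # Split the pipe-joined field and drop empty entries.
        synonyms = [s.strip() for s in raw_synonyms.split(synonym_sep) if s.strip()]
        table[code] = {"descriptor": descriptor, "synonyms": synonyms}
    return table


# Made-up rows for illustration only; these are not real DeCS entries.
sample = io.StringIO(
    "CODE001\tdiabetes mellitus\tdiabetes|diabetes sacarina\n"
    "CODE002\thipertensión\tpresión arterial alta\n"
)
decs = load_decs_table(sample)
```

If the real file uses a different column separator or column order, only the `column_sep` argument and the row indexing would need to change.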