MESINESP: Medical Semantic Indexing in Spanish

The BioASQ MESINESP Task is sponsored by the Secretaría de Estado para el Avance Digital (SEAD) and the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).


Efficient access to medical literature is a pressing need, not only for information published in English but also for articles published in other languages. Efficient retrieval of medical publications is key for evidence-based medicine, for preparing systematic reviews, and for finding particular clinical case studies. Query expansion that relies on indexing with structured vocabularies is an effective way to build more powerful literature search engines. Moreover, indexing with controlled vocabularies also underlies other clinically relevant technologies, such as the coding of electronic health records.

The critical importance of semantic indexing with medical vocabularies has motivated several shared tasks in the past, in particular the BioASQ tracks, which attracted a considerable number of participants and had an impact on the field.

Currently, most biomedical NLP and IR research is done on English documents, and only a few tasks have been carried out on non-English texts. Nonetheless, a considerable amount of medically relevant content is published in languages other than English; clinical texts in particular are, with a few exceptions, written entirely in the native language of each country.

Spanish is spoken by more than 572 million people in the world today, as a native, second, or foreign language. It is the second language in the world by number of native speakers, with more than 477 million. According to results derived from WHO statistics, Spain alone has over 180 thousand practicing physicians, more than 247 thousand nursing and midwifery personnel, and 55 thousand pharmaceutical personnel.

These facts, extrapolated to other Spanish-speaking countries, may explain why a large body of medical content is published in Spanish each year. Resources like PubMed contain only a fraction of the biomedical and medical literature originally published in Spanish; this literature is also stored in other resources such as IBECS, SCIELO or LILACS.

Following the outline of previous medical indexing efforts, in particular the success of the BioASQ tracks centered on PubMed, we propose to carry out the first task on semantic indexing of Spanish medical texts.

JSON object transformation for an article.

This task will therefore address the automatic indexing, with structured medical vocabularies (DeCS terms), of Spanish-language abstracts from the IBECS and LILACS databases. The main aim is to promote the development of semantic indexing tools of practical relevance for non-English content, determining the current state of the art, identifying challenges, and comparing the strategies and results to those published for English data.
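
To make the indexing problem concrete, the following is a minimal sketch of a dictionary-lookup baseline that assigns vocabulary codes to a Spanish abstract. It is only an illustration under stated assumptions, not the task's official method or data format: the codes (`C001`, ...), the term lists, and the function names `normalize` and `index_abstract` are all hypothetical, and real DeCS descriptors have their own identifiers and many more synonyms.

```python
import re
import unicodedata

# Hypothetical mini-vocabulary: code -> Spanish term variants.
# These codes and terms are placeholders, NOT real DeCS entries.
VOCAB = {
    "C001": ["diabetes mellitus", "diabetes"],
    "C002": ["hipertensión arterial", "hipertensión"],
    "C003": ["insulina"],
}


def normalize(text):
    """Lowercase, strip accents, and reduce the text to space-separated tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[a-z0-9]+", no_accents))


def index_abstract(abstract, vocab=VOCAB):
    """Return the sorted codes whose terms occur as whole words in the abstract."""
    # Padding with spaces lets a plain substring test act as a whole-word match.
    padded = f" {normalize(abstract)} "
    codes = set()
    for code, terms in vocab.items():
        if any(f" {normalize(term)} " in padded for term in terms):
            codes.add(code)
    return sorted(codes)


if __name__ == "__main__":
    print(index_abstract("Hipertensión arterial en pacientes con diabetes."))
```

Exact string matching of this kind misses inflected forms and paraphrases (for example, a plural such as "insulinas" is not matched), which is one reason the task is framed as a challenge for more capable semantic indexing systems.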