MESINESP: Medical Semantic Indexing in Spanish (BioASQ 2020)

Please use the MESINESP2 corpus (the second edition of the shared task), since it is more carefully curated, of higher quality, and organized by document type (scientific articles, patents and clinical trials).

Generated Resources

Please cite: Carlos Rodriguez-Penagos, Anastasios Nentidis, Aitor Gonzalez-Agirre, Alejandro Asensio, Jordi Armengol-Estapé, Anastasia Krithara, Marta Villegas, Georgios Paliouras, Martin Krallinger: Overview of MESINESP8, a Spanish Medical Semantic Indexing Task within BioASQ 2020. In: CLEF (Working Notes) (2020)

Schedule

Schedule updated: Check NEW TEST SET RELEASE DATES!

Date: Event
November 25, 2019: Training set released. Please register at the BioASQ webpage.
January 17, 2020: Release of an additional dataset of PubMed abstracts translated into Spanish.
April 4, 2020: Development set released.
May 11, 2020: Test set released (includes background set). IMPORTANT - UPDATED!
June 1, 2020: End of evaluation period (system submissions). IMPORTANT - UPDATED!
June 2, 2020: Test set with GS annotations released.
June 10, 2020: Working notes paper submission.
June 14, 2020: Notification of acceptance (peer reviews).
June 28, 2020: Camera-ready paper submission.
September 22-25, 2020: The 8th BioASQ Workshop will be held as a Lab in CLEF 2020 in Thessaloniki, Greece.

The BioASQ MESINESP Task is sponsored by the Secretaría de Estado para el Avance Digital (SEAD) and the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).

Motivation

There is a pressing need to improve access to the information contained in health and biomedicine related documents, not only for professional medical users but also for researchers, public healthcare decision makers and, particularly, patients.

Among the variety of health-related data (e.g. patents, blogs and social media, electronic health records, clinical trial repositories or drug labels), the medical literature plays a central role for evidence-based medicine, preparing systematic reviews or finding particular clinical case studies. Improved search engines, for instance those using query expansion approaches, typically rely on prior manual indexing of records with structured vocabularies to enable more powerful literature searches. Manual indexing is currently highly time-consuming, costly and laborious.

Moreover, semantic indexing technologies with structured vocabularies can also lead to improved search engines for other document types and to better clinical coding of electronic health records.

There is an increasing demand for better retrieval strategies for medical literature beyond English, in order to recover important medical findings, including clinical case studies, published in other languages.

The critical importance of semantic indexing with medical vocabularies has motivated several shared tasks in the past, in particular the BioASQ tracks, which have attracted a considerable number of participants and had an impact on the field for medical literature in English.

Currently, most biomedical NLP and IR research is being done on English documents, and only a few tasks have been carried out on non-English texts. Nonetheless, it is important to note that there is also a considerable amount of medically relevant content published in languages other than English; clinical texts in particular are, with a few exceptions, written entirely in the native language of each country.

Spanish is spoken by more than 572 million people in the world today, as a native, second or foreign language. It is the second language in the world by number of native speakers, with more than 477 million people. According to results derived from WHO statistics, in Spain alone there are over 180 thousand practicing physicians, more than 247 thousand nursing and midwifery personnel and 55 thousand pharmaceutical personnel.

These facts, and their extrapolation to other Spanish-speaking countries, might explain why a large amount of medical content is published in Spanish each year. Resources like PubMed contain only a fraction of the biomedical and medical literature originally published in Spanish, which is also stored in other resources such as IBECS, SCIELO or LILACS.

Following the outline of previous medical indexing efforts, in particular the success of the BioASQ tracks centered on PubMed, we propose to carry out the first task on semantic indexing of Spanish medical texts.

Example medical abstract and sample JSON object transformation for an article.

Thus, this task will address the automatic indexing of abstracts with structured medical vocabularies (DeCS terms), using two widely used databases (IBECS and LILACS) that provide content in Spanish. The main aim of MESINESP is to promote the development of semantic indexing tools of practical relevance for non-English content, determining the current state of the art, identifying challenges and comparing the strategies and results to those published for English data.
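As a rough illustration of the abstract-to-JSON transformation mentioned above, the following Python sketch builds one JSON object for a single article. It is not the official MESINESP data format: the function name `article_to_record`, every field name (`id`, `title`, `abstractText`, `journal`, `year`, `decsCodes`) and all values, including the DeCS-style codes, are hypothetical placeholders chosen only for this example.

```python
import json


def article_to_record(article_id, title, abstract, journal, year, decs_codes):
    """Build a JSON-serializable record for one abstract.

    All field names are assumptions made for this sketch, not the
    official MESINESP schema.
    """
    return {
        "id": article_id,
        "title": title.strip(),
        # Collapse line breaks and repeated spaces inside the abstract.
        "abstractText": " ".join(abstract.split()),
        "journal": journal,
        "year": year,
        # Deduplicate and sort the assigned vocabulary codes.
        "decsCodes": sorted(set(decs_codes)),
    }


record = article_to_record(
    article_id="example-0001",            # placeholder identifier
    title="  Título de ejemplo ",
    abstract="Resumen de ejemplo\n  en español.",
    journal="Revista de ejemplo",
    year=2019,
    decs_codes=["D000002", "D000001", "D000002"],  # made-up codes
)

# ensure_ascii=False keeps accented Spanish characters readable.
print(json.dumps(record, ensure_ascii=False, indent=2))
```

In a setup like this, an indexing system would receive such records without the code list and would be expected to predict it; the actual field names and identifiers should be taken from the released data sets.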