Motivation

The extraction of clinical variables from medical content is key to enabling healthcare data analytics. Because medical language is highly specialized and varies considerably across medical specialties, more specialized automatic semantic annotation resources are needed, not only for English but also for other languages. This is particularly true for clinical content related to cardiovascular diseases (CVDs), which are the leading cause of death globally, responsible for approximately 17.9 million deaths per year.

Previous efforts to recognize clinical concepts, for instance in Spanish, have typically focused on a single entity type or a limited number of them, relied on general collections of medical documents, or covered clinical content written in a single language. These efforts resulted in valuable datasets and resources, such as the DISTEMIST, SYMPTEMIST, PharmaCoNER and MedProcNER corpora and systems. However, they neither (a) targeted and evaluated the interplay and complementarity of multi-label entity extraction approaches, nor (b) tested how such approaches could be adapted to handle multiple languages.

To address these issues, the novel MultiCardioNER task will focus on the automatic recognition of two key clinical variables, or concept types: diseases and medications.

MultiCardioNER will address the recognition of these clinical entity types in cardiology clinical case documents, with the following two aims:

  1. Adapting general clinical concept recognition systems to cardiology case reports, to determine how well such systems transfer to high-impact clinical domains and specialties (cardiology disease NER, CardioDis subtrack: Spanish).
  2. Promoting the comparative assessment and development of clinical entity recognition systems across multiple languages (here, medication mention detection), as well as their adaptation to specific medical specialties (MultiDrug subtrack: English, Spanish and Italian).

To enable the adaptation of general medical NER systems for diseases and medications, the MultiCardioNER task will rely on a training collection of 1,000 general clinical case reports, originally written in Spanish, annotated with diseases (Spanish) and medications (English, Spanish and Italian).

Moreover, to allow such general medical NER approaches to be adapted to cardiology case reports, a development set of 250 cardiology case reports will be released.

The test set will consist of an additional collection of 250 cardiology case reports. It will be released together with an additional background set of clinical case reports, both to promote the generation of a silver standard corpus and to ensure that participating systems can scale up to larger content collections.

The annotations for both clinical entity types were manually revised by medical domain experts.

Systems will be evaluated using flat evaluation, with micro-averaged precision, recall and F-measure (MiF) as the main metrics.
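The micro-averaged metrics can be sketched as follows. This is a minimal illustration, not the official evaluation script: it assumes that a predicted mention counts as correct only if its document ID, character offsets and label all exactly match a gold mention. The `Mention` type, its fields and the `DISEASE`/`DRUG` labels are hypothetical. True positives, false positives and false negatives are pooled over all documents and labels before the scores are computed.

```python
from typing import Iterable, NamedTuple


class Mention(NamedTuple):
    """One entity mention (hypothetical format, not the official one)."""
    doc_id: str
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention's last character
    label: str   # e.g. "DISEASE" or "DRUG" (placeholder label names)


def micro_prf(gold: Iterable[Mention],
              pred: Iterable[Mention]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match mentions.

    Assumption: a prediction is correct only if doc_id, offsets and label
    all match a gold mention exactly. Duplicate mentions collapse in the sets.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predictions exactly matching a gold mention
    fp = len(pred_set - gold_set)  # predictions with no exact gold match
    fn = len(gold_set - pred_set)  # gold mentions the system missed

    # Counts are pooled over all documents and labels (micro-averaging).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = [Mention("doc1", 0, 12, "DISEASE"),
        Mention("doc1", 30, 41, "DISEASE"),
        Mention("doc2", 5, 14, "DRUG")]
pred = [Mention("doc1", 0, 12, "DISEASE"),
        Mention("doc2", 5, 14, "DRUG"),
        Mention("doc2", 20, 28, "DRUG")]
print(micro_prf(gold, pred))  # 2 TP, 1 FP, 1 FN
```

Because counts are pooled before dividing, a label with many mentions weighs more in the micro-average than a rare one; a macro-average would instead average the per-label scores.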