MultiCardioNER Data

MultiCardioNER draws on several datasets. The DisTEMIST and DrugTEMIST corpora share the same set of 1,000 clinical cases in Spanish from different medical specialties (including oncology, otorhinolaryngology, dentistry, pediatrics, primary care, allergology, radiology, psychiatry, ophthalmology and more), annotated with disease and medication mentions, respectively. In addition, a collection of cardiology clinical case reports (CardioCCC), annotated following the same guidelines, is used to fine-tune and evaluate the systems.

  • For more information about the corpus content and format, check the Corpus Description page.
  • For more information about the annotation of the corpus, including corpus examples in Spanish and English, check the Annotation Guidelines page.
  • To download the corpus and some additional resources, check the Download page.