SympTEMIST Data

The SympTEMIST corpus is a collection of 1,000 clinical cases in Spanish, drawn from medical specialties such as cardiology, oncology, otorhinolaryngology, dentistry, pediatrics, primary care, allergology, radiology, psychiatry, ophthalmology, and urology, and annotated with symptoms, signs, and findings. Every annotated mention in the corpus has been standardized using SNOMED CT terminology.

  • For more information about the corpus content and format, check the Corpus Description page.
  • For more information about the annotation and normalization of symptoms, signs and findings, including corpus examples in Spanish and English, check the Annotation Guidelines page.
  • To download the corpus and some additional resources, check the Download page.