Annotation Guidelines
This page gives an overview of the annotation and normalization scheme and process for the MedProcNER/ProcTEMIST corpus. More detailed information is available in the Annotation Guidelines, a document of more than 30 pages that describes how the corpus was created and annotated. The guidelines are available on Zenodo.
The MedProcNER/ProcTEMIST guidelines were created by clinical experts at the same time as the DisTEMIST guidelines. Once defined, the guidelines were refined over several cycles of quality control and annotation consistency analysis before the entire dataset was annotated, with a final agreement of… Once the manual annotation phase was finished, the corpus was also thoroughly revised in a post-processing step to maximize consistency.
This page has two parts: Annotation and Normalization.
Annotation
The corpus includes only one label: PROCEDIMIENTO (clinical procedure). Although there is a single label, the mentions it covers are very varied. Several kinds of procedures are annotated, including diagnostic, therapeutic, preventive and supportive procedures. Some specific examples are:
- Simple medical exploration and inspection methods (that require little or no instrumentation): These procedures involve the use of basic diagnostic techniques to examine a patient’s body for signs of illness or disease. Examples include listening to the lungs with a stethoscope (“auscultación pulmonar”, pulmonary auscultation), feeling the abdomen for abnormalities (“palpación abdominal”, abdominal palpation), or checking the patient’s neurological responses (“exploración neurológica”, neurological examination).
- Imaging tests: These procedures involve the use of advanced medical technology to produce images of the inside of the body, which can be used to diagnose and monitor various conditions. Examples include magnetic resonance imaging (MRI) of the brain (“RMN cerebral”), computed tomography (CT) of the chest with contrast (“TAC torácico con contraste”), or an anteroposterior (AP) X-ray of the femur (“RX de fémur AP”).
- Other medical tests: These procedures involve the use of laboratory tests or other diagnostic tools to evaluate a patient’s health status or monitor their condition. Examples include a complete blood count (“hemograma”, hemogram), electrocardiogram (“electrocardiograma” or “ECG”) to measure heart function, or electroencephalogram (“electroencefalograma” or “EEG”) to measure brain activity.
- Administration of medications: These procedures involve the delivery of medications to treat or manage a patient’s medical condition. Examples include antibiotic therapy to treat bacterial infections (“antibioterapia”) or corticosteroids (“corticosteroides”) to reduce inflammation.
- Administration of blood, plasma, serums, bolus and continuous medication pumps: These procedures involve the delivery of fluids, nutrients, or medications directly into a patient’s bloodstream. Examples include a blood transfusion to replace lost blood (“transfusión de 2 concentrados de hematíes”) or fluid therapy to treat dehydration (“sueroterapia”).
- Simplified surgical treatments: These are minimally invasive or straightforward surgical procedures that can be performed relatively quickly and easily. Examples include removal of the prostate gland through an incision in the lower abdomen (“adenomectomía retropúbica”, retropubic adenomectomy) or placement of a testicular prosthesis (“se coloca prótesis testicular”).
- Surgical descriptions: These are detailed accounts of surgical procedures, including the steps involved, the instruments used, and any complications that may arise. Examples include “reconstructed with a chin graft and an arched titanium plate” (“se reconstruyó con injerto de mentón y placa de titanio arqueada”) or “the intercortical gaps were filled with cancellous bone obtained from the donor area” (“los gaps intercorticales se rellenaron de hueso esponjoso obtenido de la zona donante”).
As with many other clinical entities, detecting and annotating procedures in unstructured text can be quite complicated because mentions often use descriptive language, abbreviations, multiple parts (e.g. anatomical entities or instruments) and even ambiguous wording.
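To make the single-label scheme concrete, the following is a minimal Python sketch of how procedure mentions could be recorded as character spans. It is an illustration only, not the corpus's actual file format or tooling: the `Mention` class, the `locate` helper and the example sentence are hypothetical, and the sentence was assembled from two of the examples quoted above.

```python
from dataclasses import dataclass

LABEL = "PROCEDIMIENTO"  # the corpus's single entity label


@dataclass(frozen=True)
class Mention:
    """Hypothetical record for one annotated mention (not the corpus format)."""
    start: int  # character offset where the mention begins (inclusive)
    end: int    # character offset where the mention ends (exclusive)
    label: str
    text: str


def locate(sentence: str, surface: str) -> Mention:
    """Return the first occurrence of `surface` in `sentence` as a Mention."""
    start = sentence.find(surface)
    if start == -1:
        raise ValueError(f"{surface!r} not found in sentence")
    return Mention(start, start + len(surface), LABEL, surface)


# Hypothetical sentence built from two examples quoted on this page.
sentence = "Se solicita TAC torácico con contraste y hemograma."
mentions = [
    locate(sentence, surface)
    for surface in ("TAC torácico con contraste", "hemograma")
]

for m in mentions:
    # Each mention carries its offsets, the single label and the surface text.
    print(m.start, m.end, m.label, m.text)
```

Note that the imaging mention keeps its modifier (“con contraste”) inside the span, mirroring the multi-part examples listed above. Exact string matching like this would not handle the abbreviations or descriptive wording mentioned in the text, which is part of what makes the task difficult.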
Normalization
All entities in the corpus were normalized to SNOMED CT concepts.