The MEDDOPROF corpus was manually annotated by expert linguists following annotation guidelines created specifically for this task. These guidelines contain rules for annotating professions, employment statuses and work-related activities (which were not included in this task) in clinical cases in Spanish. They also include some considerations regarding the codification of the annotations to the ESCO and SNOMED-CT taxonomies.
The guidelines were created de novo in three phases:
- First, a zero version of the guidelines was developed after annotating an initial batch of ~200 clinical cases and outlining the main problems and difficulties of the data.
- Second, a stable version of the guidelines was reached by iteratively annotating sample sets of the MEDDOPROF corpus until quality control was satisfactory.
- Third, the guidelines are iteratively refined as manual annotation continues.
The annotation guidelines are available on Zenodo.