The SocialDisNER corpus of the SMM4H 2022 track was manually annotated by medical experts following the SMM4H-SocialDisNER guidelines. These guidelines were adapted from previous versions used to annotate EHRs and medical literature (clinical case reports) and contain rules for annotating mentions of diseases in health-related tweets in Spanish. Additionally, they also include some considerations regarding the codification of the annotations to SNOMED-CT concept codes.
Guidelines were created de novo in three phases:
- First, an initial version of the guidelines was adapted from clinical annotation guidelines after annotating an initial batch of ~500 tweets and outlining the main problems and difficulties of the social media data.
- Second, a stable version of guidelines was reached while annotating sample sets of the SocialDisNER corpus iteratively until quality control was satisfactory.
- Third, guidelines are iteratively refined as manual annotation continues.
The annotation guidelines are available at Zenodo.