Registration

To participate in a task, register (free of charge) using the Google Form.

Please choose a team name you will remember, since it will be used throughout the whole competition!

Student registrants must provide the name and email address of a faculty team member who has agreed to serve as their advisor/mentor for developing their system and writing their system description (see below). By registering for a task, participants agree to run their system on the test data and upload at least one set of predictions to CodaLab. Teams may upload up to three sets of predictions per task. By receiving access to the annotated tweets, participants agree to Twitter’s Terms of Service and may not redistribute any portion of the data.

Annotation Guidelines

Training and validation datasets (annotated); test and background datasets (unannotated)

Guidelines

The SMM4H-Spanish corpus was manually annotated by expert linguists following the SMM4H-Spanish guidelines. These guidelines contain rules for annotating professions, employment statuses and work-related activities in health-related tweets in Spanish. They also include some considerations on coding the annotations to the ESCO and SNOMED-CT taxonomies.

The guidelines were created from scratch in three phases:

  1. First, version zero of the guidelines was developed after annotating an initial batch of ~200 tweets and outlining the main problems and difficulties of the data.
  2. Second, a stable version of the guidelines was reached by iteratively annotating sample sets of the ProfNER corpus until quality control was satisfactory.
  3. Third, the guidelines are iteratively refined as manual annotation proceeds.

The annotation guidelines are available in Spanish here and in English here.