ProfNER-ST: Identification of professions & occupations in Health-related Social Media
Submit your test & background set predictions [Evaluation Phase] by 11:59 pm UTC on March 4th: https://competitions.codalab.org/competitions/28766
| Event | Date | Link |
|---|---|---|
| Training & development set release | Dec 15 | Train, dev, test and background sets |
| Validation predictions due [Practice Phase] [Required] | Feb 25 | Codalab |
| Test set release (without annotations) | Mar 1 | Train, dev, test and background sets |
| Test set predictions due [Evaluation Phase] | Mar 4 | Codalab |
| Test set evaluation scores release | Mar 8 (tentative) | TBD |
| System descriptions due | Mar 15 | TBD |
| Acceptance notification | Apr 1 | TBD |
| Camera-ready system descriptions | Apr 12 | TBD |
About the task
Identification of professions and occupations (ProfNER) in Spanish. This task focuses on recognizing professions and occupations in Spanish-language Twitter data that has been filtered for health-relevant content. The aim is to extract professions from social media in order to help characterize health-related issues, in particular in the context of COVID-19 epidemiology and mental health conditions.
Automatic recognition of professions matters because some workers are at the forefront of the battle against the COVID-19 pandemic. Detecting vulnerable occupations, whether due to their risk of direct exposure to the virus or to mental health issues associated with work-related aspects, is critical for preparing preventive measures. Regarding direct exposure and COVID-19 deaths, data from the UK Office for National Statistics underline the importance of characterizing such at-risk groups, which included not only healthcare workers but also professions such as caregivers, taxi drivers, security guards and retail assistants. The ProfNER shared task will enable the training of deep learning named entity recognition approaches for this purpose.
The Social Media Mining for Health Applications (#SMM4H) Shared Task 2021 invites researchers to develop systems that solve health informatics challenges in social media. The seventh track focuses on the identification of professions and occupations in Spanish tweets. Previous editions of SMM4H included a similar task on English tweets; this year, the dataset includes tweets in English, Spanish and Russian. This webpage is devoted to the Spanish part of this multilingual track (i.e. identification of professions and occupations in Spanish tweets).
There are two Spanish sub-tracks:
- Track A – Tweet binary classification. Participants must determine whether or not a tweet contains a mention of an occupation.
- Track B – NER offset detection and classification. Participants must find the beginning and end of each occupation mention and assign it to the corresponding category. The corpus contains four mention categories, but participants will only be evaluated on the prediction of two of them: PROFESION [profession] and SITUACION_LABORAL [working status].
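Both sub-tracks can be viewed through the same span annotations: Track B asks for offsets plus a category, and a Track A label can be read off those spans. The following is a minimal sketch, not the official ProfNER evaluation script. It assumes that mentions are stored as character offsets with an exclusive end, that Track B is scored with strict matching (identical offsets and category) micro-averaged over all tweets, and that a tweet counts as positive for Track A when it contains at least one evaluated mention. The names `Mention`, `track_a_label` and `strict_micro_prf`, the toy tweets, and the `OTHER` label are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple

# The two categories the task description says are scored in Track B.
EVALUATED_LABELS = {"PROFESION", "SITUACION_LABORAL"}


class Mention(NamedTuple):
    """One annotated span. Assumption: character offsets, end exclusive."""
    start: int
    end: int
    label: str


def track_a_label(mentions: list[Mention]) -> int:
    """Hypothetical Track A rule: 1 if any evaluated mention is present, else 0."""
    return int(any(m.label in EVALUATED_LABELS for m in mentions))


def strict_micro_prf(gold: dict[str, list[Mention]],
                     pred: dict[str, list[Mention]]) -> tuple[float, float, float]:
    """Micro-averaged P/R/F1 where a prediction counts only if its offsets and
    label exactly match a gold mention (assumed strict matching)."""
    tp = fp = fn = 0
    for tweet_id in gold.keys() | pred.keys():
        # Drop categories that are annotated but not evaluated.
        g = {m for m in gold.get(tweet_id, []) if m.label in EVALUATED_LABELS}
        p = {m for m in pred.get(tweet_id, []) if m.label in EVALUATED_LABELS}
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example (invented tweets, not taken from the ProfNER corpus).
texts = {
    "t1": "Mi hermana es enfermera y ahora se encuentra en paro.",
    "t2": "Hoy hace buen tiempo.",
}
gold = {
    "t1": [Mention(14, 23, "PROFESION"),           # "enfermera"
           Mention(45, 52, "SITUACION_LABORAL")],  # "en paro"
    "t2": [],
}
pred = {
    "t1": [Mention(14, 23, "PROFESION"),           # exact match
           Mention(48, 52, "SITUACION_LABORAL")],  # "paro": wrong start offset
    "t2": [Mention(0, 3, "OTHER")],                # hypothetical placeholder label, ignored
}

print([track_a_label(gold[t]) for t in texts])  # [1, 0]
print(strict_micro_prf(gold, pred))             # (0.5, 0.5, 0.5)
```

Under strict matching, the boundary error on "paro" costs both a false positive and a false negative, which is why the toy scores drop to 0.5 even though both mentions were located. A lenient (overlap-based) scheme would score this differently, so participants should check the official evaluation rules before relying on this sketch.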
The SMM4H 2021 general webpage can be accessed here.
#SMM4H is held as part of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics.