Schedule

Event | Date (UTC) | Link
Training set release | Mar 31 | Zenodo
Development set release | Jun 14 | Zenodo
Additional set 1 - 85k tweets w/ disease mentions (Silver Standard) | Jun 27 | Zenodo
Validation predictions due [Practice Phase] [Required] | Jul 4 | -
Additional set 2 - 85k tweets w/ additional mentions (Silver Standard) | Jul 6 | Zenodo
Test set release (without annotations) | Jul 11 | Zenodo
Test set predictions due [Evaluation Phase] | Jul 15 | -
Test set evaluation scores release | Jul 25 | TBA
System descriptions due | Aug 1 | TBA
Acceptance notification | Aug 15 | TBA
Camera ready system descriptions | Sep 1 | TBA
SMM4H workshop at COLING conference | Oct 12-17 | COLING 2022

Scientific Committee

by socialdisner
  • Jey Han Lau, The University of Melbourne (Australia)
  • Luca Maria Aiello, IT University of Copenhagen (Denmark)
  • David Camacho, Universidad Politécnica de Madrid (Spain) 
  • Torsten Zesch, FernUniversität in Hagen (Germany)
  • Eiji ARAMAKI, Nara Institute of Science and Technology (Japan)
  • Rafael Valencia-Garcia, Universidad de Murcia (Spain)
  • Antonio Jimeno Yepes, RMIT University (Australia)
  • Carlos Gómez-Rodríguez, Universidad da Coruña (Spain)
  • Paloma Martínez, Universidad Carlos III de Madrid (Spain)
  • Anália Lourenço, Universidade de Vigo (Spain)
  • Eugenio Martínez Cámara, Universidad de Granada (Spain)
  • Gema Bello Orgaz, Universidad Politécnica de Madrid (Spain)
  • Juan Antonio Lossio-Ventura, National Institutes of Health (USA)
  • Héctor D. Menéndez, King’s College London (UK)
  • Manuel Montes y Gómez, National Institute of Astrophysics, Optics and Electronics (Mexico)
  • Helena Gómez Adorno, Universidad Nacional Autónoma de México (Mexico)
  • Rodrigo Agerri, IXA Group (HiTZ Centre), University of Basque Country EHU (Spain)
  • Miguel A. Alonso, Universidad da Coruña (Spain)
  • Ferran Pla, Universidad Politécnica de Valencia (Spain)
  • José Alberto Benítez-Andrades, Universidad de León (Spain)
  • More TBA

Task Organizers

SocialDisNER-ST is organized by:

  • Luis Gasco, Barcelona Supercomputing Center, Spain
  • Darryl Estrada, Barcelona Supercomputing Center, Spain
  • Eulàlia Farré-Maduell, Barcelona Supercomputing Center, Spain
  • Salvador Lima, Barcelona Supercomputing Center, Spain
  • Martin Krallinger, Barcelona Supercomputing Center, Spain

SocialDisNER-ST is part of Social Media Mining for Health Applications (#SMM4H) Shared Task 2022, which is organized by:

  • Graciela Gonzalez-Hernandez, University of Pennsylvania, USA
  • Davy Weissenbacher, University of Pennsylvania, USA
  • Arjun Magge, University of Pennsylvania, USA
  • Ari Z. Klein, University of Pennsylvania, USA
  • Ivan Flores, University of Pennsylvania, USA
  • Karen O’Connor, University of Pennsylvania, USA
  • Raul Rodriguez-Esteban, Roche Pharmaceuticals, Switzerland
  • Lucia Schmidt, Roche Pharmaceuticals, Switzerland
  • Juan M. Banda, Georgia State University, USA
  • Abeed Sarker, Emory University, USA
  • Yuting Guo, Emory University, USA
  • Elena Tutubalina, Kazan Federal University, Russia
  • Vera Davydova, Kazan Federal University, Russia

Description of the Corpus

Training and validation (annotated), test and background (unannotated) datasets

Guidelines

The SMM4H-Spanish corpus is a collection of 10,000 health-related tweets in Spanish, annotated with disease mentions by a medical expert. The annotation followed carefully designed guidelines that have proven useful for annotating both literature (clinical case reports) and electronic health records (EHRs). The aim of the corpus is to support the extraction of a wide range of disease mentions from social media, enabling further characterization of health-related issues of practical importance.

The corpus data was obtained from a Twitter crawl focusing on selected accounts covering patient associations and organizations, healthcare institutions and professionals, as well as their followers, with the aim of enriching the crawl for healthcare-relevant tweets. This crawl was further filtered to keep only tweets written in Spanish, with particular (but not exclusive) emphasis on profiles located in Spain and some other Spanish-speaking countries.

The corpus was primarily annotated by medical experts in an iterative process that included the adaptation of medical document annotation guidelines specifically for this task. These guidelines will be publicly released together with the SocialDisNER corpus.

The annotation was performed using the web-based tool brat. Below is an example of what the annotated tweets look like:

Sample annotation of the SocialDisNER SMM4H-Spanish corpus.

In total, 10,000 tweets were annotated. They were split into 60% training (6,000), 20% development (2,000) and 20% test (2,000). The splits will be released according to the track schedule and made accessible on Zenodo.

FORMAT

SocialDisNER: Tweet disease mention detection. Annotations are stored in a tab-separated file with 5 columns:

tweet_id begin end type extraction
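As a minimal sketch of how such a file could be read, the Python snippet below parses tab-separated rows into records. It assumes the file starts with a header line naming the five columns and that begin and end are integer character offsets; the sample rows, the tweet ID, and the ENFERMEDAD label are hypothetical values made up for illustration.

```python
import csv
import io
from typing import NamedTuple


class Mention(NamedTuple):
    tweet_id: str
    begin: int
    end: int
    type: str
    extraction: str


def read_mentions(handle):
    """Parse tab-separated annotation rows (with a header line) into Mention records."""
    # QUOTE_NONE keeps quote characters inside tweet text from being interpreted.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        Mention(
            row["tweet_id"],
            int(row["begin"]),
            int(row["end"]),
            row["type"],
            row["extraction"],
        )
        for row in reader
    ]


# Hypothetical sample: IDs, offsets, and label are illustrative, not taken from the corpus.
SAMPLE = (
    "tweet_id\tbegin\tend\ttype\textraction\n"
    "1001\t10\t18\tENFERMEDAD\tdiabetes\n"
    "1001\t25\t31\tENFERMEDAD\tcáncer\n"
)

mentions = read_mentions(io.StringIO(SAMPLE))
for mention in mentions:
    print(mention.tweet_id, mention.begin, mention.end, mention.extraction)
```

To read a real file, an open file handle (for example `open(path, encoding="utf-8", newline="")`) can be passed to `read_mentions` in place of the in-memory sample.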

Datasets

Train set

The train set contains 5,000 annotated tweets. It will be published on Zenodo.

Validation set

The validation set contains 2,500 annotated tweets. It will be published on Zenodo.

Test and background sets

The test set contains 2,500 tweets. The background set contains 50K tweets. Both will be published on Zenodo.

The test and background sets will be published together. You will have to submit predictions for the whole set, but you will only be evaluated on the test set predictions.
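One way to picture this is that predictions for background tweets are simply set aside before scoring. The sketch below is an assumption about how that filtering could look, not the organizers' evaluation code; the function name, tweet IDs, and rows are hypothetical.

```python
def keep_test_predictions(predictions, test_ids):
    """Return only the prediction rows whose tweet_id belongs to the test set."""
    test_ids = set(test_ids)  # set membership keeps the filter fast on large inputs
    return [row for row in predictions if row[0] in test_ids]


# Hypothetical rows in the column order tweet_id, begin, end, type, extraction.
predictions = [
    ("t1", 0, 8, "ENFERMEDAD", "diabetes"),
    ("b7", 5, 9, "ENFERMEDAD", "asma"),  # background tweet, ignored for scoring
    ("t2", 12, 18, "ENFERMEDAD", "cáncer"),
]

scored = keep_test_predictions(predictions, test_ids=["t1", "t2"])
print([row[0] for row in scored])
```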

Test set with Gold Standard annotations

The Gold Standard annotations of the test set will be released after the submission deadline.

Corpora Stats.

Statistic | Training | Development
# Tweets | 5000 | 2500
# characters | 1253431 | 516768
# tokens | 211555 | 84478
Avg. characters/tweet | 250.69 | 206.71
Avg. tokens/tweet | 42.31 | 33.79
# disease mentions | 15173 | 4252
# unique disease mentions | 4407 | 1413
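The per-tweet averages are consistent with dividing each total by the number of tweets in the split. The short check below recomputes them from the totals, assuming the training and development figures are 1253431 and 516768 characters and 211555 and 84478 tokens; the variable and function names are illustrative.

```python
# Totals per split, read from the corpus statistics.
stats = {
    "Training": {"tweets": 5000, "characters": 1253431, "tokens": 211555},
    "Development": {"tweets": 2500, "characters": 516768, "tokens": 84478},
}


def per_tweet(total, tweets):
    """Average of a total count over the tweets in a split, rounded to 2 decimals."""
    return round(total / tweets, 2)


averages = {
    split: {
        "chars_per_tweet": per_tweet(s["characters"], s["tweets"]),
        "tokens_per_tweet": per_tweet(s["tokens"], s["tweets"]),
    }
    for split, s in stats.items()
}

for split, values in averages.items():
    print(split, values["chars_per_tweet"], values["tokens_per_tweet"])
```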

Publications

SocialDisNER’s overview paper:

Luis Gasco Sánchez, Darryl Estrada Zavala, Eulàlia Farré-Maduell, Salvador Lima-López, Antonio Miranda-Escalada, and Martin Krallinger. 2022. The SocialDisNER shared task on detection of disease mentions in health-relevant content from social media: methods, evaluation, guidelines and corpora. In Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task, pages 182–189, Gyeongju, Republic of Korea. Association for Computational Linguistics.

URL: https://aclanthology.org/2022.smm4h-1.48/

SMM4H 2022 overview paper:

Davy Weissenbacher, Juan Banda, Vera Davydova, Darryl Estrada Zavala, Luis Gasco Sánchez, Yao Ge, Yuting Guo, Ari Klein, Martin Krallinger, Mathias Leddin, Arjun Magge, Raul Rodriguez-Esteban, Abeed Sarker, Lucia Schmidt, Elena Tutubalina, and Graciela Gonzalez-Hernandez. 2022. Overview of the Seventh Social Media Mining for Health Applications (#SMM4H) Shared Tasks at COLING 2022. In Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task, pages 221–241, Gyeongju, Republic of Korea. Association for Computational Linguistics.

URL: https://aclanthology.org/2022.smm4h-1.54/

Participants' papers:

Workshop

SocialDisNER will be part of the Social Media Mining for Health 2022 (#SMM4H) workshop at COLING 2022 (the 29th International Conference on Computational Linguistics), which takes place in October in Gyeongju (Republic of Korea).

COLING is one of the leading conferences on natural language processing and computational linguistics and attracts participants from both top research centers and emerging countries.

SocialDisNER participants are required to write a short paper describing the system(s) they ran on the test data. Some sample system descriptions can be found on pages 89-136 of the #SMM4H 2019 proceedings. Accepted system descriptions will be included in the #SMM4H 2022 proceedings.

We encourage at least one author of each accepted system description to register for the #SMM4H 2022 Workshop, co-located with COLING, and present their system as a poster. Selected participants, as determined by the program committee, will be invited to extend their system description to up to four pages, plus unlimited references, and present their system orally.

Contact & FAQ

Email Martin Krallinger at Krallinger.Martin@gmail.com, Luis Gasco at luis.gasco@bsc.es, or Darryl Estrada at darryl.estrada@bsc.es


  1. Q: What is the goal of the shared task?
    The goal is to predict the named entities of the tweets in the test and background sets.

  2. Q: How do I register?
    Here: Google Form

  3. Q: How do I submit the results?
    In CodaLab.

  4. Q: Can I use additional training data to improve model performance?
    Yes, participants may use any additional training data they have available, as long as they describe it in the system description.


  5. Q: Is there a Google Group for the SocialDisNER task?
    Yes: Google Group