Call for Papers: 2nd Workshop on MultilingualBIO: Multilingual Biomedical Text Processing

LREC 2020, Marseille (France), Saturday, May 16, 2020 (afternoon)


As in other NLP areas, we are currently witnessing fast developments, with improved access, analysis and integration of healthcare-relevant information from heterogeneous content types, including electronic health records, medical literature, clinical trials, medical agency reports and patient-reported information available from social media and forums. Automation of tasks is increasing in many critical areas, such as detecting interactions or supporting clinical decision-making. However, progress varies greatly by language. The main achievements in processing biomedical text are almost entirely restricted to English, with most other languages lagging behind due to a lack of annotated resources, incomplete vocabularies and insufficient in-domain corpora. More effort from the research community is needed to endow these languages with the necessary resources.

Machine translation in the biomedical domain is also an important field of application. The need to translate biomedical texts arises in many situations. Increasing cross-border mobility may require specific translation of medical records and discharge reports. In addition, the internationalization of the pharmaceutical industry demands that technical specifications and package leaflets of medicines be translated into the language of the customer in several countries. Other areas of interest are the translation of medical patents, laboratory reports, clinical trials and scientific publications.

The second edition of MultilingualBIO, at the LREC 2020 Conference, is a unique opportunity to promote the development of biomedical text processing resources and components in languages beyond English, exploring the use of novel methodological advances, e.g. transfer-learning techniques such as contextual embeddings, for a diversity of tasks in the domain, including machine translation.

In this workshop, we plan to address issues including, but not restricted to, the following:

  • Building of MT systems adapted to the biomedical domain.
  • Production of multilingual corpora in the biomedical domain.
  • Creation (and translation) of multilingual biomedical glossaries, ontologies and terminological resources.
  • Application of transfer-learning techniques, such as contextual embeddings, across tasks in the biomedical domain.
  • Extension of the coverage of the normative terminologies to languages other than English (e.g. ontologies from the Open Biomedical Ontology repository like HPO, LOINC, MedDRA, UMLS, SNOMED CT, RxNorm, etc.).
  • Dealing with localization issues, including adaptation to local varieties of international languages (UK vs. US English; Spanish from Spain, Latin America or the USA; etc.).
  • NLP and text mining applied to health, biomedicine and related domains, including food safety.
  • Medical named entity recognition and grounding systems beyond English.