Virginia Tech

Seminar: Towards Robust Multilingual NLP

Junjie Hu

PhD Candidate, Carnegie Mellon University

Monday, March 15, 2021
9:30am - 11:00am
Zoom Only


Over the last decade, the phenomenal success of NLP systems has been driven mostly by deep neural networks and supervised machine learning on large amounts of labeled data. However, it is infeasible to annotate data for every possible real-world scenario. As a result, these systems may fail dramatically in practice when dealing with complex textual data written in different languages, coming from diverse domains, or even associated with different data modalities.

In this talk, I will present work on two distinct aspects that are important for extending the generalization ability of NLP systems. First, I will present my work on XTREME, a platform for cross-lingual learning that covers 9 NLP tasks over 40 languages, and then introduce a training technique for learning multilingual representations of words and sentences. Second, I will present my work on adapting models to text from new topical domains using unannotated data, and demonstrate improvements in neural machine translation over 5 topical domains. Finally, I will outline future directions such as multimodal machine learning. I will conclude the talk with an overview of my research and my research plans in the interdisciplinary field of AI, data science, and healthcare.


Junjie Hu is a Ph.D. candidate in the Language Technologies Institute, School of Computer Science, at Carnegie Mellon University (CMU), working with Prof. Jaime Carbonell and Prof. Graham Neubig. His research lies at the intersection of natural language processing and machine learning. He has interned at Google AI and Microsoft Research. He developed the XTREME platform for cross-lingual learning methods, which has been used by both industrial and academic institutions such as NYU, Facebook, Microsoft, and Alibaba. His research has been published in premier venues such as ACL, EMNLP, NAACL, ICML, and AAAI, and has attracted media attention in outlets such as Slator, the Google AI blog, and the Facebook AI blog. He is the recipient of an IBM Outstanding Student Scholarship, two National Scholarships in China, and a Best Demo Paper Nomination at NAACL 2019.