Virginia Tech

Seminar: Automated Scientific Knowledge Extraction from Massive Text Data

Xuan Wang

University of Illinois at Urbana-Champaign

Tuesday, April 19, 2022
10:00 - 10:50 AM
1100 Torgersen Hall

Abstract

Text mining holds promise for advancing human knowledge in many fields, given the rapidly growing volume of text data (e.g., scientific articles, medical notes, and news reports). In this talk, I will present my work on automatically extracting scientific knowledge from massive text data to enable and accelerate scientific discovery. First, I will talk about my work on information extraction with minimal human supervision. With the growing volume of text data and the breadth of information, it is inefficient or nearly impossible for humans to manually find, integrate, and digest useful information. To address this challenge, I have developed methods that automatically extract fine-grained information from massive text data with weak supervision from knowledge bases, taxonomies, and pre-trained language models. Second, I will talk about my work on literature-based scientific knowledge discovery. I have developed methods to address open scientific problems, such as supporting clinical drug discovery and assisting chemical and biological molecule design, based on the rich information automatically extracted from scientific text. Finally, I will conclude my talk with future directions in text mining for knowledge discovery, such as integrating domain-specific symbolic knowledge with text representation learning, and the intersections of text mining with other domains such as graph mining, computer vision, and human-computer interaction.

Biography

Xuan Wang is a fifth-year Ph.D. student in the Computer Science Department at the University of Illinois at Urbana-Champaign (UIUC). She is working in the Data Mining Group under the supervision of Prof. Jiawei Han. Xuan received her M.S. in Statistics (2017) and M.S. in Biochemistry (2015) from UIUC. She received her B.S. in Biological Science (2013) from Tsinghua University, China. Her research interests are in text mining and natural language processing, with an emphasis on applications to biological and health sciences. Her current research theme is developing effective and scalable algorithms and systems for automatically understanding massive text data to enable and accelerate scientific discovery. Xuan has published about 20 research/demo papers in top NLP conferences (e.g., ACL and EMNLP) and biomedical informatics journals (e.g., Bioinformatics) and conferences (e.g., ACM-BCB and IEEE-BIBM). She is the recipient of the YEE Fellowship Award for 2020-2021 from UIUC.