Seminar: Identifying genomic signatures in pandemic viruses using large-language models

Sayantani Basu

PhD Candidate, Computer Science
University of Illinois Urbana-Champaign

Monday, February 12, 2024
11:00 AM - 12:00 PM
1100 Torgersen Hall

Abstract

The rapid spread of pandemic viruses such as SARS-CoV-2 demands prompt mitigation measures and vaccine design. Genetic sequences provide a vast amount of knowledge about spike proteins and infection pathways as a virus mutates over time. Genomic large language models (LLMs) enable us to classify and characterize mutations in COVID-19 sequences belonging to different variants. Genomic visualizations are also a powerful tool in this regard, helping to explain the role LLMs play in such classifications and to identify the genomic signatures and mutations that contribute to infectious variants. We expect that performance on these tasks can also help benchmark which LLMs work better for classifying genomic sequences. These models can serve as a basis for comparing the mutational landscapes of different viruses over time. My talk will highlight case studies on COVID-19 and influenza, and discuss how this approach could help identify trends in zoonotic emergence and infection over time.

Biography

Sayantani Basu is a PhD Candidate in Computer Science at the University of Illinois Urbana-Champaign, where she also earned her MS in Computer Science. Her research focuses on applying deep learning techniques and visualizations to genomic sequencing, especially in the area of pandemics and infectious diseases. She has co-authored multiple research papers in reputable journals and conferences.