Seminar: Diagnose, Correct, Steer: Towards Functional and Trustworthy AI

Yuchen Yang

PhD Candidate
Department of Computer Science
Johns Hopkins University

Friday, February 14
9:30 - 10:30AM
1100 Torgersen Hall

 

Abstract

Generative AI empowers transformative applications, yet safety concerns, such as the generation of not-safe-for-work (NSFW) content, and privacy risks remain open challenges. In this talk, I will present a three-stage approach to address these concerns. First, diagnose AI systems to uncover vulnerabilities that attackers can exploit to trigger unsafe behaviors. Second, correct these behaviors through lightweight safety alignment or training with formal guarantees. Finally, steer AI behavior at test time using a training-free, rule-based reasoning framework to ensure compliance with desired guidelines.

This talk focuses on the diagnosing and correcting stages, showcasing how efficient jailbreak attacks can serve as automated tools to expose vulnerabilities in text-to-image models, particularly for NSFW content. Additionally, I will explore lightweight safety alignment methods that effectively mitigate NSFW outputs while preserving model functionality and creativity, along with a brief introduction to correcting with formal guarantees.

 

Biography

Yuchen Yang is a Ph.D. candidate in the Department of Computer Science at Johns Hopkins University. Her research focuses on delivering functional and trustworthy solutions for machine learning and AI systems. She has broad interests in security, privacy, and robustness, with a strong focus on enabling seamless real-world deployment of AI systems aligned with societal values.

Yuchen’s work has been published in tier-one security conferences, including IEEE S&P, ACM CCS, USENIX Security, and NDSS, as well as CV/NLP conferences such as ECCV and EMNLP. Her research has made a notable societal impact: it has been featured in media outlets such as MIT Technology Review and IEEE Spectrum, and listed as one of the Normalized Top-100 Security Papers. She also serves as a PC member for multiple conferences and journals, including ACM CCS, ICLR, DSN, and TDSC. For more information, please visit her website: https://www.cs.jhu.edu/~yuchen413/