Seminar: Revisiting the Role of Visual Media in Understanding a Rich Multimodal World
Chris Thomas, Postdoctoral Researcher, Columbia University
Monday, February 28, 2022
1100 Torgersen Hall
Modern media often weaves modalities such as text, images, video, and audio together to create engaging multimedia content. Authors of such content commonly use each modality to tell a different part of the story, with each modality contributing to the overall message. Most computer vision research, however, views visual media through the lens of understanding its literal content (e.g., listing objects in an image) and assumes that a straightforward relationship exists between visual content and text (e.g., a caption describing the content of an image). In this talk, I will argue that we should instead approach images as communicative tools used by humans to tell visual stories and to convey messages. This perspective requires a deeper understanding of the meaning of visual content within the larger context of culture, commonsense, and background knowledge, as well as revisiting common assumptions within existing datasets and tasks.
To demonstrate this theme, I will first present methods that model the latent messaging within visual content. Specifically, I will show how photos can be attributed to professional photographers by understanding subtle semantic clues and how the political bias of mass media organizations is expressed through the images they choose to illustrate their stories. Next, I will discuss the rich, complementary relationship between images and text exhibited by real-world multimedia and present methods for learning robust semantic representations. Finally, I will show two applications of my research to detecting disinformation, both of which rely on modeling the complex relationship between visual media and text in a fine-grained way.
Chris Thomas is a postdoctoral researcher at Columbia University working with Professor Shih-Fu Chang. He currently represents Columbia as part of the DARPA Semantic Forensics project focused on detecting, attributing, and characterizing disinformation. His research lies at the intersection of computer vision, natural language processing, and machine learning. He received his PhD in Computer Science from the University of Pittsburgh in 2020, where he was advised by Professor Adriana Kovashka. His work has appeared in top conferences and journals, including CVPR, NeurIPS, ECCV, and IJCV.