Graduate Seminar: Harnessing Rare Category Trinity for Complex Data

Dr. Dawei Zhou
Friday, September 23, 2022
2150 Torgersen Hall


In the era of big data, we are inundated with data collected from a wide range of domains. Yet it is often the rare occurrences that matter most in high-impact domains, ranging from financial fraud detection to rare disease diagnosis, and from national security to scientific discovery. Modern AI systems, however, typically require richly annotated data and often perform suboptimally in rare category analysis because labeled rare examples are scarce. In this talk, I will discuss my recent work on a generic learning mechanism with trinity modules for complex rare category analysis: (M1) Rare Category Characterization, which characterizes rare patterns with a compact representation; (M2) Rare Category Explanation, which interprets the prediction results and provides relevant clues for end users; and (M3) Rare Category Generation, which produces synthetic rare category examples that resemble the real ones. The key philosophy of our mechanism is "all for one and one for all": each module makes a unique contribution to the whole mechanism and, in turn, receives support from its companions. I will focus on key application domains, discuss our proposed techniques and theoretical results for characterizing, comprehending, and synthesizing rare examples, and showcase a unified visual analytic system for rare category analysis in dynamic environments. Finally, I will conclude the talk and share thoughts on my future research.



Dawei Zhou is an Assistant Professor in the Department of Computer Science at Virginia Tech. His research interests lie in rare category analysis, graph mining, curriculum learning, and algorithmic fairness, with applications in financial fraud detection, financial forecasting, social media analysis, and healthcare. He has authored more than 20 publications in premier academic venues across AI, data mining, and information retrieval (e.g., AAAI, IJCAI, KDD, ICDM, SDM, TKDD, DMKD, WWW, CIKM) and has received student travel awards at venues including KDD, WWW, AAAI, IJCAI, and ICDM. His work on complex rare category analysis was selected by the Computing Research Association (CRA) to be showcased at the 24th CNSF Capitol Hill Science Exhibition. He collaborates broadly with industry and academia, including IBM T.J. Watson Research Lab, HRL Laboratories, Alibaba DAMO Academy, Early Warning Inc., UIUC, University of Rochester, and Rutgers University.