Virginia Tech

Seminar: High-Performance and Cost-Effective Storage Systems for Supporting Big Data

Zhichao Cao

Research Scientist, Facebook

Monday, March 1, 2021
12:00pm - 1:30pm
Zoom Only



With the wide use of social media, e-business, smartphones, and smart home kits, data is generated everywhere, and we now live in the big data era. Storage systems are the keystone of data persistence in today’s big data infrastructure. As data volumes explode, achieving a better tradeoff between performance and cost-effectiveness is one of the main challenges in designing and optimizing storage systems.

In this talk, I will present my research that addresses these tradeoff challenges in primary storage, backup systems, and storage systems for AI/ML. I will primarily discuss key-value store workload characterization in primary storage and AI/ML systems at Facebook. The key findings motivate two follow-up studies on distributed key-value stores that 1) explore using NVM as a cache tier to better balance memory cost and overall performance, and 2) address the performance penalty in disaggregated infrastructure by introducing an in-storage computing architecture. I will also cover my research on improving the read performance of data deduplication in backup systems for big data applications, which includes a hybrid look-ahead caching scheme and a data rewrite scheme. Finally, I will introduce my vision for future research on storage systems for new storage devices, AI/ML systems, and new infrastructure.


Dr. Zhichao Cao is a research scientist at Facebook, mainly working on data infrastructure, storage systems, and databases. He received his bachelor’s degree in Automation from Tsinghua University in China and completed his Ph.D. in computer science at the University of Minnesota in 2020, supervised by Prof. David H.C. Du. Zhichao’s research is on designing and optimizing high-performance and cost-effective storage systems for big data. Specifically, he works on tiered file systems, key-value stores, secondary storage systems, systems for new storage devices, and storage systems for AI/ML platforms. Zhichao has published more than 15 papers in major conferences and journals, including USENIX FAST, USENIX HotStorage, IEEE MASCOTS, Computers in Industry, IEEE Transactions on Computers, and ACM Transactions on Storage. He also has rich industry experience through multiple research internships and research collaborations at leading companies such as NetApp, HPE, Veritas, and Facebook. You can find out more about Zhichao at