Virginia Tech

Seminar: PlinyCompute: Connecting Programming, Computation, and Storage for Big Data Analytics


Jia Zou

Research Scientist, Rice University

Monday, February 25
10:00am - 11:15am
310 Kelly Hall


Users want Big Data analytics systems that provide interactive-speed ad-hoc query processing and short training times for machine learning. But existing systems often fall short of these goals. In this talk, I identify two reasons for this. First, such systems are heavily layered, with many separate software components working together: a distributed file system, an in-memory file system, the JVM, and the computational system itself. Communication across layers leads to inefficiencies. Second, it is difficult to automatically optimize computations residing in opaque user code, such as user-defined functions (UDFs).

I will then describe my work aimed at solving those problems. First, I will present a novel declarative programming interface, based on lambda calculus, that forces programmers to expose intent and compiles into a standalone intermediate representation of computations that facilitates relational-style query optimization and automatic data placement. Second, I will describe a novel storage system that avoids the layering overhead by pushing down analytics computations and managing all analytics data on disk and in memory in a monolithic distributed system. Finally, I will describe my ongoing work and future research plans for building a novel D3 big data analytics platform that provides Declarative programming and Deterministic performance for a set of Dynamic workloads such as Blockchain applications.


Jia Zou is a Research Scientist in the Department of Computer Science at Rice University. Prior to joining Rice in 2015, she worked at IBM Research - China as a Research Staff Member. She received her Ph.D. degree from Tsinghua University, China. Her research investigates and builds high-performance, scalable systems for Big Data management and analytics, which has led to an open source system called PlinyCompute and publications in top Big Data management venues, including VLDB and SIGMOD. She mentors undergraduate, graduate, and high school students in their research. She has also served as a TPC member of Cluster 2018 and has reviewed more than 40 papers for journals including IEEE Transactions on Parallel and Distributed Systems (TPDS) and IEEE Transactions on Knowledge and Data Engineering (TKDE).