
Seminar: Stream Processing Systems for Emerging Trends

Liting Hu

Assistant Professor, Florida International University

Friday, January 22, 2021
9:30am - 10:30am
Zoom Only

Abstract:

Stream processing was proposed and popularized as a “technology like Hadoop but can give you results faster”: it lets users query a continuous data stream and get results within a very short time of receiving the data. For that reason, stream processing has become a critical building block of many applications, such as making business decisions from marketing streams, identifying spam campaigns from social network streams, predicting tornadoes and storms from radar streams, and analyzing genomes in different labs and countries to track the sources of a potential epidemic. However, state-of-the-art solutions have predominantly centered on stateless stream processing, leaving another urgent trend, stateful stream processing, much less explored. A driving need is that future stream applications must store and update state along with their processing, and process live data streams in a timely fashion from massive and geo-distributed data sets. Unfortunately, existing systems are mainly designed for low-latency intra-datacenter settings. They do not scale well for running stream applications that contain large distributed states in geo-distributed datacenters, suffering from a significant centralized bottleneck and high latency.


In this talk, I will present a next-generation, geo-distributed, scalable stateful stream processing system. (1) At the architecture layer, I will introduce a decentralized “many masters/many workers” architecture that dramatically improves the scalability of stream processing systems. (2) At the operator layer, I will present an in-memory data structure for storing state that minimizes the memory overhead. (3) At the mechanism layer, I will present a fragment-based parallel recovery mechanism that recovers large distributed states by leveraging distributed hash table (DHT) based overlay networks and erasure codes. (4) Finally, I will outline a future research agenda on developing scalable stream processing systems for emerging trends.

Thus, I introduce developer recommendation choice architectures, a framework for creating effective automated recommendations that nudge developers towards better behaviors and practices in their work. To provide evidence supporting this framework, I collected quantitative and qualitative data from experiments observing the behavior of computer science students, open source software developers, and professional software engineers in industry to show that incorporating developer recommendation choice architectures into automated recommendations improves the productivity of developers and the overall quality of the software they create. As our society becomes increasingly dependent upon technology, I aim to use this framework to continue observing developer behavior and motivating the design of future tools for improving the productivity, decision-making, and behavior of software engineers, thus enhancing user experiences and increasing the quality of software we use in our daily lives.

Biography:

Dr. Liting Hu is an Assistant Professor of Computer Science in the School of Computing and Information Sciences at Florida International University (FIU). She received her Ph.D. in Computer Science from the Georgia Institute of Technology in 2016 under the supervision of Dr. Karsten Schwan. Her research interests span distributed systems, cloud and edge computing, and system virtualization, with a focus on building scalable stream processing systems. She directs the Experimental and Virtualized Systems (ELVES) Research Lab, where she conducts experimental computer systems research. Examples include stream processing systems (with Spark Streaming, Storm, Flink), container as a service (with Docker and Kubernetes), identifying threats (e.g., fake news, rumors, social bots) in online social networks, and resource management in large-scale data centers (with Xen and KVM). She has served on numerous IEEE/ACM program committees and as a peer reviewer for more than a dozen journals. She interned at VMware, IBM Research, Microsoft Research Asia, and Intel Labs at CMU. Her research has been funded by the NSF, the Department of Homeland Security, and Cyber Florida. She was the recipient of an NSF SPX Award in 2019 and an NSF CAREER Award in 2020.