Firas Abuzaid
Photo credits: Hector Garcia-Molina

Hi, I’m Firas Abuzaid; thanks for visiting my website. I’m a 4th-year Ph.D. student in Computer Science at Stanford University, co-advised by Matei Zaharia and Peter Bailis.

As a member of the FutureData Systems Group and the Stanford DAWN Project, I focus on the intersection of systems and machine learning: how to take high-level machine learning tasks and build software systems to improve the efficiency of these tasks without sacrificing their accuracy.

I spent the first year of my Ph.D. at MIT CSAIL, working under Matei Zaharia and Sam Madden as part of the MIT DB group.

Prior to MIT, I was at Stanford, where I completed my B.S. and M.S. in Computer Science. I worked with Chris Ré on database systems and machine learning systems.

You can check out my CV here.

Latest Work

MacroBase SQL

DIFF, a new SQL operator, provides a generalizable interface for finding explanations in large-scale datasets. Our implementation of DIFF in MacroBase SQL (a fork of MacroBase) outperforms other state-of-the-art explanation engines by up to an order of magnitude.

F. Abuzaid, P. Kraft, S. Suri, E. Gan, E. Xu, A. Shenoy, A. Ananthanarayan, J. Sheu, E. Meijer, X. Wu, J. Naughton, P. Bailis, and M. Zaharia. DIFF: A Relational Interface for Large-Scale Data Explanation, to appear at VLDB 2019.



MacroBase is a new data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase delivers order-of-magnitude speedups over alternatives by optimizing the combination of explanation (i.e., feature selection) and classification tasks and by leveraging a new reservoir sampler and heavy-hitters sketch specialized for fast data streams. As a result, MacroBase delivers accurate results at speeds of up to 2M events per second per query on a single core. The system has delivered meaningful results in production, including at a telematics company monitoring hundreds of thousands of vehicles.

F. Abuzaid, P. Bailis, J. Ding, E. Gan, S. Madden, D. Narayanan, K. Rong, S. Suri (alphabetical). MacroBase: Prioritizing Attention in Fast Data, ACM Transactions on Database Systems (TODS) - Best of SIGMOD 2017 Papers.

[Website] [Code]


Sparser: 22x speedups for parsing JSON, Avro, and Parquet data.

S. Palkar, F. Abuzaid, P. Bailis, and M. Zaharia. Filter Before You Parse: Faster Analytics on Raw Data with Sparser, VLDB 2018.

[Code] [Blog Post]


NoScope: 100x speedups for CNN evaluation on video streams.

D. Kang, J. Emmons, F. Abuzaid, P. Bailis, and M. Zaharia. NoScope: Optimizing Neural Network Queries over Video at Scale, VLDB 2017.

[Code] [Blog Post]


Yggdrasil: a project on training deep decision trees at scale. Compatible with Spark MLlib 1.6+.

F. Abuzaid, J. Bradley, F. Liang, A. Feng, L. Yang, M. Zaharia, and A. Talwalkar. Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale, NIPS 2016.

[Code] [Spark Package] [Slides from ML Systems Workshop, NIPS 2016] [Talk from Spark Summit 2016]

Contact Me

Feel free to shoot me an email; you can also hit me up on Twitter. Or, if you're on campus, stop by my office in Gates 432!