Firas Abuzaid

Photo credits: Hector Garcia-Molina

Hi, I’m Firas Abuzaid; thanks for visiting my website! I am co-founder and CTO of Neo.Tax, a startup that automates taxes for small businesses.

I graduated from Stanford University in August 2022 with a Ph.D. in Computer Science, where I was co-advised by Matei Zaharia and Peter Bailis. You can find my dissertation here.

At Stanford, I was a member of the FutureData Systems Group and the Stanford DAWN Project. I spent the first year of my Ph.D. at MIT CSAIL, co-advised by Matei and Sam Madden as part of the MIT DB group.

During my Ph.D., I focused on a few different research areas: systems and machine learning, databases and data analytics, ML applications for medicine, and computer networking. You can check out my Google Scholar profile here, or my CV here.

Papers

POP: Partitioned Optimization Problems

We showed that many resource allocation problems in computer systems—such as traffic engineering, cluster scheduling, and query load-balancing—can be classified as granular mathematical optimization problems. Such problems can be solved much more efficiently using random partitioning, which only introduces a small approximation error. In our experiments, POP achieved allocations within 1.5% of optimality with orders-of-magnitude improvements in runtime.
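
To make the partitioning idea concrete, here is a toy Python sketch (the `pop_allocate` and `proportional_solve` names and the proportional stand-in solver are illustrative, not from the paper): randomly split the clients across k subproblems, give each subproblem an equal 1/k share of the resource, solve the subproblems independently, and take the union of the allocations.

```python
import random

def pop_allocate(jobs, capacity, k, solve):
    # POP's core trick: randomly partition the clients into k subproblems,
    # give each an equal 1/k share of the resource, solve independently,
    # and combine the per-subproblem allocations.
    random.shuffle(jobs)
    subproblems = [jobs[i::k] for i in range(k)]
    allocation = {}
    for sub in subproblems:
        allocation.update(solve(sub, capacity / k))
    return allocation

def proportional_solve(sub, cap):
    # Stand-in for a real optimization solver: split capacity in
    # proportion to demand, never exceeding a job's demand.
    total = sum(demand for _, demand in sub) or 1.0
    return {job: min(demand, cap * demand / total) for job, demand in sub}

jobs = [("a", 4.0), ("b", 1.0), ("c", 2.0), ("d", 3.0)]
print(pop_allocate(jobs, capacity=8.0, k=2, solve=proportional_solve))
```

Because the problem is granular (many clients, each small relative to the total resources), a random partition is likely to balance demand across subproblems, which is why the approximation error stays small.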

D. Narayanan, F. Kazhamiaka, F. Abuzaid, P. Kraft, A. Agrawal, S. Kandula, S. Boyd, M. Zaharia. Solving Large-Scale Granular Resource Allocation Problems Efficiently With POP, SOSP 2021.

[Code] [Artifact Evaluation] [ArXiv Preprint]

NCFlow: Network Contractions for Flow Problems

In this project, we developed a new algorithm for traffic engineering that solves the multi-commodity maximum total flow problem on wide-area networks (WANs) much more efficiently, with only a small loss of flow. Our algorithm (i) contracts the WAN into disjoint clusters, (ii) solves a simpler flow problem on the contracted network, and (iii) solves the per-cluster flow problems in parallel. We benchmarked NCFlow on WAN topologies over 10x larger than those considered in prior work and found it to be 8.2x faster than the state of the art in the median case (and up to 4,000x faster), while allocating 98.8% of the total flow and using 6x fewer forwarding entries.
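
The contraction step (i) and the simpler problem (ii) can be sketched in a few lines with networkx (the toy topology, cluster assignment, and single-commodity simplification below are mine; NCFlow handles many commodities and additional machinery on top of this):

```python
import networkx as nx

def contract(G, clusters):
    # Step (i): collapse each cluster into one super-node, summing the
    # capacities of parallel inter-cluster edges; intra-cluster edges drop out.
    label = {v: cid for cid, nodes in enumerate(clusters) for v in nodes}
    H = nx.DiGraph()
    for u, v, data in G.edges(data=True):
        cu, cv = label[u], label[v]
        if cu != cv:
            prev = H.get_edge_data(cu, cv, {"capacity": 0})["capacity"]
            H.add_edge(cu, cv, capacity=prev + data["capacity"])
    return H, label

# Toy WAN: two 2-node clusters joined by a single 5-unit link.
G = nx.DiGraph()
G.add_edge("a1", "a2", capacity=10)
G.add_edge("a2", "b1", capacity=5)
G.add_edge("b1", "b2", capacity=10)
H, label = contract(G, clusters=[{"a1", "a2"}, {"b1", "b2"}])

# Step (ii): solve the (much smaller) flow problem on the contracted graph.
value, _ = nx.maximum_flow(H, label["a1"], label["b2"])
print(value)  # 5, bounded by the inter-cluster link
```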

F. Abuzaid, S. Kandula, B. Arzani, I. Menache, M. Zaharia, P. Bailis. Contracting Wide-area Network Topologies to Solve Flow Problems Quickly, NSDI 2021.

[Code] [Talk from NSDI 2021] [Technical Report]

Machine Learning to Detect Atrial Fibrillation

In this collaboration with Dr. Sanjiv Narayan’s Computational Arrhythmia Research Laboratory, we demonstrated that convolutional neural networks (CNNs) can identify potential treatment sites for atrial fibrillation, a common form of heart arrhythmia. In our evaluation, CNNs identified these sites with 95.0% accuracy.

M. Alhusseini, F. Abuzaid, A. Rogers, J. Zaman, T. Baykaner, P. Clopton, P. Bailis, M. Zaharia, P. Wang, W-J. Rappel, and S. Narayan. Machine Learning to Classify Intracardiac Electrical Patterns During Atrial Fibrillation, Circulation: Arrhythmia and Electrophysiology 2020.

Optimus + Maximus

In this paper, we show that blocked matrix multiply—a naive, hardware-optimized approach—surprisingly outperforms state-of-the-art solvers for maximum inner product search (MIPS) by up to 12x for some (but not all) inputs. In response, we present a novel MIPS solution, Maximus, that takes advantage of hardware efficiency and pruning of the search space; we also introduce a new data-dependent optimizer, Optimus, that selects online, with minimal overhead, the best MIPS solver for a given set of inputs. Together, Optimus and Maximus outperform state-of-the-art MIPS solvers by 3.2x on average, and by up to 10.9x, on widely studied MIPS datasets.
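
The baseline at the heart of the paper is easy to state: exact MIPS is just an argmax over inner products, and a blocked matrix multiply computes those inner products with a hardware-optimized BLAS GEMM. A minimal NumPy sketch (the block size and random data are illustrative):

```python
import numpy as np

def mips_matmul(queries, items, block=1024):
    # Exact MIPS: for each query q, return argmax_i <q, x_i>. The `@`
    # below dispatches to a hardware-optimized BLAS GEMM; blocking keeps
    # the score matrix small enough to stay cache-friendly.
    best = np.empty(len(queries), dtype=np.int64)
    for start in range(0, len(queries), block):
        scores = queries[start:start + block] @ items.T
        best[start:start + block] = scores.argmax(axis=1)
    return best

rng = np.random.default_rng(0)
items = rng.standard_normal((10_000, 64))
queries = rng.standard_normal((256, 64))
print(mips_matmul(queries, items)[:5])
```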

F. Abuzaid, G. Sethi, P. Bailis, and M. Zaharia. To Index or Not to Index: Optimizing Exact Maximum Inner Product Search, ICDE 2019.

[Code]

MacroBase SQL

DIFF, a new SQL operator, provides a generalizable interface for finding explanations in large-scale datasets. Our implementation of DIFF in MacroBase SQL (a fork of MacroBase) outperforms other state-of-the-art explanation engines by up to an order of magnitude.
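
In spirit, DIFF compares an outlier relation against an inlier relation and returns attribute values that are over-represented among the outliers. Here is a toy single-attribute sketch in Python (the real operator works over relations and combinations of attributes; the `min_ratio` and `min_support` defaults are illustrative):

```python
from collections import Counter

def diff(outliers, inliers, attr, min_ratio=2.0, min_support=0.05):
    # For each value of `attr`, compare how often it appears among the
    # outliers vs. the inliers (the "risk ratio"), keeping values that
    # are both over-represented and well supported.
    out_counts = Counter(row[attr] for row in outliers)
    in_counts = Counter(row[attr] for row in inliers)
    results = []
    for value, n_out in out_counts.items():
        support = n_out / len(outliers)
        in_rate = in_counts[value] / max(len(inliers), 1)
        ratio = (support / in_rate) if in_rate else float("inf")
        if ratio >= min_ratio and support >= min_support:
            results.append((value, ratio, support))
    return sorted(results, key=lambda r: -r[1])

slow = [{"version": "v2"}, {"version": "v2"}, {"version": "v1"}]
fast = [{"version": "v1"}] * 9 + [{"version": "v2"}]
print(diff(slow, fast, "version"))  # 'v2' explains the slow group
```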

F. Abuzaid, P. Kraft, S. Suri, E. Gan, E. Xu, A. Shenoy, A. Ananthanarayan, J. Sheu, E. Meijer, X. Wu, J. Naughton, P. Bailis, and M. Zaharia. DIFF: A Relational Interface for Large-Scale Data Explanation, VLDB 2019.

[Code]

MacroBase

MacroBase is a new data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to deliver order-of-magnitude speedups over alternatives by optimizing the combination of explanation (i.e., feature selection) and classification tasks and by leveraging a new reservoir sampler and heavy-hitters sketch specialized for fast data streams. As a result, MacroBase delivers accurate results at speeds of up to 2M events per second per query on a single core. The system has delivered meaningful results in production, including at a telematics company monitoring hundreds of thousands of vehicles.
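
For a flavor of the sketching machinery, here is a plain Misra-Gries heavy-hitters sketch in Python (a stand-in: MacroBase’s specialized sketch adds exponential decay so that recent items dominate on streams):

```python
def heavy_hitters(stream, k):
    # Misra-Gries sketch: tracks at most k-1 candidate items in one pass;
    # any item occurring more than n/k times is guaranteed to survive.
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all, dropping any that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

print(heavy_hitters(["a", "b", "a", "c", "a", "a", "d"], k=3))
```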

F. Abuzaid, P. Bailis, J. Ding, E. Gan, S. Madden, D. Narayanan, K. Rong, S. Suri (alphabetical). MacroBase: Prioritizing Attention in Fast Data, ACM Transactions on Database Systems (TODS) - Best of SIGMOD 2017 Papers.

[Website] [Code] [Talk from ODSC West 2018]

Sparser

Sparser filters raw data before parsing it, delivering up to 22x speedups for analytics over JSON, Avro, and Parquet data.
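
The idea fits in a few lines: run a cheap filter over the raw bytes first, and invoke the (expensive) parser only on records that might match. A toy Python sketch (Sparser itself compiles fast SIMD-friendly raw filters from the query predicate; the substring filter below is just the simplest stand-in):

```python
import json

def filter_then_parse(raw_lines, needle, predicate):
    # Cheap raw-byte filter first: it may pass false positives, but a
    # miss is a definite non-match, so most records skip the parser.
    for line in raw_lines:
        if needle not in line:
            continue
        record = json.loads(line)   # expensive parse, candidates only
        if predicate(record):       # re-check to discard false positives
            yield record

data = [b'{"user": "alice", "topic": "db"}',
        b'{"user": "bob", "topic": "ml"}']
hits = filter_then_parse(data, b'"db"', lambda r: r["topic"] == "db")
print(list(hits))
```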

S. Palkar, F. Abuzaid, P. Bailis, and M. Zaharia. Filter Before You Parse: Faster Analytics on Raw Data with Sparser, VLDB 2018.

[Code] [Blog Post] [Talk from Spark+AI Summit 2018]

NoScope

NoScope optimizes neural network queries over video at scale, achieving 100x speedups for CNN evaluation on video streams.
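
At its core, NoScope is an inference cascade. A toy sketch (the function names and thresholds here are illustrative, not NoScope’s API): skip frames that haven’t changed, trust a cheap specialized model when it is confident, and call the full CNN only on ambiguous frames.

```python
def noscope_cascade(frames, diff_detector, cheap_model, expensive_model,
                    low=0.2, high=0.8):
    # Cascade: (1) difference detector skips near-duplicate frames,
    # (2) a cheap specialized model answers when confident,
    # (3) the full (expensive) CNN handles only the ambiguous remainder.
    last_label = False
    for frame in frames:
        if not diff_detector(frame):          # frame ~ previous: reuse label
            yield last_label
            continue
        p = cheap_model(frame)                # specialized model's confidence
        if p <= low or p >= high:
            last_label = p >= high
        else:
            last_label = expensive_model(frame)  # fall back to the full CNN
        yield last_label

labels = list(noscope_cascade(
    range(6),
    diff_detector=lambda f: f % 2 == 0,   # pretend odd frames are unchanged
    cheap_model=lambda f: 0.9 if f >= 4 else 0.5,
    expensive_model=lambda f: f >= 2,
))
print(labels)
```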

D. Kang, J. Emmons, F. Abuzaid, P. Bailis, and M. Zaharia. NoScope: Optimizing Neural Network Queries over Video at Scale, VLDB 2017.

[Code] [Blog Post]

Yggdrasil

Yggdrasil trains deep decision trees at scale by partitioning the training data by column (feature) rather than by row. Compatible with Spark MLlib 1.6+.
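
With column partitioning, each worker scans only its own features to propose a split, and a cheap reduce picks the global best. A toy single-machine sketch of that pattern (the impurity proxy and helper names are mine, not Yggdrasil’s):

```python
def best_split_by_column(columns, labels):
    # Each column would live on a different worker in Yggdrasil; here we
    # just loop. Each "worker" proposes its best (score, feature, threshold).
    def score(feature, threshold):
        # Toy impurity proxy: how different the label means are per side.
        left = [l for x, l in zip(feature, labels) if x <= threshold]
        right = [l for x, l in zip(feature, labels) if x > threshold]
        if not left or not right:
            return 0.0
        return abs(sum(left) / len(left) - sum(right) / len(right))

    proposals = []
    for j, feature in enumerate(columns):
        best = max(((score(feature, t), t) for t in set(feature)),
                   key=lambda s: s[0])
        proposals.append((best[0], j, best[1]))
    return max(proposals)  # the "reduce": pick the global best split

cols = [[1, 2, 3, 4], [10, 10, 20, 20]]
print(best_split_by_column(cols, labels=[0, 0, 1, 1]))
```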

F. Abuzaid, J. Bradley, F. Liang, A. Feng, L. Yang, M. Zaharia, and A. Talwalkar. Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale, NIPS 2016.

[Code] [Spark Package] [Slides from ML Systems Workshop, NIPS 2016] [Talk from Spark Summit 2016]