Papers
Solving Large-Scale Granular Resource Allocation Problems Efficiently
Solving Flow Problems on Wide-Area Networks
Machine Learning to detect Atrial Fibrillation
In this collaboration with Dr. Sanjiv Narayan’s Computational Arrhythmia Research Laboratory, we demonstrate that convolutional neural networks can identify potential treatment sites for atrial fibrillation, a common form of heart arrhythmia. In our evaluation, we show that CNNs can identify these sites with 95.0% accuracy.
Optimus + Maximus
In this paper, we show that blocked matrix multiply, a naive, hardware-optimized approach, surprisingly outperforms state-of-the-art maximum inner product search (MIPS) solvers by up to 12x for some (but not all) inputs. In response, we present a novel MIPS solution, Maximus, that takes advantage of hardware efficiency and pruning of the search space; we also introduce a new data-dependent optimizer, Optimus, that selects the best MIPS solver for a given set of inputs online and with minimal overhead. Together, Optimus and Maximus outperform state-of-the-art MIPS solvers by 3.2x on average, and up to 10.9x, on widely studied MIPS datasets.
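As a rough illustration of the brute-force baseline mentioned above, the sketch below computes exact top-k inner products by scoring users against items one tile at a time. It is not the paper's implementation: the function name `blocked_top_k`, the `block` parameter, and the pure-Python dot product are assumptions made to keep the example self-contained. A hardware-optimized version would compute each tile with a single BLAS matrix multiply.

```python
import heapq


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def blocked_top_k(users, items, k, block=2):
    """Exact top-k inner product search by brute force, one tile at a time.

    users, items: lists of equal-dimension vectors.
    Returns, per user, a list of (score, item_index) in descending score order.
    """
    # One min-heap per user, holding the k best (score, item_index) pairs so far.
    heaps = [[] for _ in users]
    for u_start in range(0, len(users), block):
        user_block = users[u_start:u_start + block]
        for i_start in range(0, len(items), block):
            item_block = items[i_start:i_start + block]
            # Dense tile of scores; stands in for one blocked matrix multiply.
            tile = [[dot(u, v) for v in item_block] for u in user_block]
            for u_off, row in enumerate(tile):
                heap = heaps[u_start + u_off]
                for i_off, score in enumerate(row):
                    entry = (score, i_start + i_off)
                    if len(heap) < k:
                        heapq.heappush(heap, entry)
                    elif entry > heap[0]:
                        heapq.heapreplace(heap, entry)
    return [sorted(heap, reverse=True) for heap in heaps]


users = [[1, 0], [0, 1]]
items = [[1, 2], [3, 0], [0, 5], [2, 4]]
top2 = blocked_top_k(users, items, k=2)
```

Every user is scored against every item, so nothing is pruned here; the point of the tiling is only that each tile is a dense, cache-friendly unit of work.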
[Code]
MacroBase SQL
DIFF, a new SQL operator, provides a generalizable interface for finding explanations in large-scale datasets. Our implementation of DIFF in MacroBase SQL (a fork of MacroBase) outperforms other state-of-the-art explanation engines by up to an order of magnitude.
[Code]
MacroBase
MacroBase is a new data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to deliver order-of-magnitude speedups over alternatives by optimizing the combination of explanation (i.e., feature selection) and classification tasks and by leveraging a new reservoir sampler and heavy-hitters sketch specialized for fast data streams. As a result, MacroBase delivers accurate results at speeds of up to 2M events per second per query on a single core. The system has delivered meaningful results in production, including at a telematics company monitoring hundreds of thousands of vehicles.
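For readers unfamiliar with reservoir sampling, the sketch below shows the classic uniform version (often called Algorithm R), which keeps a fixed-size random sample of a stream in a single pass. It is only background: it is not MacroBase's specialized sampler, and the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, capacity, rng=None):
    """Keep a uniform random sample of at most `capacity` items from a stream.

    Classic Algorithm R: single pass, memory proportional to `capacity`.
    """
    rng = rng or random.Random()
    reservoir = []
    for seen, item in enumerate(stream, start=1):
        if len(reservoir) < capacity:
            # Fill phase: the first `capacity` items are always kept.
            reservoir.append(item)
        else:
            # Item number `seen` replaces a random slot with probability capacity / seen.
            slot = rng.randrange(seen)
            if slot < capacity:
                reservoir[slot] = item
    return reservoir


sample = reservoir_sample(range(1000), capacity=10, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the sample reproducible, which is convenient for testing; in a streaming setting the default unseeded generator would be the usual choice.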
[Website] [Code] [Talk from ODSC West 2018]
Sparser
22x speed-ups for parsing JSON, Avro, and Parquet data.
[Code] [Blog Post] [Talk from Spark+AI Summit 2018]
Yggdrasil
A project on training deep decision trees at scale. Compatible with Spark MLlib 1.6+.
[Code] [Spark Package] [Slides from ML Systems Workshop, NIPS 2016] [Talk from Spark Summit 2016]