Portable Feature Engineering with Hamilton: Write Once, Run Everywhere

Room: Saphire B - PyData

Date: 2023-05-19

Time: 15:00 - 15:25

Abstract

Most data transformations are written twice. In feature engineering for machine learning, data scientists regularly build, manage, and iterate on batch jobs, then translate those jobs to a service setting to load data and make fresh predictions. At best, this process is an engineering headache. At worst, it results in difficult-to-detect deltas between training and inference, complex code, and highly bespoke infrastructure. In this talk we discuss Hamilton, a lightweight open-source Python framework that enables data practitioners to define dataflows cleanly and portably. Hamilton places no restrictions on the nature of transformations, allowing data scientists to use their favorite Python libraries. With Hamilton, you can run the same code in your Airflow DAG for training as in your FastAPI service for inference, and get the same result.
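
As a rough illustration of the "write once, run in both places" idea, the sketch below uses only the Python standard library and is not Hamilton's API. It borrows the convention Hamilton is built around: each function name is an output, and each parameter name is an upstream dependency. The feature names (spend_per_signup, spend_is_high), the threshold, and the compute helper are hypothetical and chosen only for this example.

```python
import inspect

# Hypothetical feature definitions. The function name is the feature it
# produces; the parameter names are the inputs or features it depends on.
def spend_per_signup(spend: float, signups: float) -> float:
    return spend / signups

def spend_is_high(spend_per_signup: float) -> bool:
    # Assumed threshold, for illustration only.
    return spend_per_signup > 10.0

FEATURES = {f.__name__: f for f in (spend_per_signup, spend_is_high)}

def compute(name: str, inputs: dict) -> object:
    """Resolve `name` by recursively computing the parameters it depends on."""
    if name in inputs:
        return inputs[name]
    fn = FEATURES[name]
    kwargs = {p: compute(p, inputs) for p in inspect.signature(fn).parameters}
    return fn(**kwargs)

# "Batch" path: iterate over training rows.
training_rows = [
    {"spend": 100.0, "signups": 4.0},
    {"spend": 30.0, "signups": 6.0},
]
batch_result = [compute("spend_is_high", row) for row in training_rows]

# "Online" path: a single request payload, using the same definitions.
online_result = compute("spend_is_high", {"spend": 100.0, "signups": 4.0})
```

Because both paths call the same feature functions, the first training row and the online request receive the same value by construction. A real framework would add vectorized execution, validation, and a proper graph; this toy resolver only shows the shape of the idea.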

Elijah ben Izzy

Elijah has always enjoyed working at the intersection of math and engineering. More recently, he has focused his career on building tools to make data scientists more productive. At Two Sigma, he built infrastructure to help quantitative researchers efficiently turn ideas into production trading models. At Stitch Fix, he led the Model Lifecycle team, which focuses on streamlining the experience for data scientists to create and ship machine learning models. He is now building out DAGWorks Inc., a YC-backed startup that aims to make it easier for data scientists to build and maintain ETLs for machine learning. In his spare time, he enjoys geeking out about fractals, poring over antique maps, and playing jazz piano.