Room: Room 203
April 5
15:00–15:25
Python is a leading language in the Databricks and machine learning (ML) ecosystem, often paired with a Delta tables stack that uses Unity Catalog to manage petabytes of structured data. For building and experimenting with ML data and models, version control has become the backbone of modern ML projects: it brings reproducibility and lets team members experiment in isolation while still collaborating on shared projects.
Python, Data Engineering Concepts, ML Concepts
One critical benefit of version control is closer compliance with ACID principles for data transactions.
In this talk we'll demo reproducibility and efficient experiment management in Python using Delta tables managed in Unity Catalog. We'll walk through a Python example of how to simulate transactions in lakeFS with its dedicated Python SDK. Branching mechanisms make it possible to manage changes in the data lake and to simulate transactional concepts there, tracking changes over time and providing the guardrail of reverting to a previous state when needed, for an added level of consistency and traceability.
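As a rough illustration of the branch-as-transaction idea, here is a minimal in-memory sketch. It is a toy model only and does not use the lakeFS Python SDK: `ToyLake`, `transact`, the `tx-` branch prefix, and the object path are hypothetical names chosen for this example. The merge simply replaces the target snapshot and ignores conflicts.

```python
from contextlib import contextmanager


class ToyLake:
    """In-memory stand-in for a versioned data lake (hypothetical, not lakeFS)."""

    def __init__(self):
        # Each branch maps object paths to their contents.
        self.branches = {"main": {}}

    def create_branch(self, name, source):
        # Branching copies the source snapshot, so changes stay isolated.
        self.branches[name] = dict(self.branches[source])

    def merge(self, source, target):
        # Toy merge: the source snapshot replaces the target (no conflict handling).
        self.branches[target] = dict(self.branches[source])

    def delete_branch(self, name):
        del self.branches[name]


@contextmanager
def transact(lake, target="main"):
    """Apply changes on a temporary branch; merge on success, discard on error."""
    tx_branch = f"tx-{target}"
    lake.create_branch(tx_branch, source=target)
    try:
        yield lake.branches[tx_branch]
        # Reached only if the with-body raised nothing: publish all changes at once.
        lake.merge(tx_branch, target)
    finally:
        # Always remove the temporary branch, whether or not the merge happened.
        lake.delete_branch(tx_branch)


lake = ToyLake()

# A successful transaction: the write becomes visible on main after the block.
with transact(lake) as tx:
    tx["tables/users.parquet"] = b"v1"

# A failed transaction: the exception skips the merge, so main keeps v1.
try:
    with transact(lake) as tx:
        tx["tables/users.parquet"] = b"v2"
        raise ValueError("validation failed")
except ValueError:
    pass
```

The point of the pattern is that readers of `main` never observe a half-applied change: writes land on an isolated branch and are either merged as a unit or thrown away.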
Nir Ozeri is a seasoned Software Engineer at lakeFS. Over the last decade, he has worked on many different parts of the tech stack from firmware and storage drivers all the way to cloud native systems such as lakeFS, where he is a core member of the development team. Outside the tech realm, Nir's world takes a splash in the water. When he's not crafting code, you'll find him above or beneath the waves, whether it's SCUBA or free diving, riding the waves while surfing, or simply hanging out with his canine companion at the beach. Nir believes that just as in coding, the depths of the ocean hold endless mysteries waiting to be explored, and every wave carries its unique rhythm – much like lines of code waiting to be written.
Oz Katz is the co-creator of lakeFS, an open source platform that delivers resilience and manageability to object-storage-based data lakes, as well as the CTO and co-founder of Treeverse, the company behind lakeFS. Oz engineered and maintained petabyte-scale data infrastructure at analytics giant SimilarWeb, which he joined after the acquisition of Swayy.