Data Day Schedule
Thursday, April 24
10:30
Beyond dbt: Modern SQL Transformation and Lineage with sqlglot and sqlmesh

Hear more about the evolving landscape of SQL transformation tools and data lineage challenges. Explore how sqlglot enables powerful SQL parsing and transformation capabilities, and see practical demonstrations of sqlmesh as a modern alternative to dbt. Learn about open-source approaches to data lineage tracking and discover how these tools are shaping the future of data engineering workflows.
Room: 101
Data Day - Apr 24
Talk
10:30–10:55
Data-Driven Impact in Africa

Many NGOs in Africa are working to improve lives. The problem is that there is not enough data to help them understand us well enough to design impactful humanitarian programs.
Discover how NGOs can leverage data science to understand and serve African communities better. Learn about data collection, privacy, and impact assessment.
Room: Workshop 2
Data Day - Apr 24
Workshop
10:30–11:25
11:00
Data Warehouses Meet Data Lakes

Many organizations have migrated their data warehouses to data lake solutions in recent years.
With the convergence of the data warehouse and the data lake, a new data management paradigm has emerged that combines the best of two approaches: the bottom-up approach of big data and the top-down approach of a classic data warehouse.
Room: 101
Data Day - Apr 24
Talk
11:00–11:25
11:30
From Chaos to Control: Automating BI Tools with Pydantic and Python

Maintaining Business Intelligence (BI) tool governance, including managing permissions, syncing documentation, and handling schema changes, can be chaotic. This talk explores how Python, Pydantic, and smart design patterns automate these tasks, ensuring seamless BI tool governance. Learn how to auto-sync table metadata, adjust queries on column renames, and enforce permissions effortlessly. With real-world examples, discover how to transform BI maintenance from a headache into a streamlined, automated process.
Room: 101
Data Day - Apr 24
Talk
11:30–11:55
Orchestrating an end-to-end Data Engineering Workflow: Leveraging Python in Apache Beam and Airflow

This talk explores the synergy between Apache Beam and Apache Airflow, demonstrating how to create a robust, end-to-end data engineering workflow. We'll dive into the challenges of orchestrating complex data processing tasks and show how combining Airflow's scheduling capabilities with Beam's data processing framework can create more efficient and manageable data pipelines. The session will cover integration with Google Cloud Platform services, including Cloud Functions, BigQuery, and Gemini AI models.
Room: Workshop 2
Data Day - Apr 24
Workshop
11:30–12:25
cluster-experiments: A Python library for end-to-end A/B testing workflows

In this talk, we introduce cluster-experiments, a Python library designed to facilitate end-to-end A/B testing workflows, including power analysis, experiment analysis, and variance reduction techniques.
Room: Workshop 1
Data Day - Apr 24
Talk
11:30–12:15
12:00
Variable Selection: What your model can't tell you

Variable selection is often left up to an algorithm. However, controlling for some variables can improve measurement accuracy, and thus overall performance. On the other hand, certain "bad" controls can block relationships between variables that we want to preserve, or can create spurious correlations. Using real and simulated data, I explain when to reconsider your controls, and why doing so may significantly improve model accuracy.
Room: 2
Data Day - Apr 24
Talk
12:00–12:25
Real-Time Data Analytics at Scale: From Ingestion to Retrieval

Real-time data analytics is essential for powering modern applications like monitoring, personalization, search, and, to some extent, RAG pipelines. However, building systems that can handle real-time ingestion, indexing, and retrieval at scale is no trivial task. This talk provides actionable insights into designing and maintaining such systems at scale using best practices.
Room: 101
Data Day - Apr 24
Talk
12:00–12:25
13:00
Working for a Faster World: Accelerating Data Science with Less Resources

In data science, speed matters as much as accuracy, especially when users expect quick results. This talk explores simple yet effective techniques to boost performance, using a real-life case of accelerating a Panel app. While some strategies are case-specific, most apply broadly to data-driven projects.
Room: 101
Data Day - Apr 24
Talk
13:00–13:25
Accelerating privacy-enhancing data processing

Our mission is simple but profound: to improve and extend lives by learning from the experience of every person with cancer. This talk explains how we transform sensitive data from heterogeneous environments into research-grade datasets, and how we shift insight generation left to iterate faster.
Room: 2
Data Day - Apr 24
Talk
13:00–13:25
Using feature stores to deliver awesome models

In today’s fast-paced machine learning environment, the ability to efficiently manage and reuse features across multiple models is crucial. This workshop explores how leveraging a feature store can streamline ML pipelines by ensuring consistency and accelerating deployment cycles.
Participants will gain hands-on experience with setting up, managing, and integrating feature stores into their existing workflows—transforming raw data into valuable, production-ready features.
Room: Workshop 1
Data Day - Apr 24
Workshop
13:00–13:55
13:30
The Power of Python for Data Management (or How You’ve Been Doing Data Management All Along Without Even Realizing It)

Are you using Airflow or Pandas? Great! You've contributed to better data management at your organization.
The breakthrough of AI has reignited focus on high-quality data and on effective data governance (not as scary as it sounds!) and management practices. AI needs fit-for-purpose data to reach its potential, and we already have a powerful toolkit, including Airflow, Pandas, Matplotlib/Seaborn, and Great Expectations, to optimize workflows and ensure data quality.
Room: 3
Data Day - Apr 24
Talk
13:30–13:55
Temporal: Bulletproof Workflows

Temporal is an open source, distributed, and scalable workflow orchestration platform designed to execute mission-critical business logic with resilience. Manage failures, network outages, flaky endpoints, long-running processes and more, ensuring your workflows never fail.
Room: 2
Data Day - Apr 24
Talk
13:30–13:55