Data Day Schedule

Warning: This is a draft schedule and is subject to change.

An alternative schedule view can be found here:

https://pretalx.com/pycon-lithuania-2025/schedule/

Thursday, April 24

09:00

Data Day Opening

Tomas Peluritis

Room: 101

Org

Talk

09:00–09:25

09:30

Build Your Own (Simple) Static Code Analyzer

Stefanie Molin

Room: 101

Keynote

09:30–10:30

11:00

Beyond Deployment: Continuously Adding Features to Drive Marginal Gains in Models

Mark Fukson

Machine learning models are never truly “done.” As data evolves, so should the models that rely on it. But how can we ensure continuous improvement without costly retraining or manual intervention? In this talk, we introduce an automated pipeline designed to incrementally enhance model performance by systematically testing and integrating new features.

Room: 203

Data Day - Apr 24

Talk

11:00–11:25

Orchestrating an end-to-end Data Engineering Workflow: Leveraging Python in Apache Beam and Airflow

Sadeeq Akintola

This talk explores the synergy between Apache Beam and Apache Airflow, demonstrating how to create a robust, end-to-end data engineering workflow. We'll dive into the challenges of orchestrating complex data processing tasks and show how combining Airflow's scheduling capabilities with Beam's data processing framework can create more efficient and manageable data pipelines. The session will cover integration with Google Cloud Platform services, including Cloud Functions, BigQuery, and Gemini AI models.

Room: 218 Workshops

Data Day - Apr 24

Workshop

11:00–11:55

How I tracked my stocks with Python

Ąžuolas Krušna

In this talk, I’ll share how I began trading stocks and why I turned to Python to track my performance, along with the abundance of surprises that came with it. We’ll walk through the building blocks of two Python-powered apps: one that extracts stock transactions from screenshots, and another that generates summaries of my trading to uncover valuable insights.

Room: 228

Data Day - Apr 24

Talk

11:00–11:25

Data Warehouses Meet Data Lakes

Mauro Pelucchi

Many organizations have migrated their data warehouses to data lake solutions in recent years. With the convergence of the data warehouse and the data lake, a new data management paradigm has emerged that combines the best of two approaches: the bottom-up approach of big data and the top-down approach of a classic data warehouse.

Room: 101

Data Day - Apr 24

Talk

11:00–11:25

11:30

From Chaos to Control: Automating BI Tools with Pydantic and Python

Patricia Goldberg

Maintaining business intelligence (BI) tool governance, managing permissions, syncing documentation, and handling schema changes can be chaotic. This talk explores how Python, Pydantic, and smart design patterns automate these tasks, ensuring seamless BI tool governance. Learn how to auto-sync table metadata, adjust queries on column renames, and enforce permissions effortlessly. With real-world examples, discover how to transform BI maintenance from a headache into a streamlined, automated process.

Room: 101

Data Day - Apr 24

Talk

11:30–11:55

Testable data pipelines

Florian Stefan

The "data build tool" (DBT) was designed to unlock software engineering best practices for SQL-based data pipelines: pipelines as version controlled directed acyclic graphs (DAGs) consisting of testable and reusable nodes. With the increasing number of cloud data warehouses and data lakehouses that allow the native execution of Python code, DBT also added support for Python models. In this talk, I will explain how Flatiron Health uses DBT and share our experiences with unit and data testing.

Room: 203

Data Day - Apr 24

Talk

11:30–11:55

Cutting the price of Scraping Cloud Costs

Ed Crewe

A case study of rewriting a simple data pipeline involving Python, a pinch of Go, Git workflows, Airflow, Postgres and Cloud. It investigates some common assumptions and principles of designing data pipelines, the benefits and issues of the tools, and how these may be handled. I hope this case study of a pipeline rewrite will give you insights that are applicable to Python use for your own data pipelines, and into cloud pricing.

Room: 228

Data Day - Apr 24

Talk

11:30–11:55

12:00

Python on the Pitch: How Germany will win World Cup 2026

Ruslan Korniichuk

We will dive into the fascinating world of football analytics, showcasing how to collect and process match data (e.g., Hudl Statsbomb, Sportmonks, and Understat), including player tracking, event logs, and tactical formations. Attendees will walk away with practical knowledge and Jupyter Notebooks, demonstrating Python's power in decoding modern football strategies.

Room: 228

Data Day - Apr 24

Talk

12:00–12:25

Build & Deploy Apps like a (pro) Data Scientist using Streamlit

Siddharth Gupta

Do you ever find it complicated to learn the complexities of a traditional web framework to push your data science work online? Worry no more! Streamlit might help speed things up, as it is designed for exactly this purpose: creating beautiful data-related web apps that can be deployed in minutes. In this hands-on tutorial, we’ll go through various features of Streamlit and build a small lyric fetcher app based on the available curated dataset of around 24K Billboard top-100 songs.

Room: 218 Workshops

Data Day - Apr 24

Workshop

12:00–12:55

cluster-experiments: A Python library for end-to-end A/B testing workflows

David Masip

In this talk, we introduce cluster-experiments, a Python library designed to facilitate end-to-end A/B testing workflows, including power analysis, experiment analysis, and variance reduction techniques.

Room: 203

Data Day - Apr 24

Talk

12:00–12:45

Real-Time Data Analytics at Scale: From Ingestion to Retrieval

Tung Hoang

Real-time data analytics is essential for powering modern applications like monitoring, personalization, search, and, to some extent, RAG pipelines. However, building systems that can handle real-time ingestion, indexing, and retrieval at scale is no trivial task. This talk provides actionable insights into designing and maintaining such systems at scale using best practices.

Room: 101

Data Day - Apr 24

Talk

12:00–12:25

14:00

Accelerating privacy-enhancing data processing

Florian Stefan

Our mission is simple but profound: to improve and extend lives by learning from the experience of every person with cancer. This talk explains how we transform sensitive data from heterogeneous environments into research-grade datasets, and how we shift insight generation left to iterate faster.

Room: 101

Data Day - Apr 24

Talk

14:00–14:25

Using feature stores to deliver awesome models

Laurynas Stašys

Mantas Cepulkovskis

In today’s fast-paced machine learning environment, the ability to efficiently manage and reuse features across multiple models is crucial. This workshop explores how leveraging a feature store can streamline ML pipelines by ensuring consistency and accelerating deployment cycles. Participants will gain hands-on experience with setting up, managing, and integrating feature stores into their existing workflows—transforming raw data into valuable, production-ready features.

Room: 218 Workshops

Data Day - Apr 24

Workshop

14:00–14:55

A Crash course in Time Series Forecasting from Naive to Foundational

Pietro Peterlongo

Forecasting is a common activity that has clear business value in various domains, but it is not a skill that many Data Scientists have or feel confident about. In this crash course, I will cover the fundamentals of Time Series forecasting, from the basic methods to more advanced techniques. I will do this by showcasing practical code examples using libraries from Nixtla.

Room: 228

Data Day - Apr 24

Talk

14:00–14:25

Smarter Retrieval, Better Generation: Improving RAG Systems

David Batista

Good retrieval performance is key to an effective RAG system, as it ensures relevant information is selected, directly impacting augmentation and generation quality. My presentation focuses on RAG indexing and retrieval, exploring methods to convert text into searchable formats, comparing techniques, and analyzing their advantages, disadvantages, and performance on an annotated dataset to enhance document retrieval based on user queries.

Room: 203

Data Day - Apr 24

Talk

14:00–14:25

14:30

Automate Brag Document Writing with LLMs

Ludvig Wärnberg Gerdin

A brag document is a powerful tool to highlight your work by making it visible, measurable, and demonstrating its real impact on you and your organisation - but such a document can be time-consuming to maintain. My talk explores automation of the writing process with language models fed with data from tools like Jira, Notion, and code commits. Learn how to save time, avoid registering missed achievements, and make your work stand out. Ideal for engineers at all levels looking to grow their impact.

Room: 203

Data Day - Apr 24

Talk

14:30–14:55

Working for a Faster World: Accelerating Data Science with Less Resources

Maximilian Lattka

In data science, speed matters as much as accuracy, especially when users expect quick results. This talk explores simple yet effective techniques to boost performance and responsiveness on data-centric web apps based on practical experience working with Panel apps. While some strategies are case-specific, most apply broadly to data-driven projects.

Room: 101

Data Day - Apr 24

Talk

14:30–14:55

Temporal: Bulletproof Workflows

Ruslan Korniichuk

Temporal is an open source, distributed, and scalable workflow orchestration platform designed to execute mission-critical business logic with resilience. Manage failures, network outages, flaky endpoints, long-running processes and more, ensuring your workflows never fail.

Room: 228

Data Day - Apr 24

Talk

14:30–14:55

15:00

Beyond dbt: Modern SQL Transformation and Lineage with sqlglot and sqlmesh

Tomas Peluritis

Hear more about the evolving landscape of SQL transformation tools and data lineage challenges. Explore how sqlglot enables powerful SQL parsing and transformation capabilities, and see practical demonstrations of sqlmesh as a modern alternative to dbt. Learn about open-source approaches to data lineage tracking and discover how these tools are shaping the future of data engineering workflows.

Room: 101

Data Day - Apr 24

Talk

15:00–15:25

Investing: Technical Analysis libraries in Python

Ruslan Korniichuk

We will explore the landscape of technical analysis libraries available for the Python language, including popular choices like TA-Lib (aka talib), Pandas TA, and Technical Analysis (aka bukosabino/ta) library.

Room: 218 Workshops

Python Day - Apr 23

Workshop

15:00–15:55

Organize your data stack using Dagster

Patricia Goldberg

An intro to the Dagster open-source orchestration tool. Data Tool Stack. What is Dagster, and who is it for? What are its main use cases? Testing the data and the code. Deployment ideas to production.

Room: 228

Data Day - Apr 24

Talk

15:00–15:25

Image deduplication using embeddings

Jonas Jarutis

This presentation examines approaches for detecting and eliminating near-duplicate images across datasets ranging from small collections to repositories containing millions of images. We will compare several embedding models, including CLIP, ResNet, and other variants, assessing how well they capture semantic and perceptual similarity, along with their performance tradeoffs. We will also benchmark various vector database solutions on query speed and memory consumption.

Room: 203

Data Day - Apr 24

Talk

15:00–15:25

15:30

Top 5 Lessons from a Senior Data Scientist

Megan Robertson

A successful data scientist needs to have solid coding skills and stay up to date with the latest artificial intelligence and machine learning algorithms. However, there are many other skills and experiences that help you succeed in data science. In this talk, Megan shares five of the most helpful career lessons she has learned in over eight years as a data scientist. These lessons include tips on advocating for your own career development, collaborating with other teams, and more.

Room: 101

Data Day - Apr 24

Talk

15:30–15:55

Variable Selection: What your model can't tell you

James Donahue

Variable selection is often left up to an algorithm. However, controlling for some variables can improve measurement accuracy, and thus overall performance. On the other hand, certain "bad" controls can block pathways of relationships between variables that we want to preserve or create spurious correlations. Using real and simulated data, I explain when to reconsider your controls, and why that may significantly improve model accuracy.

Room: 203

Data Day - Apr 24

Talk

15:30–15:55

The Power of Python for Data Management (or How You’ve Been Doing Data Management All Along Without Even Realizing It)

Vidmantė Čižienė

Are you using Airflow or Pandas? Great! You've contributed to better data management at your organization. The breakthrough of AI has reignited focus on high-quality data and on effective data governance (not as scary as it sounds!) and management practices. AI needs fit-for-purpose data to reach its potential, and we already have a powerful toolkit, including Airflow, Pandas, Matplotlib/Seaborn, and Great Expectations, to optimize workflows and ensure data quality.

Room: 228

Data Day - Apr 24

Talk

15:30–15:55

16:30

The evolution of data management techniques

Gabor Szarnyas

Data management systems have gone through significant changes in the last 10 years, driven by user demands, novel techniques and improvements in hardware. These have far-reaching implications on how systems are deployed and used in practice. In this talk, I will focus on three key aspects of modern data management systems: scalability, mutability, and interface. I will share my personal experiences, and will bring several examples from the database and data science worlds.

Room: 101

Keynote

16:30–17:30
