April 9, 2026
Explore the world of data with Python. From data engineering and pipelines to analytics, visualization, and machine learning — learn how to work with data at scale and derive insights that matter.
Keynote Speaker

Katharine Jarmul
Founder, Kjamistan
Katharine Jarmul focuses her work and research on privacy and security in data science, deep learning, and AI. She is the author of the well-received book Practical Data Privacy (O'Reilly, 2023) and has more than 10 years of experience in machine learning/AI, during which she has helped build large-scale AI systems with privacy and security built in. You can follow her work via her newsletter, Probably Private (https://probablyprivate.com), or on her website at kjamistan.com.
What to Expect
Talks and workshops happening on Data Day
Exposing Greenwashing: Satellite ML for Carbon Credit Verification
The carbon market is set to reach $1 trillion by 2030, yet 84% of offsets fail to deliver real climate benefits. Verification still relies on sparse site visits and self-reported data. This poster shows a Python workflow that audits carbon projects using satellite imagery and ML, detecting over-crediting and leakage in REDD+ sites. With open data and open-source tools, anyone can compare claimed versus observed forest outcomes and verify what projects actually deliver.
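A minimal sketch of what one such audit step might look like, assuming hypothetical land-cover rasters classified from satellite imagery (value 1 = forest) and a claimed figure taken from project documentation; this is illustrative, not the speaker's workflow:

```python
import numpy as np
import rasterio  # pip install rasterio

# Hypothetical inputs: land-cover classifications of the project area at the
# baseline year and at the verification year (pixel value 1 = forest).
with rasterio.open("project_landcover_2018.tif") as src:
    baseline = src.read(1)
with rasterio.open("project_landcover_2024.tif") as src:
    observed = src.read(1)

# Observed change in forested area (pixel counts as a simple proxy).
forest_before = np.count_nonzero(baseline == 1)
forest_after = np.count_nonzero(observed == 1)
observed_loss = 1 - forest_after / forest_before

# Compare against the avoided loss the project claims in its documentation.
claimed_avoided_loss = 0.12  # placeholder figure
print(f"Observed forest loss: {observed_loss:.1%}")
print(f"Claimed avoided loss: {claimed_avoided_loss:.1%}")
if observed_loss > claimed_avoided_loss:
    print("Observed outcomes fall short of the project's claims.")
```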
Neeraj Pandey
Neeraj is the co-founder of Vivid Climate, a climate management and DMRV platform. A polyglot, he has worked over the years on a variety of full-stack software and data-science applications as well as computational arts, and he enjoys the challenge of creating new tools and applications. He is an active international speaker, with talks and tutorials presented at multiple conferences.
Python for Data Quality in 2025: Why tests alone are no longer enough
In 2025, classic data tests via Python are no longer enough. In this 25-30 minute talk I will show how Python powers modern Data Quality: from real-time freshness checks to anomaly detection and orchestrator integration. No AI hype: starting with a quick Data Quality overview and problem statement, I will show practical code, architecture, and hands-on engineering for resilient pipelines from a Data Engineer/Data Quality Engineer perspective.
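A minimal sketch of one such check, a freshness test, assuming a pandas DataFrame with a hypothetical loaded_at column and a one-hour lag budget; not the speaker's code:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

def check_freshness(df: pd.DataFrame, ts_col: str, max_lag: timedelta) -> bool:
    """Return True if the newest record landed within max_lag of now."""
    latest = pd.to_datetime(df[ts_col], utc=True).max()
    return datetime.now(timezone.utc) - latest <= max_lag

# Toy usage: orders must be at most one hour old (this sample data is stale,
# so the check deliberately fails and stops the pipeline).
orders = pd.DataFrame({"loaded_at": ["2025-06-01T10:00:00Z", "2025-06-01T10:55:00Z"]})
if not check_freshness(orders, "loaded_at", max_lag=timedelta(hours=1)):
    raise RuntimeError("Freshness check failed: orders data is stale")
```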
Artsem
Data Engineer with a passion for Data Quality. Experience in Azure, AWS, Python, and Databricks. Head of the DQ department at EPAM Lithuania & Latvia, with a headcount of 20.
Beyond the Static 2D Plot - Spatial Data Storytelling in 4D
2D static plots are great, but they are static. Data isn't - it changes. So we turn a plot into an animation. But we don't live in planes - we live in space. And we want to send a message, not just show an animation. This leads us to the 3D animated story! In this talk I will close the gap between abstract data and its physical reality. Through step-by-step examples using (Geo)Pandas, (I)Pydeck, PyVista, Blender, and more, I will turn basic charts into 4D stories with custom models added to geospace.
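A minimal sketch of the kind of first step such a story might build on, lifting a flat chart into 3D with pydeck; the coordinates, column names, and styling are placeholders, not the speaker's example:

```python
import pandas as pd
import pydeck as pdk  # pip install pydeck

# Hypothetical point data: measurements with a location and a magnitude.
df = pd.DataFrame({
    "lon": [25.28, 25.30, 25.26],
    "lat": [54.68, 54.69, 54.67],
    "value": [120, 340, 80],
})

# Extrude each point into a 3D column so the vertical axis carries the data.
layer = pdk.Layer(
    "ColumnLayer",
    data=df,
    get_position="[lon, lat]",
    get_elevation="value",
    elevation_scale=10,
    radius=200,
    get_fill_color=[200, 30, 0, 160],
)

view = pdk.ViewState(latitude=54.68, longitude=25.28, zoom=11, pitch=45)
pdk.Deck(layers=[layer], initial_view_state=view).to_html("columns_3d.html")
```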
Kęstutis Gadeikis
Kestutis Gadeikis (PhD, EMBA) is the Chief Actuary and Data Governance Manager at Lietuvos Draudimas (PZU Group), the insurance market leader in the Baltics. With a PhD in Mathematics and an EMBA, he specializes in finding the right balance between data analysis and the impactful presentation of its results.
Conformal Prediction for Time Series: Uncertainty Quantification for Trustworthy Systems
How can we **quantify uncertainty in time series** forecasts, without unrealistic assumptions, and with rock-solid guarantees? This talk introduces **Conformal Prediction** (CP), a framework to generate prediction intervals with guaranteed coverage. Whether you're forecasting energy demand, market volatility, or space weather disturbances, CP helps you move **from point forecasts to reliable intervals** — even in non-stationary settings.
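A minimal sketch of split conformal prediction intervals built from calibration residuals; the names and toy numbers are illustrative, and it assumes exchangeability (the very caveat the talk addresses for time series):

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """Prediction intervals from held-out calibration residuals (split conformal).

    residuals_cal: |y - y_hat| on a calibration set
    y_pred_test:   point forecasts for new inputs
    Returns (lower, upper) bounds targeting ~(1 - alpha) marginal coverage.
    """
    residuals_cal = np.asarray(residuals_cal)
    n = residuals_cal.size
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(residuals_cal, q_level, method="higher")
    return np.asarray(y_pred_test) - q, np.asarray(y_pred_test) + q

# Toy usage: wrap two point forecasts in 90% intervals.
lower, upper = split_conformal_interval(
    residuals_cal=[0.4, 1.1, 0.7, 0.9, 0.3], y_pred_test=[10.0, 12.5], alpha=0.1
)
print(lower, upper)
```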
Vincenzo Ventriglia
A results-driven data professional, focused on hype-free solutions tailored to business needs. I currently create value at the **National Institute of Geophysics and Volcanology**, where I develop machine learning models in the **Space Weather** domain. My work is complemented by finding the hidden stories in data and making them accessible to stakeholders. I studied Physics in Italy (Napoli) and Germany (Frankfurt am Main), and previously worked in Analytics within the strategic division of the world's largest professional services network, as well as in the Data Science department of Italy's leading publishing group. I am also an organiser of **PyData Roma Capitale**, actively involved in building the local Python and data science community. Outside of work, I enjoy theatre, discussing finance, and learning new languages.
Stats Meets ML - What I learned from my Machine Learning Certification
Statisticians and machine learning specialists have a lot to learn from each other (even if they don't think so). This talk lightheartedly awards points to both classical statistics and machine learning, with an attempt not to offend anyone (but to annoy everyone). Topics include: Are confidence intervals worth it? What is bias, anyway? Can I just code it in Python?
James Donahue
Raised in Nashville, Tennessee, USA, I have bounced around the world, finally settling in Hamburg, Germany. My professional background reflects my vagabond nature. I taught English in Asia, Latin America, and online (full-time remote work before Covid!), worked in adventure tourism in Appalachia (USA) and Chile, and did my master's in Economics in Hamburg. Somewhere along the way, I gathered a few stories and soft skills. I met Python during my master's but stuck with Matlab through my PhD coursework, until I decided that academia is not my forever home. Currently I am occupying myself with ensemble methods such as XGBoost, as well as expanding into computer vision with PyTorch and toying with more advanced data visualizations. Naturally, NumPy will always hold a special place in my heart, along with the Cython package to sample everything a Bayesian needs.
Airflow Lessons They Don't Put in the Docs
Airflow basics are well documented. Production Airflow is not. This talk covers the patterns, costs, and migration pitfalls that only show up after you've deployed: dynamic DAGs that scale, sensors that don't waste resources, CloudWatch bills that surprise you, and MWAA version upgrades that break in ways the changelog didn't mention. Practical lessons for teams running Airflow beyond the tutorial stage.
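A minimal sketch of one such pattern, dynamic task mapping with the TaskFlow API, assuming Airflow 2.4+ and a hypothetical list of sources; it is not the speaker's production code:

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
def per_source_ingest():
    @task
    def list_sources() -> list[str]:
        # In practice this might come from a config store or an API call.
        return ["orders", "payments", "refunds"]

    @task
    def ingest(source: str) -> None:
        print(f"ingesting {source}")

    # Dynamic task mapping: one ingest task instance per source, decided at
    # runtime, instead of generating DAG files or looping at parse time.
    ingest.expand(source=list_sources())

per_source_ingest()
```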
Tomas Peluritis
Tomas leads data at Mediatech and runs Uncle Data, a newsletter and podcast for data engineers who prefer practical advice over hype. By day, he manages pipelines processing half a billion events; by night, he writes about what he learned (often the hard way). When not wrangling DAGs or mentoring his team, he's probably optimising a Magic: The Gathering deck.
Python, Rust, and Arrow for Data Processing
Python struggles with heavy data loads. Rust offers speed, and PyO3 makes bridging the two seamless. This talk shows how to build a shared Rust core to avoid code duplication. I will also cover using Apache Arrow for zero-copy data sharing and removing serialization costs entirely. Discover how this stack enables high-performance data processing in Python and PySpark.
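A minimal sketch of the zero-copy idea on the Python side, assuming pyarrow and polars are installed; the Rust extension module mentioned in the comments is hypothetical:

```python
import pyarrow as pa
import polars as pl  # pip install polars

# Build an Arrow table once; Arrow's columnar buffers are the shared format.
table = pa.table({"user_id": [1, 2, 3], "amount": [9.5, 12.0, 3.25]})

# Polars can wrap the same Arrow buffers without copying or serializing.
df = pl.from_arrow(table)
print(df)

# A Rust core built with PyO3 and the arrow crate could accept the same table
# through the Arrow C data interface; `my_rust_core` below is hypothetical,
# not a real package.
# import my_rust_core
# result = my_rust_core.aggregate(table)
```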
Paulius Venclovas
Paulius Venclovas is a Senior Data Engineer at Flo Health, where he focuses on delivering data processing and governance solutions on the Databricks platform. Previously he worked at the financial startup Curve and the data consultancy Beyond Analysis. Paulius also holds a Master’s degree in Computing (AI/ML) from Imperial College London.
Creative Data Storytelling with Python
Python enables data professionals to move beyond analysis and transform information into clear, compelling stories. With various libraries, Python supports insightful exploration, expressive visualizations, and interactive elements that enhance communication. This talk highlights practical techniques for turning patterns, trends, and insights into engaging narratives, making data more understandable, impactful, and actionable.
Purva
Purva Porwal is an AI/ML Manager at State Street Corp with a decade of experience in the tech industry. Having contributed primarily to Conversational AI and Natural Language Processing, she has driven impactful innovations across key industries such as finance and telecommunications. Her passion lies in harnessing AI’s potential to create meaningful transformation, and she is constantly exploring emerging technologies in the field. Beyond her technical expertise, Purva is an active mentor, supporting and inspiring future professionals in the AI space. She plays an integral role in the AI community, contributing to thought leadership and fostering collaborative progress.
Designing Python APIs for Data You Don’t Control
The web isn’t an API, but Python developers often treat it like one. This talk explores how to design Python interfaces for unstable data sources, focusing on schema evolution, defensive parsing, and protecting downstream users.
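A minimal sketch of defensive parsing behind a stable interface, with hypothetical field names and fallbacks; not the speaker's code:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Product:
    """The stable shape exposed to downstream users."""
    name: str
    price: Optional[float]  # may be missing or malformed upstream

    @classmethod
    def from_raw(cls, raw: dict[str, Any]) -> "Product":
        # Defensive parsing: tolerate renamed keys and bad types instead of
        # letting an upstream change crash every consumer.
        name = str(raw.get("name") or raw.get("title") or "unknown")
        try:
            price = float(raw["price"])
        except (KeyError, TypeError, ValueError):
            price = None
        return cls(name=name, price=price)

print(Product.from_raw({"title": "Widget", "price": "9.99"}))
```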
Saurav Jain
Saurav Jain, Apify's Developer Community Manager, excels in community building and devrel. With a history of growing Amplication's community to 40K, he now enhances Apify's developer engagement. An international speaker, he has contributed to more than 100 conferences from remote places in Africa to events in Silicon Valley. His work bridges developers globally, fostering innovation and collaboration within the tech ecosystem. His expertise and passion for technology make him a pivotal figure in nurturing tech communities.
Data Versioning
One of the fundamental pieces of technology every software tech stack depends on is Git. The ability to version code and control the flow of development is a need common to every software project, and we take it for granted that everyone in the industry can properly version code. In this talk, we’ll explore what data versioning means and how we can borrow methodologies from software engineering to better manage our data.
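A minimal sketch of one borrowed idea, content-addressing a dataset snapshot the way Git addresses code, with a hypothetical directory path; not the speaker's approach:

```python
import hashlib
from pathlib import Path

def dataset_version(path: str) -> str:
    """Content-address a dataset directory: same bytes, same version id."""
    digest = hashlib.sha256()
    for file in sorted(Path(path).rglob("*")):
        if file.is_file():
            digest.update(file.name.encode())
            digest.update(file.read_bytes())
    return digest.hexdigest()[:12]

# The resulting identifier can be recorded alongside the code commit that
# produced or consumed the data, making experiments reproducible.
# print(dataset_version("data/exports/2025-06-01"))
```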
Federico Marchesi
Hi, my name is Federico Marchesi. During my career, I have had the pleasure of working with a variety of ML systems, ranging from complex OLAP systems to distributed machine learning inference platforms, and I have also witnessed the rise of modern data lakehouses. I’m especially passionate about data, which I believe is the foundation of modern software, not just of ML. Outside of work, I enjoy staying active through MTB, swimming, and running. I’m also a passionate motorsport enthusiast.
Behind Every Instant Loan Is Data Science: How Python Scorecards Decide Credit Risk
Modern digital lending demands instant decisions, and behind those decisions is a Data Science workflow powered by scorecards. This talk explains how scorecards calculate credit risk in a transparent and scalable way, from feature engineering to production deployment. Using real examples from our company, I will show how these models enable fast, reliable loan approvals.
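A minimal sketch of how a scorecard can sit on top of a logistic regression, using toy features and a common points-doubling convention; it is not the speaker's production model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binned/encoded features and default labels (1 = default).
X = np.array([[0.4, -0.2], [-0.6, 0.8], [0.1, 0.3], [-0.9, -0.5]])
y = np.array([0, 1, 0, 1])

model = LogisticRegression().fit(X, y)

# Translate log-odds into scorecard points: one common convention gives
# 600 points at 50:1 odds, doubling the odds every 20 points.
factor = 20 / np.log(2)
offset = 600 - factor * np.log(50)

log_odds = model.decision_function(X)   # log-odds of default per applicant
scores = offset - factor * log_odds     # higher score = lower risk
print(np.round(scores))
```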
Zafarzhon Irismetov
Senior Data Scientist at a loan-lending business.