April 9, 2026
Explore the world of data with Python. From data engineering and pipelines to analytics, visualization, and machine learning — learn how to work with data at scale and derive insights that matter.
Keynote Speakers

Katharine Jarmul
Founder, Kjamistan
Katharine Jarmul focuses her work and research on privacy and security in data science, deep learning and AI. She is the author of the well-received book Practical Data Privacy (O'Reilly 2023) and has more than 10 years of experience in machine learning/AI, where she has helped build large-scale AI systems with privacy and security built in. You can follow her work via her newsletter, Probably Private (https://probablyprivate.com) or on her website at kjamistan.com.

Tomas Peluritis
Head of Data, Mediatech
Tomas Peluritis is Head of Data at Mediatech and founder of Uncle Data, a newsletter and podcast for data engineers who like their insights practical. He has built data stacks from the ground up and adapted existing ones as needed, led teams across multiple companies and countries, and speaks at conferences about lessons from experience, the kind that don't make it into the docs. He believes in simple over smart and writes about what he's actually seen work (and fail) in production. He lives in Vilnius, Lithuania with his family and plays Magic: The Gathering poorly, but tries to improve.
What to Expect
Talks and workshops happening on Data Day
Data Day Intro
How I mapped 10 000 illegal Airbnbs with Python
In the spring of 2024, a cry erupted across Spain: "The Canary Islands have reached their limit." Soon, massive demonstrations followed in other regions facing similar tensions: "Mallorca is not for sale," "Enough of Ibiza," "Cantabria will defend itself." Meanwhile, in Madrid, the capital, a similar discontent had been brewing for years. Could open data be used to understand the phenomenon, and do something about it?
Juan Luis Cano Rodríguez
Juan Luis (he/him/él) is an aerospace engineer with a passion for tech communities and sustainability. He is currently working at Canonical as a Developer Success Engineer, dedicating his time to amplifying the global impact of open source. Juan Luis has a decade of experience as a developer advocate, software engineer, and Python trainer in several industries, at companies such as McKinsey, Read the Docs, Satellogic, and Telefónica, among others. A PSF Fellow since 2017, he has made significant contributions to the PyData stack, published several open-source packages, and organized the first seven PyCons in Spain. Currently, he is the lead organizer of the PyData Madrid monthly meetups. Obsessed with systemic change and looking for a way to live within our planetary boundaries ♻️
Python for Data Quality in 2025: Why tests alone are no longer enough
In 2025, classic data tests written in Python are not enough. In this 25-30 minute talk, I will show how Python powers modern Data Quality: from real-time freshness checks to anomaly detection and orchestrator integration. No AI hype: starting with a quick Data Quality overview and problem statement, I will show practical code, architecture, and hands-on engineering for resilient pipelines from a Data Engineer/Data Quality Engineer perspective.
Artsem
Data Engineer with a passion for Data Quality. Experience in Azure, AWS, Python, Databricks. Head of the DQ department at EPAM Lithuania & Latvia, with a headcount of 20 people.
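As a rough illustration of two ideas named in the abstract above (freshness checks and anomaly detection), here is a minimal sketch using only the standard library. It is not the speaker's code: the function names `is_fresh` and `is_anomalous`, the z-score rule, and the default threshold of 3.0 are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def is_fresh(last_loaded_at, max_age, now=None):
    """Freshness check: True if the newest record is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` standard deviations
    from the mean of past values (a simple z-score rule, assumed here)."""
    if len(history) < 2:
        return False  # not enough history to judge
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


# Hypothetical daily row counts for a table
row_counts = [1000, 1010, 990, 1005, 995]
suspicious = is_anomalous(row_counts, 400)
```

In a pipeline, checks like these would typically run as a task after each load, so that a stale or suspicious batch can stop downstream jobs before bad data spreads.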
Beyond the Static 2D Plot - Spatial Data Storytelling in 4D
2D static plots are great, but they are static. Data isn't - it changes. So we turn a plot into an animation. But we don't live in planes - we live in space. And we want to send a message, not just show an animation. This leads us to the 3D animated story! In this talk I will close the gap between abstract data and its physical reality. Through step-by-step examples using (Geo)Pandas, (I)Pydeck, PyVista, Blender etc., I will turn basic charts into 4D stories with custom models added to geospace.
Kęstutis Gadeikis
Kęstutis Gadeikis (PhD, EMBA) is the Chief Actuary and Data Governance Manager at Lietuvos Draudimas (PZU Group), the insurance market leader in the Baltics. With a PhD in Mathematics and an EMBA, he specializes in finding the right balance between data analysis and the impactful presentation of its results.
Exposing Greenwashing: Satellite ML for Carbon Credit Verification
The carbon market is set to reach 1T dollars by 2030, yet 84% of offsets fail to deliver real climate benefits. Verification still relies on sparse site visits and self-reported data. This poster shows a Python workflow that audits carbon projects using satellite imagery and ML, detecting over-crediting and leakage in REDD+ sites. With open data and open-source tools, anyone can compare claimed versus observed forest outcomes and verify what projects actually deliver.
Neeraj Pandey
Neeraj is the co-founder of Vivid Climate, a climate management and DMRV platform. Neeraj is a polyglot. Over the years, he has worked on a variety of full-stack software and data-science applications, as well as computational arts, and he likes the challenge of creating new tools and applications. He is an active international speaker, with talks and tutorials presented at multiple conferences.
Leading Through the Shift: What Engineering Leadership Actually Looks Like
Building great engineering teams has never been straightforward — but the rules keep changing. Three back-to-back panel discussions with CTOs and engineering leaders covering the full spectrum: scaling teams and competing for talent in a small market, navigating a leadership role that looks nothing like it did five years ago (AI included), and making the hard calls on culture, tech debt, and architecture when there's no clear playbook to follow.
Justinas Kuizinas, Aurimas Griciunas, Karolina Griciunė, Tomas Peluritis
Justinas Kuizinas: I'm a Head of Engineering at CoinGate, a crypto payments fintech based in Vilnius. I've held various roles, starting as a developer and even testing myself in a C-level role. I'm most energized when a technical team and a product team stop being two separate things and start building as one. Outside of work, I organize VilniusPy, local Python meetups where developers meet, chat, and share what they're actually working on. I also speak at conferences, because the best way to keep learning is to put your thinking in front of a room.
Aurimas Griciunas: TBD
Karolina Griciunė: TBD
Tomas Peluritis: Tomas leads data at Mediatech and runs Uncle Data, a newsletter and podcast for data engineers who prefer practical advice over hype. By day, he manages pipelines processing half a billion events; by night, he writes about what he learned (often the hard way). When not wrangling DAGs or mentoring his team, he's probably optimising a Magic: The Gathering deck.
Stats Meets ML - What I learned from my Machine Learning Certification
Statisticians and machine learning specialists have a lot to learn from each other (even if they don't think so). This talk lightheartedly awards points to both classical statistics and machine learning, with an attempt not to offend anyone (but to annoy everyone). Topics include: Are confidence intervals worth it? What is bias, anyway? Can I just code it in Python?
James Donahue
Raised in Nashville, Tennessee, USA, I have bounced around the world, finally settling in Hamburg, Germany. My professional background reflects my vagabond nature. I taught English in Asia, Latin America, and online (full-time remote work before Covid!), worked in adventure tourism in Appalachia (USA) and Chile, and did my master's in Economics in Hamburg. Somewhere along the way, I gathered a few stories and soft skills. I met Python during my master's, but stuck with Matlab through my PhD coursework, until I decided that academia is not my forever home. Currently I am occupying myself with ensemble methods such as XGBoost, as well as expanding into computer vision with PyTorch and toying with more advanced data visualizations. Naturally, NumPy will always hold a special place in my heart, along with the Cython package to sample everything a Bayesian needs.
And now for something completely different
In this talk we’ll go over two quintessential features of Python programming (generators and duck typing) and you’ll learn how to use them effectively. We’ll also look at branchless conditionals and understand how this unusual idea can shape the way you think about coding. In the end, we’ll put the three together to write a powerful Python idiom and conclude I have terrible taste!
Rodrigo Girão Serrão
Hi, I'm Rodrigo Girão Serrão from sunny Portugal 🇵🇹. I'm a prolific Python author and speaker, with [multiple books published independently](https://mathspp.com/books) and [dozens of talks and tutorials](https://mathspp.com/talks) given at the largest Python conferences in the world. I also [blog frequently about Python](https://mathspp.com/blog) and publish two Python newsletters: the [weekly mathspp insider 🐍🚀](https://mathspp.com/insider) and the [daily Python drops 🐍💧](https://mathspp.com/drops). I have extensive experience teaching people from all walks of life – from kids in school, to professionals in various industries, to retirees – and there is a clear consensus that my students enjoy my clear examples, the live-coding during my lessons, and most surprisingly: my quirky sense of humour.
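For readers unfamiliar with the three ideas named in the abstract of "And now for something completely different", here is a small sketch of each one in isolation. It is only an illustration with hypothetical function names; it does not attempt to reproduce the combined idiom the talk builds, which the abstract does not describe.

```python
def countdown(start):
    """Generator: produces values lazily, one at a time."""
    while start > 0:
        yield start
        start -= 1


def total(numbers):
    """Duck typing: works with anything that can be iterated over
    (list, tuple, generator), without checking its type."""
    result = 0
    for number in numbers:
        result += number
    return result


def absolute(x):
    """Branchless conditional: booleans are integers in Python, so
    (x >= 0) - (x < 0) is 1 for non-negative x and -1 otherwise,
    and no if statement is needed."""
    return x * ((x >= 0) - (x < 0))


# The generator is consumed by a function that never asked for a list.
countdown_sum = total(countdown(4))
```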
Multi-Model LLM Orchestration in Python: A Case Study in Research Automation
How do you turn thousands of PDFs into actionable insights? This talk shows how we built a Python-based AI assistant using LLMs and RAG to automate literature reviews, covering architecture, trade-offs, and real lessons from production use in policy research.
Mauro Pelucchi
Mauro Pelucchi is a Senior Data Scientist and Big Data Engineer responsible for the design of the “Real-Time Labour Market Information System on Skill Requirements” for CEDEFOP (European Centre for the Development of Vocational Training). He currently works as Head of Global Data Science at Lightcast, with the goal of developing innovative models, methods, and deployments of labour market data and other data to meet customer requirements and prototype new potential solutions. His main tasks are related to advanced machine learning modelling, labour market analyses, and the design of big data pipelines to process large datasets of online job vacancies. In collaboration with the University of Milano-Bicocca, he has taken part in many research projects related to labour market intelligence systems. He collaborates with the University of Milano-Bicocca as a Lecturer for the Masters of Business Intelligence and Big Data Analytics and with the University of Bergamo as a Lecturer in Computer Engineering.
Designing Python APIs for Data You Don’t Control
The web isn’t an API, but Python developers often treat it like one. This talk explores how to design Python interfaces for unstable data sources, focusing on schema evolution, defensive parsing, and protecting downstream users.
Saurav Jain
Saurav Jain, Apify's Developer Community Manager, excels in community building and devrel. With a history of growing Amplication's community to 40K, he now enhances Apify's developer engagement. An international speaker, he has contributed to more than 100 conferences from remote places in Africa to events in Silicon Valley. His work bridges developers globally, fostering innovation and collaboration within the tech ecosystem. His expertise and passion for technology make him a pivotal figure in nurturing tech communities.
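To give a flavour of the defensive parsing and schema evolution themes in the abstract of "Designing Python APIs for Data You Don’t Control", here is a minimal sketch. Everything in it is hypothetical (the `Listing` type, the field aliases, and the `ParseError` exception); it shows one possible approach, not the design presented in the talk.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Listing:
    """Stable shape promised to downstream users (hypothetical)."""
    title: str
    price: Optional[float]  # None when the source omits or garbles it


class ParseError(ValueError):
    """Raised when a record lacks a field we cannot work without."""


# Hypothetical aliases: names the source has used for a field over time.
TITLE_KEYS = ("title", "name", "headline")
PRICE_KEYS = ("price", "price_eur", "amount")


def _first_present(raw: Mapping[str, Any], keys):
    """Return the first non-empty value found under any of the keys."""
    for key in keys:
        if raw.get(key) not in (None, ""):
            return raw[key]
    return None


def parse_listing(raw: Mapping[str, Any]) -> Listing:
    title = _first_present(raw, TITLE_KEYS)
    if not isinstance(title, str) or not title.strip():
        raise ParseError(f"no usable title in keys {sorted(raw)}")

    price_raw = _first_present(raw, PRICE_KEYS)
    try:
        # Accept both "12.50" and "12,50"
        price = None if price_raw is None else float(str(price_raw).replace(",", "."))
    except ValueError:
        price = None  # tolerate garbage in an optional field
    return Listing(title=title.strip(), price=price)
```

The design choice here is to fail loudly on required fields and degrade quietly on optional ones, so downstream code always receives the same `Listing` shape even when the source renames or drops fields.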
Creative Data Storytelling with Python
Python enables data professionals to move beyond analysis and transform information into clear, compelling stories. With various libraries, Python supports insightful exploration, expressive visualizations, and interactive elements that enhance communication. This talk highlights practical techniques for turning patterns, trends, and insights into engaging narratives, making data more understandable, impactful, and actionable.
Purva
Purva Porwal is an AI/ML Manager at State Street Corp with a decade of experience in the tech industry. Having contributed primarily to Conversational AI and Natural Language Processing, she has driven impactful innovations across key industries such as finance and telecommunications. Her passion lies in harnessing AI’s potential to create meaningful transformation, and she is constantly exploring emerging technologies in the field. Beyond her technical expertise, Purva is an active mentor, supporting and inspiring future professionals in the AI space. She plays an integral role in the AI community, contributing to thought leadership and fostering collaborative progress.
Python, Rust and Arrow for Data Processing
Python struggles with heavy data loads. Rust offers speed, and PyO3 makes bridging the two seamless. This talk shows how to build a shared Rust core to avoid code duplication. I will also cover using Apache Arrow for zero-copy data sharing and removing serialization costs entirely. Discover how this stack enables high-performance data processing in Python and PySpark.
Paulius Venclovas
Paulius Venclovas is a Senior Data Engineer at "Flo Health", where he focuses on delivering data processing and governance solutions on the Databricks platform. Previously he worked at the financial startup "Curve" and the data consultancy "Beyond Analysis". Paulius also holds a Master’s degree in Computing (AI/ML) from Imperial College London.
Master the Art of Schema Dissection: Operation Data Engineer
Have you ever faced an enormous, wide table? Sometimes you have to live with it because it's faster in this form and it's what your stakeholders need. Either way, it hides entities, metrics, and time semantics. Learn how the dissection framework reveals schema structure and turns risky rewrites into precise, surgical changes.
Antonino (Nino) Cangialosi
Hi there, I'm Nino! An Italian-Finnish theoretical physicist turned into a Software Engineer for a living. My relentless curiosity drives me to continuously explore new languages and frameworks. Over the past five years, I've thrived in roles ranging from DevOps and Full-Stack to Platform and Big Data Engineering. When I'm not diving into data, you can find me challenging gravity with bouldering, centering my mind through yoga, or strategizing my next chess move.
Beyond SHAP: Diagnosing Vector Embeddings with Visual Explainable AI
When your embedding-based classification model fails, should you collect more data or try a different approach? This talk shares a practical XAI workflow using UMAP visualization and prototype analysis to uncover systematic failures. We will explore how to use these tools to identify semantic overlaps and make evidence-based decisions when debugging high-dimensional similarity systems.
Valdas Druskinis
Valdas is a Machine Learning Engineer at carVertical, where he builds reliable and interpretable ML systems with a strong focus on end-to-end model and data lifecycles. Previously, he has lived and worked in multiple countries, including time at Mercedes-Benz AG, and has built production systems across classical ML, deep learning, and GenAI.
Data versioning
Git is one of the fundamental pieces of technology that every software-related tech stack depends on heavily. The ability to version code and control the flow of development is the one common focus of every software project. We take for granted that everyone working in the industry can properly version code. In this talk, we’ll explore the meaning of data versioning and how we could borrow methodologies from the software engineering field to better manage our data.
Federico Marchesi
Hi, my name is Federico Marchesi. During my career, I have had the pleasure of working with a variety of ML systems, ranging from complex OLAP systems to distributed machine learning inference platforms, and I have also worked through the rise of modern data lakehouses. I’m especially passionate about data, which I believe is the foundation of modern software, not just in ML. Outside of work, I enjoy staying active through MTB, swimming, and running. I’m also a passionate motorsport enthusiast.
From Sports Stats to AI Safety: The Ranking Renaissance
In 1952, two statisticians proposed a simple model for comparing sports teams. In 1960, it became the foundation of chess ratings. Today, that same math powers everything from ChatGPT's alignment to Chatbot Arena leaderboards. But you don't need to be OpenAI to use it. I'll show how Bradley-Terry and preference-based ranking can solve everyday problems: comparing A/B test variants, ranking search results, evaluating ML models, prioritizing features, and more.
Gediminas Sadaunykas
AI Tech Lead and Data Scientist with over 10 years of combined experience in various domains, including facilities management, medical information, sports, and the global consumer app business. Founder at AimRank, AI Innovation Labs, specializing in ranking systems.
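The Bradley-Terry model mentioned in the abstract of "From Sports Stats to AI Safety" can be sketched in a few lines. This is an illustration, not the speaker's code: it assumes the standard formulation, in which item i beats item j with probability p_i / (p_i + p_j), and fits the strengths with the classic iterative (minorization-maximization) update. The function names and the sample win counts are hypothetical.

```python
from collections import defaultdict


def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins maps (winner, loser) -> number of times winner beat loser.
    Returns a dict of strengths normalised to sum to 1.
    Assumes every item has won at least once.
    """
    items = sorted({name for pair in wins for name in pair})
    total_wins = defaultdict(float)
    games = defaultdict(float)  # comparisons per unordered pair
    for (winner, loser), count in wins.items():
        total_wins[winner] += count
        games[frozenset((winner, loser))] += count

    strength = {item: 1.0 / len(items) for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = total_wins[i] / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {item: value / norm for item, value in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Probability that a beats b under the fitted model."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical comparison data for three variants: (winner, loser) -> count
sample_wins = {
    ("A", "B"): 7, ("B", "A"): 3,
    ("B", "C"): 6, ("C", "B"): 4,
    ("A", "C"): 8, ("C", "A"): 2,
}
strengths = fit_bradley_terry(sample_wins)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

The same input shape fits any "which of these two is better?" data, such as A/B test variants or pairwise model evaluations.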
Behind Every Instant Loan Is Data Science: How Python Scorecards Decide Credit Risk
Modern digital lending demands instant decisions, and behind those decisions is a Data Science workflow powered by scorecards. This talk explains how scorecards calculate credit risk in a transparent and scalable way, from feature engineering to production deployment. Using real examples from our company, it shows models that enable fast, reliable loan approvals.
Zafarzhon Irismetov
Senior Data Scientist at a lending business.
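As background for the scorecard talk above, here is a minimal sketch of the scaling step commonly used in credit scorecards, where a score is a linear function of the log-odds of repayment. It is an assumption that this convention applies to the speaker's models; the base score of 600, base odds of 50:1, and 20 "points to double the odds" are illustrative values only, and the function names are hypothetical.

```python
import math


def scorecard_scaling(base_score=600.0, base_odds=50.0, pdo=20.0):
    """Return (offset, factor) so that score = offset + factor * ln(odds).

    base_score is awarded at base_odds (good:bad), and every doubling of
    the odds adds pdo points. Default values are illustrative only.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset, factor


def score_from_default_probability(p_default, offset, factor):
    """Convert a predicted probability of default into scorecard points."""
    odds = (1.0 - p_default) / p_default  # odds of repaying vs defaulting
    return offset + factor * math.log(odds)


offset, factor = scorecard_scaling()
example_score = score_from_default_probability(0.02, offset, factor)
```

Because the score is linear in the log-odds, each feature's contribution can be read off as a number of points, which is what makes this kind of model easy to explain.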
Airflow Lessons They Don't Put in the Docs
Airflow basics are well documented. Production Airflow is not. This talk covers the patterns, costs, and migration pitfalls that only show up after you've deployed: dynamic DAGs that scale, sensors that don't waste resources, CloudWatch bills that surprise you, and MWAA version upgrades that break in ways the changelog didn't mention. Practical lessons for teams running Airflow beyond the tutorial stage.
Tomas Peluritis
Tomas leads data at Mediatech and runs Uncle Data, a newsletter and podcast for data engineers who prefer practical advice over hype. By day, he manages pipelines processing half a billion events; by night, he writes about what he learned (often the hard way). When not wrangling DAGs or mentoring his team, he's probably optimising a Magic: The Gathering deck.
Cloud Data Solutions Are Overrated: Building a Pan-European Business Database for Lunch Money
We are told that modern data engineering requires expensive cloud warehouses and enterprise SaaS. This talk challenges that narrative. I will show how I built scoris.eu - aggregating business data from hundreds of sources across Lithuania, Latvia, Estonia, Finland, and the UK - as a solo developer. Using a purely open-source Python stack (dlt, dbt, Prefect) on cheap infrastructure, I will demonstrate that with the right architecture, you can integrate data at scale without burning cash on the cloud.
Antanas Baltrušaitis
After 18 years in the data and analytics industry - spanning roles from banking strategy at Nordea to Head of Data at Luminor and Girteka - I made a conscious decision to step out of the corporate world. Why? Because my skillset doesn't fit a standard job description. I am a true Generalist in a world that often tries to specialize. I don’t just analyze data; I engineer the pipelines, write the code, design the UI, and map the business strategy. I realized that my ability to manage the entire data journey - from raw SQL to the final user experience - was best utilized in building my own solutions rather than managing narrow slices of corporate infrastructure. Today, I am fully dedicated to Product Development. I build software where data isn't just a byproduct; it is the core engine. My flagship product, Scoris, is Lithuania's premier open business data aggregator, built on the belief that public data should be accessible and actionable. I also created Oriux, a weather app designed for precision. A few other massive products are about to launch soon. I bridge the gap between complex data engineering and tangible business value. I am no longer just advising on strategy; I am executing it, line by line and database by database.
Quantum Machine Learning with Qiskit
Unlock the power of Quantum Machine Learning (QML) with Python and Qiskit. You will explore the distinctions between classical and quantum machine learning and gain an understanding of data encoding, quantum kernels, and the training process of a Variational Quantum Classifier. You’ll also discover how Qiskit Functions integrate quantum into application workflows to solve complex challenges. Finally, you’ll explore IBM case studies showing the transition to practical "quantum utility."
Jonas Adomaitis
Jonas Adomaitis is a Senior Data Scientist and People Manager at IBM Lithuania. He holds a Bachelor of Science in Computer Science and Mathematics from The University of Edinburgh. Jonas is a highly active member of the global quantum community, serving as a member of the Qiskit Advocate program and the Co-lead of IBM’s Quantum Club. His technical expertise encompasses Agentic AI, NLP, and Explainable AI (XAI), with a focus on delivering advanced analytics for the financial and government sectors. Jonas’s contributions have been recognized with several distinctions, including the 2025 Quantum Excellence designation from the Qiskit Global Summer School and the IBM Quantum Challenge 2024 Achievement.
Dataset Updates Without Losing Your Mind
Many teams work with datasets that evolve over time. What starts as a simple setup quickly turns into chaos once updates become regular. In this talk, I share a practical workflow for managing dataset updates by splitting the process into clear stages, each represented by a Python script. This approach was used in production for two years on image datasets ranging from 2,000 to 200,000 samples, and it helps small teams reduce cognitive load and keep dataset and model updates predictable.
Oleksii Liashuk
Oleksii Liashuk is a lead ML engineer working with Python computer vision systems, such as object detection, segmentation, OCR and object tracking. He focuses on practical ML problems like car damage detection, container number recognition in difficult conditions, and continuous dataset and model maintenance. Oleksii has hands-on experience with dataset updates, model retraining cycles and deployment of ML systems using Docker and Kubernetes.