April 9, 2026
Explore the world of data with Python. From data engineering and pipelines to analytics, visualization, and machine learning — learn how to work with data at scale and derive insights that matter.
Keynote Speakers

Katharine Jarmul
Founder, Kjamistan
Katharine Jarmul focuses her work and research on privacy and security in data science, deep learning and AI. She is the author of the well-received book Practical Data Privacy (O'Reilly 2023) and has more than 10 years of experience in machine learning/AI, where she has helped build large-scale AI systems with privacy and security built in. You can follow her work via her newsletter, Probably Private (https://probablyprivate.com), or on her website at kjamistan.com.

Tomas Peluritis
Head of Data, Mediatech
Tomas Peluritis is Head of Data at Mediatech and founder of Uncle Data, a newsletter and podcast for data engineers who like their insights practical. He has built data stacks from the ground up and adapted existing ones as needed, led teams across multiple companies and countries, and speaks at conferences sharing lessons from experience, the kind that don't make it into the docs. He believes in simple over smart and writes about what he's actually seen work (and fail) in production. He lives in Vilnius, Lithuania with his family and plays Magic: The Gathering poorly, but is trying to improve.
What to Expect
Talks and workshops happening on Data Day
Exposing Greenwashing: Satellite ML for Carbon Credit Verification
The carbon market is set to reach $1 trillion by 2030, yet 84% of offsets fail to deliver real climate benefits. Verification still relies on sparse site visits and self-reported data. This poster shows a Python workflow that audits carbon projects using satellite imagery and ML, detecting over-crediting and leakage in REDD+ sites. With open data and open-source tools, anyone can compare claimed versus observed forest outcomes and verify what projects actually deliver.
Neeraj Pandey
Neeraj is the co-founder of Vivid Climate, a climate management and DMRV platform. Neeraj is a polyglot. Over the years, he has worked on a variety of full-stack software and data-science applications, as well as computational arts, and he likes the challenge of creating new tools and applications. He is also an active international speaker who has presented talks and tutorials at multiple conferences.
Leading Through the Shift: What Engineering Leadership Actually Looks Like
Building great engineering teams has never been straightforward — but the rules keep changing. Three back-to-back panel discussions with CTOs and engineering leaders covering the full spectrum: scaling teams and competing for talent in a small market, navigating a leadership role that looks nothing like it did five years ago (AI included), and making the hard calls on culture, tech debt, and architecture when there's no clear playbook to follow.
Justinas Kuizinas
Python for Data Quality in 2025: Why tests alone are no longer enough
In 2025, classic data tests via Python are not enough. In this 25-30 minute talk, I will show how Python powers modern Data Quality: from real-time freshness checks to anomaly detection and orchestrator integration. No AI hype: starting with a quick Data Quality overview and problem statement, I will show practical code, architecture, and hands-on engineering for resilient pipelines from a Data Engineer/Data Quality Engineer perspective.
Artsem
Data Engineer with a passion for Data Quality. Experienced in Azure, AWS, Python, and Databricks. Head of the DQ department at EPAM Lithuania & Latvia, with a headcount of 20 people.
Beyond the Static 2D Plot - Spatial Data Storytelling in 4D
2D static plots are great, but they are static. Data isn't - it changes. So we turn a plot into an animation. But we don't live in planes - we live in space. And we want to send a message, not just show an animation. This leads us to the 3D animated story! In this talk I will close the gap between abstract data and its physical reality. Through step-by-step examples using (Geo)Pandas, (I)Pydeck, PyVista, Blender etc., I will turn basic charts into 4D stories with custom models added to geospace.
Kęstutis Gadeikis
Kęstutis Gadeikis (PhD, EMBA) is the Chief Actuary and Data Governance Manager at Lietuvos Draudimas (PZU Group), the insurance market leader in the Baltics. With a PhD in Mathematics and an EMBA, he specializes in finding the right balance between data analysis and the impactful presentation of its results.
Stats Meets ML - What I learned from my Machine Learning Certification
Statisticians and machine learning specialists have a lot to learn from each other (even if they don't think so). This talk lightheartedly awards points to both classical statistics and machine learning, with an attempt not to offend anyone (but to annoy everyone). Topics include: Are confidence intervals worth it? What is bias, anyway? Can I just code it in Python?
James Donahue
Raised in Nashville, Tennessee, USA, I have bounced around the world, finally settling in Hamburg, Germany. My professional background reflects my vagabond nature. I taught English in Asia, Latin America, and online (full-time remote work before Covid!), worked in adventure tourism in Appalachia (USA) and Chile, and did my master's in Economics in Hamburg. Somewhere along the way, I gathered a few stories and soft skills. I met Python during my master's, but stuck with Matlab through my PhD coursework, until I decided that academia is not my forever home. Currently, I am occupying myself with ensemble methods such as XGBoost, as well as expanding into computer vision with PyTorch and toying with more advanced data visualizations. Naturally, NumPy will always hold a special place in my heart, along with the Cython package to sample everything a Bayesian needs.
Airflow Lessons They Don't Put in the Docs
Airflow basics are well documented. Production Airflow is not. This talk covers the patterns, costs, and migration pitfalls that only show up after you've deployed: dynamic DAGs that scale, sensors that don't waste resources, CloudWatch bills that surprise you, and MWAA version upgrades that break in ways the changelog didn't mention. Practical lessons for teams running Airflow beyond the tutorial stage.
Tomas Peluritis
Tomas leads data at Mediatech and runs Uncle Data, a newsletter and podcast for data engineers who prefer practical advice over hype. By day, he manages pipelines processing half a billion events; by night, he writes about what he learned (often the hard way). When not wrangling DAGs or mentoring his team, he's probably optimising a Magic: The Gathering deck.
Python, Rust and Arrow for Data Processing
Python struggles with heavy data loads. Rust offers speed, and PyO3 makes bridging the two seamless. This talk shows how to build a shared Rust core to avoid code duplication. I will also cover using Apache Arrow for zero-copy data sharing and removing serialization costs entirely. Discover how this stack enables high-performance data processing in Python and PySpark.
Paulius Venclovas
Paulius Venclovas is a Senior Data Engineer at "Flo Health", where he focuses on delivering data processing and governance solutions on the Databricks platform. Previously, he worked at the financial startup "Curve" and the data consultancy "Beyond Analysis". Paulius also holds a Master’s degree in Computing (AI/ML) from Imperial College London.
Creative Data Storytelling with Python
Python enables data professionals to move beyond analysis and transform information into clear, compelling stories. With various libraries, Python supports insightful exploration, expressive visualizations, and interactive elements that enhance communication. This talk highlights practical techniques for turning patterns, trends, and insights into engaging narratives, making data more understandable, impactful, and actionable.
Purva
Purva Porwal is an AI/ML Manager at State Street Corp with a decade of experience in the tech industry. Having contributed primarily to Conversational AI and Natural Language Processing, she has driven impactful innovations across key industries such as finance and telecommunications. Her passion lies in harnessing AI’s potential to create meaningful transformation, and she is constantly exploring emerging technologies in the field. Beyond her technical expertise, Purva is an active mentor, supporting and inspiring future professionals in the AI space. She plays an integral role in the AI community, contributing to thought leadership and fostering collaborative progress.
Designing Python APIs for Data You Don’t Control
The web isn’t an API, but Python developers often treat it like one. This talk explores how to design Python interfaces for unstable data sources, focusing on schema evolution, defensive parsing, and protecting downstream users.
Saurav Jain
Saurav Jain, Apify's Developer Community Manager, excels in community building and devrel. With a history of growing Amplication's community to 40K, he now enhances Apify's developer engagement. An international speaker, he has contributed to more than 100 conferences from remote places in Africa to events in Silicon Valley. His work bridges developers globally, fostering innovation and collaboration within the tech ecosystem. His expertise and passion for technology make him a pivotal figure in nurturing tech communities.
Quantum Machine Learning with Qiskit
Unlock the power of Quantum Machine Learning (QML) with Python and Qiskit. You will explore the distinctions between classical and quantum machine learning and gain an understanding of data encoding, quantum kernels, and the training process of a Variational Quantum Classifier. You’ll also discover how Qiskit Functions integrate quantum into application workflows to solve complex challenges. Finally, you’ll explore IBM case studies showing the transition to practical "quantum utility."
Jonas Adomaitis
Jonas Adomaitis is a Senior Data Scientist and People Manager at IBM Lithuania. He holds a Bachelor of Science in Computer Science and Mathematics from The University of Edinburgh. Jonas is a highly active member of the global quantum community, serving as a member of the Qiskit Advocate program and the Co-lead of IBM’s Quantum Club. His technical expertise encompasses Agentic AI, NLP, and Explainable AI (XAI), with a focus on delivering advanced analytics for the financial and government sectors. Jonas’s contributions have been recognized with several distinctions, including the 2025 Quantum Excellence designation from the Qiskit Global Summer School and the IBM Quantum Challenge 2024 Achievement.
Dataset Updates Without Losing Your Mind
Many teams work with datasets that evolve over time. What starts as a simple setup quickly turns into chaos once updates become regular. In this talk, I share a practical workflow for managing dataset updates by splitting the process into clear stages, each represented by a Python script. This approach was used in production for two years on image datasets ranging from 2,000 to 200,000 samples, and it helps small teams reduce cognitive load and keep dataset and model updates predictable.
Oleksii Liashuk
Oleksii Liashuk is a lead ML engineer working on Python computer vision systems, such as object detection, segmentation, OCR and object tracking. He focuses on practical ML problems like car damage detection, container number recognition in difficult conditions, and continuous dataset and model maintenance. Oleksii has hands-on experience with dataset updates, model retraining cycles and deployment of ML systems using Docker and Kubernetes.
Making African Languages Visible: A Python-Based Guide to Low-Resource Language
This talk introduces how Python and FastText can be used to detect low-resource African languages using the MasakhaNER dataset. We cover key preprocessing steps, evaluation methods, and challenges such as dialectal variation and sparse data. The session also compares FastText with African-focused NLP tools like AfroXLMR and Masakhane Models, offering clear guidance on when each tool works best.
Gift Ojeabulu
Gift Ojeabulu has spent the last 6+ years at the intersection of AI/ML, SWE, developer advocacy, and community building. Most recently, he worked as an AI devrel advocate and content lead at Iterative.ai, the team behind the popular open source AI tools DVC and CML. He has built and scaled thriving AI communities, notably as co-founder of D.C.A, now the largest Data and AI community of Black professionals worldwide. He is a visionary data scientist whose work is transforming Africa's technological landscape. As the Co-founder of Data Community Africa, an advisory board member at DevNetwork (Artificial Intelligence), and an AI Developer Advocate, Gift has emerged as a pivotal figure in democratizing data and AI across the continent. His crowning achievement, the African Data Community Newsletter, has become a beacon of knowledge sharing, reaching an impressive network of over 2,500 subscribers spanning 45 countries and 8 U.S. states. This initiative has inspired his involvement at DatafestAfrica, which has held 4 conferences and 5+ hackathons in less than 4 years and is now one of the continent's premier data and AI conferences, bringing together practitioners, researchers, and enthusiasts from across the globe. In Lagos, Gift's leadership of the MLOps community has revolutionized how organizations approach machine learning operations. Under his guidance, the community has become a hub for innovation in practical MLOps and Large Language Models (LLMs), fostering collaboration between industry leaders and emerging talents. His emphasis on open-source AI development has created new pathways for African developers to contribute to global technological advancement. Through strategic initiatives and unwavering dedication, Gift Ojeabulu continues to architect the future of Africa's data and AI ecosystem. His work exemplifies how individual leadership can catalyze continental transformation, making advanced technology accessible to communities that have historically been underserved in the global tech landscape.
Data versioning
Git is one of the fundamental technologies that nearly every software-related tech stack depends on. The ability to version code and control the flow of development is the one focus common to every software project. We take for granted that everyone working in the industry can properly version code. In this talk, we’ll explore the meaning of data versioning and how we could borrow methodologies from the software engineering field to better manage our data.
Federico Marchesi
Hi, my name is Federico Marchesi. During my career, I have had the pleasure of working with a variety of ML systems, ranging from complex OLAP systems to distributed machine learning inference platforms, and I have also worked with modern data lakehouses as they emerged. I’m especially passionate about data, which I believe is the foundation of modern software, not just in ML. Outside of work, I enjoy staying active through MTB, swimming, and running. I’m also a passionate motorsport enthusiast.
Master the Art of Schema Dissection: Operation Data Engineer
Have you ever faced an enormous, wide table? Sometimes you need to cope with it because it's faster in this form, and that's what your stakeholders need. Still, it hides entities, metrics, and time semantics. Learn how the dissection framework reveals schema structure and turns risky rewrites into precise, surgical changes.
Antonino (Nino) Cangialosi
Hi there, I'm Nino! I'm an Italian-Finnish theoretical physicist turned Software Engineer. My relentless curiosity drives me to continuously explore new languages and frameworks. Over the past five years, I've thrived in roles ranging from DevOps and Full-Stack to Platform and Big Data Engineering. When I'm not diving into data, you can find me challenging gravity with bouldering, centering my mind through yoga, or strategizing my next chess move.
Beyond SHAP: Diagnosing Vector Embeddings with Visual Explainable AI
When your embedding-based classification model fails, should you collect more data or try a different approach? This talk shares a practical XAI workflow using UMAP visualization and prototype analysis to uncover systematic failures. We will explore how to use these tools to identify semantic overlaps and make evidence-based decisions when debugging high-dimensional similarity systems.
Valdas Druskinis
Valdas is a Machine Learning Engineer at carVertical, where he builds reliable and interpretable ML systems with a strong focus on end-to-end model and data lifecycles. Previously, he has lived and worked in multiple countries, including time at Mercedes-Benz AG, and has built production systems across classical ML, deep learning, and GenAI.
From Sports Stats to AI Safety: The Ranking Renaissance
In 1952, two statisticians proposed a simple model for comparing sports teams. In 1960, it became the foundation of chess ratings. Today, that same math powers everything from ChatGPT's alignment to Chatbot Arena leaderboards. But you don't need to be OpenAI to use it. I'll show how Bradley-Terry and preference-based ranking can solve everyday problems: comparing A/B test variants, ranking search results, evaluating ML models, prioritizing features, and more.
Gediminas Sadaunykas
AI Tech Lead and Data Scientist with over 10 years of combined experience in various domains, including facilities management, medical information, sports, and the global consumer app business. Founder at AimRank, AI Innovation Labs, specializing in ranking systems.
Behind Every Instant Loan Is Data Science: How Python Scorecards Decide Credit Risk
Modern digital lending demands instant decisions, and behind those decisions is a Data Science workflow powered by scorecards. This talk explains how scorecards calculate credit risk in a transparent and scalable way, from feature engineering to production deployment. Using real examples from our company, it presents models that enable fast, reliable loan approvals.
Zafarzhon Irismetov
Senior Data Scientist at a lending business.