April 9, 2026
Explore the world of data with Python. From data engineering and pipelines to analytics, visualization, and machine learning — learn how to work with data at scale and derive insights that matter.
Keynote Speaker

Katharine Jarmul
Founder, Kjamistan
Katharine Jarmul focuses her work and research on privacy and security in data science, deep learning and AI. She is the author of the well-received book Practical Data Privacy (O'Reilly 2023) and has more than 10 years of experience in machine learning/AI, where she has helped build large-scale AI systems with privacy and security built in. You can follow her work via her newsletter, Probably Private (https://probablyprivate.com), or on her website at kjamistan.com.
What to Expect
Talks and workshops happening on Data Day
Exposing Greenwashing: Satellite ML for Carbon Credit Verification
The carbon market is set to reach 1T dollars by 2030, yet 84% of offsets fail to deliver real climate benefits. Verification still relies on sparse site visits and self-reported data. This poster shows a Python workflow that audits carbon projects using satellite imagery and ML, detecting over-crediting and leakage in REDD+ sites. With open data and open-source tools, anyone can compare claimed versus observed forest outcomes and verify what projects actually deliver.
Neeraj Pandey
Neeraj is the co-founder of Vivid Climate, a climate management and DMRV platform. A polyglot, he has worked over the years on a variety of full-stack software and data-science applications, as well as computational arts, and likes the challenge of creating new tools and applications. He is an active international speaker, with talks and tutorials presented at multiple conferences.
Making African Languages Visible: A Python-Based Guide to Low-Resource Language
This talk introduces how Python and FastText can be used to detect low-resource African languages using the MasakhaNER dataset. We cover key preprocessing steps, evaluation methods, and challenges such as dialectal variation and sparse data. The session also compares FastText with African-focused NLP tools like AfroXLMR and Masakhane Models, offering clear guidance on when each tool works best.
Gift Ojeabulu
Gift Ojeabulu has spent the last 6+ years at the intersection of AI/ML, SWE, developer advocacy, and community building. Most recently, he worked as an AI devrel advocate and content lead at Iterative.ai, the team behind the popular open source AI tools DVC and CML. He has built and scaled thriving AI communities, notably as co-founder of D.C.A, now the largest Data and AI community of Black professionals worldwide. He is a visionary data scientist whose work is transforming Africa's technological landscape. As the Co-founder of Data Community Africa, an advisory board member at DevNetwork (Artificial Intelligence), and an AI Developer Advocate, Gift has emerged as a pivotal figure in democratizing data and AI across the continent. His crowning achievement, the African Data Community Newsletter, has become a beacon of knowledge sharing, reaching a network of over 2500 subscribers spanning 45 countries and 8 U.S. states. This initiative has inspired his involvement in DatafestAfrica (4 conferences and 5+ hackathons in less than 4 years), now one of the continent's premier data and AI conferences, bringing together practitioners, researchers, and enthusiasts from across the globe. In Lagos, Gift's leadership of the MLOps community has revolutionized how organizations approach machine learning operations. Under his guidance, the community has become a hub for innovation in practical MLOps and Large Language Models (LLMs), fostering collaboration between industry leaders and emerging talents. His emphasis on open-source AI development has created new pathways for African developers to contribute to global technological advancement. Through strategic initiatives and unwavering dedication, Gift Ojeabulu continues to architect the future of Africa's data and AI ecosystem. His work exemplifies how individual leadership can catalyze continental transformation, making advanced technology accessible to communities that have historically been underserved in the global tech landscape.
Stats Meets ML - What I learned from my Machine Learning Certification
Statisticians and machine learning specialists have a lot to learn from each other (even if they don't think so). This talk lightheartedly awards points to both classical statistics and machine learning, with an attempt not to offend anyone (but to annoy everyone). Topics include: Are confidence intervals worth it? What is bias, anyway? Can I just code it in Python?
James Donahue
Raised in Nashville, Tennessee, USA, I have bounced around the world, finally settling in Hamburg, Germany. My professional background reflects my vagabond nature. I taught English in Asia, Latin America, and online (full-time remote work before Covid!), worked in adventure tourism in Appalachia (USA) and Chile, and did my master's in Economics in Hamburg. Somewhere along the way, I gathered a few stories and soft skills. I met Python during my master's, but stuck with Matlab through my PhD coursework, until I decided that academia is not my forever home. Currently I am occupying myself with ensemble methods such as XGBoost, as well as expanding into computer vision with PyTorch and toying with more advanced data visualizations. Naturally, NumPy will always hold a special place in my heart, along with the Cython package to sample everything a Bayesian needs.
Python for Data Quality in 2025: Why tests alone are no longer enough
In 2025, classic data tests in Python are no longer enough. In this 25-30 minute talk, I will show how Python powers modern Data Quality, from real-time freshness checks to anomaly detection and orchestrator integration. No AI hype: starting with a quick Data Quality overview and problem statement, I will show practical code, architecture, and hands-on engineering for resilient pipelines from a Data Engineer/Data Quality Engineer perspective.
Artsem
Data Engineer with a passion for Data Quality. Experience in Azure, AWS, Python, and Databricks. Head of the DQ department at EPAM Lithuania & Latvia, with a headcount of 20 people.
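As a rough illustration of two checks named in the abstract of "Python for Data Quality in 2025" above (freshness and anomaly detection), here is a minimal standard-library sketch. It is not the speaker's code: the function names `is_fresh` and `is_anomalous`, the z-score rule, and the default threshold of 3.0 are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def is_fresh(last_loaded_at, max_age, now=None):
    """Return True if the most recent load is no older than max_age.

    Hypothetical helper: expects timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age


def is_anomalous(history, value, threshold=3.0):
    """Flag value if it lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score rule)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # All historical values are identical: any deviation is suspicious.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Example: a table loaded 30 minutes ago, with a 1-hour freshness budget.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_fresh(now - timedelta(minutes=30), timedelta(hours=1), now=now)

# Example: daily row counts, followed by a suspicious spike.
spike = is_anomalous([100, 102, 98, 101, 99], 150)
```

In a real pipeline, checks like these would typically run as tasks in an orchestrator and raise or alert instead of returning a boolean.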
Beyond the Static 2D Plot - Spatial Data Storytelling in 4D
2D static plots are great, but they are static. Data isn't - it changes. So we turn a plot into an animation. But we don't live in planes - we live in space. And we want to send a message, not just show an animation. This leads us to the 3D animated story! In this talk I will close the gap between abstract data and its physical reality. Through step-by-step examples using (Geo)Pandas, (I)Pydeck, PyVista, Blender etc., I will turn basic charts into 4D stories with custom models added to geospace.
Kęstutis Gadeikis
Kęstutis Gadeikis (PhD, EMBA) is the Chief Actuary and Data Governance Manager at Lietuvos Draudimas (PZU Group), the insurance market leader in the Baltics. With a PhD in Mathematics and an EMBA, he specializes in finding the right balance between data analysis and the impactful presentation of its results.
Python, Rust and Arrow for data processing
Python struggles with heavy data loads. Rust offers speed, and PyO3 makes bridging the two seamless. This talk shows how to build a shared Rust core to avoid code duplication. I will also cover using Apache Arrow for zero-copy data sharing and removing serialization costs entirely. Discover how this stack enables high-performance data processing in Python and PySpark.
Paulius Venclovas
Paulius Venclovas is a Senior Data Engineer at "Flo Health", where he focuses on delivering data processing and governance solutions on the Databricks platform. Previously he worked at the financial startup "Curve" and the data consultancy company "Beyond Analysis". Paulius also holds a Master’s degree in Computing (AI/ML) from Imperial College London.
Creative Data Storytelling with Python
Python enables data professionals to move beyond analysis and transform information into clear, compelling stories. With various libraries, Python supports insightful exploration, expressive visualizations, and interactive elements that enhance communication. This talk highlights practical techniques for turning patterns, trends, and insights into engaging narratives, making data more understandable, impactful, and actionable.
Purva
Purva Porwal is an AI/ML Manager at State Street Corp with a decade of experience in the tech industry. Having contributed primarily to Conversational AI and Natural Language Processing, she has driven impactful innovations across key industries such as finance and telecommunications. Her passion lies in harnessing AI’s potential to create meaningful transformation, and she is constantly exploring emerging technologies in the field. Beyond her technical expertise, Purva is an active mentor, supporting and inspiring future professionals in the AI space. She plays an integral role in the AI community, contributing to thought leadership and fostering collaborative progress.
Airflow Lessons They Don't Put in the Docs
Airflow basics are well documented. Production Airflow is not. This talk covers the patterns, costs, and migration pitfalls that only show up after you've deployed: dynamic DAGs that scale, sensors that don't waste resources, CloudWatch bills that surprise you, and MWAA version upgrades that break in ways the changelog didn't mention. Practical lessons for teams running Airflow beyond the tutorial stage.
Tomas Peluritis
Tomas leads data at Mediatech and runs Uncle Data, a newsletter and podcast for data engineers who prefer practical advice over hype. By day, he manages pipelines processing half a billion events; by night, he writes about what he learned (often the hard way). When not wrangling DAGs or mentoring his team, he's probably optimising a Magic: The Gathering deck.
Designing Python APIs for Data You Don’t Control
The web isn’t an API, but Python developers often treat it like one. This talk explores how to design Python interfaces for unstable data sources, focusing on schema evolution, defensive parsing, and protecting downstream users.
Saurav Jain
Saurav Jain, Apify's Developer Community Manager, excels in community building and devrel. With a history of growing Amplication's community to 40K, he now enhances Apify's developer engagement. An international speaker, he has contributed to more than 100 conferences from remote places in Africa to events in Silicon Valley. His work bridges developers globally, fostering innovation and collaboration within the tech ecosystem. His expertise and passion for technology make him a pivotal figure in nurturing tech communities.
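To illustrate the defensive-parsing and schema-evolution ideas from the abstract of "Designing Python APIs for Data You Don’t Control" above, here is a minimal sketch. It is not the speaker's code: the `Product` type, the `parse_product` helper, and the field names `name`, `title`, and `price` are hypothetical and stand in for whatever an unstable source might return.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Product:
    """Stable shape that downstream users can rely on (hypothetical)."""
    name: str
    price: Optional[float]


def parse_product(raw: dict[str, Any]) -> Product:
    """Turn one loosely structured record into a Product.

    Assumed schema drift: the source renamed "title" to "name" at some
    point, and "price" may be missing, numeric, or a string like "$1,299.50".
    """
    # Schema evolution: accept either the old or the new key.
    name = raw.get("name") or raw.get("title")
    if not isinstance(name, str) or not name.strip():
        # A record without a name is unusable: fail loudly.
        raise ValueError("record has no usable name")

    # Defensive parsing: an unparseable price becomes None, not an exception.
    try:
        price = float(str(raw.get("price")).replace("$", "").replace(",", ""))
    except ValueError:
        price = None

    return Product(name=name.strip(), price=price)


old_style = parse_product({"title": " Lamp ", "price": "$1,299.50"})
new_style = parse_product({"name": "Desk"})
```

The design choice sketched here is to fail on fields the interface cannot work without and to degrade to `None` on optional ones, so that upstream changes surface at the parsing boundary and not deep inside downstream code.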
Conformal Prediction for Time Series: Uncertainty Quantification for Trustworthy Systems
How can we **quantify uncertainty in time series** forecasts, without unrealistic assumptions, and with rock-solid guarantees? This talk introduces **Conformal Prediction** (CP), a framework to generate prediction intervals with guaranteed coverage. Whether you're forecasting energy demand, market volatility, or space weather disturbances, CP helps you move **from point forecasts to reliable intervals**, even in non-stationary settings.
Vincenzo Ventriglia
A results-driven data professional, focused on hype-free solutions tailored to business needs. I currently create value at the **National Institute of Geophysics and Volcanology**, where I develop machine learning models in the **Space Weather** domain. My work is complemented by finding the hidden stories in data and making them accessible to stakeholders. I studied Physics in Italy (Napoli) and Germany (Frankfurt am Main), previously worked in Analytics within the strategic division of the world's largest professional services network, as well as in the Data Science department of Italy’s leading publishing group. I am also an organiser of **PyData Roma Capitale**, actively involved in building the local Python and data science community. Outside of work, I enjoy theatre, discussing finance, and learning new languages.
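For readers new to the topic of "Conformal Prediction for Time Series" above, here is a minimal sketch of the basic split conformal variant using only the standard library. It is not necessarily the method the talk presents. The function names are hypothetical, absolute residuals are an assumed choice of score, and the coverage guarantee of this plain variant relies on the calibration and test points being exchangeable, which time series often violate, so non-stationary settings generally call for adapted variants.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Index of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def conformal_interval(point_forecast, calib_actuals, calib_forecasts, alpha=0.1):
    """Symmetric prediction interval around a point forecast.

    Scores are absolute residuals on a held-out calibration set
    (an assumption; other nonconformity scores are possible).
    """
    scores = [abs(y - f) for y, f in zip(calib_actuals, calib_forecasts)]
    q = conformal_quantile(scores, alpha)
    return point_forecast - q, point_forecast + q


# Toy calibration set: actual values and the model's forecasts for them.
actuals = [10.2, 9.8, 11.1, 10.5, 9.4, 10.9, 10.0, 9.7, 10.6]
forecasts = [10.0, 10.0, 10.5, 10.5, 10.0, 10.5, 10.0, 10.0, 10.5]
lower, upper = conformal_interval(10.3, actuals, forecasts, alpha=0.2)
```

The interval width comes entirely from past forecast errors, which is why the approach works with any underlying forecasting model.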