Python for Data Quality in 2025: Why tests alone are no longer enough
Speaker
Artsem
Data Engineer with a passion for Data Quality. Experienced in Azure, AWS, Python, and Databricks. Head of the DQ department at EPAM Lithuania & Latvia, leading a team of 20.
Abstract
In 2025, classic data tests written in Python are no longer enough. In this 25-30 minute talk, I will show how Python powers modern Data Quality: from real-time freshness checks to anomaly detection and orchestrator integration. No AI hype: starting with a quick Data Quality overview and a problem statement, I will walk through practical code, architecture, and hands-on engineering for resilient pipelines from a Data Engineer/Data Quality Engineer perspective.
Description
As data pipelines become distributed and schemas evolve rapidly, traditional checks (NOT NULL, uniqueness, value ranges) can no longer guarantee data quality. Tests may pass, yet data can be stale, anomalous, or misaligned with business needs, and the problem is often discovered only in a Power BI report. And if the project uses streaming rather than batch processing, these problems snowball. The sketch below shows how a simple freshness check catches the stale-data case that classic tests miss.
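A minimal sketch, assuming a 30-minute SLA and that the latest event timestamp has already been fetched from the target table; `check_freshness` and `FRESHNESS_SLA` are illustrative names, not from any particular framework:

```python
# A table can pass NOT NULL and uniqueness tests while its newest row
# is hours old; a freshness check compares the latest event time to an SLA.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(minutes=30)  # illustrative SLA

def check_freshness(latest_event_time: datetime, now: datetime | None = None) -> bool:
    """Return True if the newest record arrived within the SLA window."""
    now = now or datetime.now(timezone.utc)
    lag = now - latest_event_time
    if lag > FRESHNESS_SLA:
        print(f"STALE: newest record is {lag} old (SLA is {FRESHNESS_SLA})")
        return False
    return True

# Data that is two hours old fails even though every classic test passes.
stale = datetime.now(timezone.utc) - timedelta(hours=2)
assert check_freshness(stale) is False
```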
This talk demonstrates how to build robust, production-ready Data Quality systems with Python using the following practices (sketches for several of these follow below):
1. Real-time freshness and completeness checks embedded in pipelines;
2. Volume anomaly detection and distribution drift analysis;
3. Versioned data contracts and schema evolution as Python objects;
4. Seamless integration with orchestrators (Airflow, Dagster) or platforms such as Databricks or Stargate for automated incident response.
You will see real engineering patterns, code snippets, and actionable best practices ready for immediate use. The focus is on building resilient pipelines where Python glues together monitoring, validation, and orchestration. No marketing, no AI buzzwords, just engineering and practical solutions for modern data teams.
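For point 2, a minimal sketch of volume anomaly detection using a trailing z-score over recent batch row counts; pure stdlib, and a stand-in for whatever statistical model a production system would actually use:

```python
# Flag a batch whose row count deviates strongly from recent history.
from statistics import mean, stdev

def is_volume_anomaly(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's row count if it lies more than `threshold` standard
    deviations from the trailing window's mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

row_counts = [10_120, 9_980, 10_340, 10_050, 10_210]
print(is_volume_anomaly(row_counts, 4_200))   # True: volume dropped sharply
print(is_volume_anomaly(row_counts, 10_100))  # False: within the normal range
```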
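For point 3, one way to express versioned data contracts as Python objects is with Pydantic models (an assumed library choice; the same pattern works with dataclasses), where schema evolution is an explicit new class rather than a silent change:

```python
# A data contract as a versioned Python object: consumers import the class,
# producers validate against it, and evolution is an explicit new version.
from pydantic import BaseModel

class OrderV1(BaseModel):
    order_id: str
    amount: float

class OrderV2(OrderV1):
    # A new field with a default keeps v2 backward compatible with v1 producers.
    currency: str = "EUR"

record = {"order_id": "A-42", "amount": 19.99}
print(OrderV2.model_validate(record))  # old-shape records still validate under v2
```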
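For point 4, a sketch of wiring a check into Airflow so that a failed gate fires an incident callback and blocks downstream tasks. `fetch_latest_event_time` is a hypothetical stub standing in for a warehouse query, and the DAG and task names are illustrative:

```python
# Failing a DQ gate inside the orchestrator turns a silent data problem
# into an automated incident and stops downstream loads.
from datetime import datetime, timedelta, timezone
from airflow.decorators import dag, task

FRESHNESS_SLA = timedelta(minutes=30)

def fetch_latest_event_time() -> datetime:
    """Hypothetical stub standing in for a warehouse query."""
    return datetime.now(timezone.utc) - timedelta(hours=2)

def alert_on_failure(context):
    # In practice: page on-call, open a ticket, quarantine the partition.
    print(f"DQ incident in task {context['task_instance'].task_id}")

@dag(schedule="@hourly", start_date=datetime(2025, 1, 1), catchup=False,
     default_args={"on_failure_callback": alert_on_failure})
def dq_guarded_pipeline():
    @task
    def freshness_gate():
        lag = datetime.now(timezone.utc) - fetch_latest_event_time()
        if lag > FRESHNESS_SLA:
            # Raising fails the task, fires the callback, blocks downstream.
            raise ValueError(f"Freshness SLA violated: lag is {lag}")

    freshness_gate()

dq_guarded_pipeline()
```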