Debug Data
Talk Data:
{
"code": "BVXZCS",
"title": "Python for Data Quality in 2025: Why tests alone are no longer enough",
"speakers": [
{
"code": "T7SC8Z",
"name": "Artsem",
"biography": "Data Engineer with passion to Data Quality. Experience in Azure, AWS, Python, Databricks. Head of DQ department in EPAM Lithuania & Latvia with 20 people headcount",
"submissions": [],
"avatar_url": "https://pretalx.com/media/avatars/T7SC8Z_KjvWU8f.webp",
"answers": [
490354,
490355,
490356
],
"email": "artemvarivoda@gmail.com",
"timezone": "Europe/Vilnius",
"locale": "en",
"has_arrived": false,
"availabilities": [
{
"start": "2026-04-09T10:00:00+03:00",
"end": "2026-04-09T11:30:00+03:00",
"allDay": false
},
{
"start": "2026-04-10T10:00:00+03:00",
"end": "2026-04-10T11:30:00+03:00",
"allDay": false
}
],
"internal_notes": ""
}
],
"submission_type": {
"id": 6673,
"name": {
"en": "Talk"
},
"default_duration": 25,
"deadline": null,
"requires_access_code": false
},
"track": {
"id": 6399,
"name": {
"en": "Data Day - Apr 9"
},
"description": {},
"color": "#3776aa",
"position": 2,
"requires_access_code": false
},
"tags": [],
"state": "confirmed",
"abstract": "In 2025, classic data tests via Python are not enough. During 25-30 minutes talk I will show how Python powers modern Data Quality: from real-time freshness checks to anomaly detection and orchestrator integration. No AI hype: starting with quick Data Quality overview and problem statement I will show practical code, architecture, and hands-on engineering for resilient pipelines from Data Engineer/Data Quality Engineer perspective.",
"description": "As data pipelines become distributed and schemas evolve rapidly, traditional checks (NOT NULL, uniqueness, value ranges ) can not guarantee data quality anymore. Tests may pass, yet data can be stale, anomalous, or misaligned with business needs - and it will be discovered only on PowerBI report. And if project will use streaming rather than batching data - problems will grow up as a snowball.\r\n\r\nThis talk demonstrates how to build robust, production-ready Data Quality systems with Python using following practices:\r\n1 - Real-time freshness and completeness checks embedded in pipelines;\r\n2 - Volume anomaly detection and distribution drift analysis;\r\n3 - Versioned data contracts and schema evolution as Python objects;\r\n4 - Seamless integration with orchestrators (Airflow, Dagster) or another systems like Databricks or Stargate for automated incident response.\r\nYou will see real engineering patterns, code snippets, and actionable best practices ready for immediate use. The focus is on building resilient pipelines where Python glues together monitoring, validation, and orchestration. No marketing, no AI buzzwords—just engineering and possible practical solutions for modern data teams.",
"duration": 25,
"slot_count": 1,
"content_locale": "en",
"do_not_record": false,
"image": "https://pretalx.com/media/pycon-lithuania-2026/submissions/BVXZCS/av_title_slide_qFvSnD6.webp",
"resources": [],
"slots": [],
"answers": [
490353
],
"pending_state": null,
"is_featured": false,
"notes": "Target audience - Python and Data engineers closely working with data. \r\nTalk will be divided into overview of DQ/problem statement, then what changed in 2025 compared to previous years, jumping after to practical solutions - focusing on engineering experience excellence, but without super deep dive to not make audience asleep. Personally I am a little bit skeptical about AI usage everywhere, so focus will be on pure engineering and I hope this talk will be beneficial for the PyCon audience. Since data is currently everywhere, and it is hard to build great data product if we have garbage in. Because after we will have garbage out :)",
"internal_notes": null,
"invitation_token": "RTVACP3RKNRARRWFK7FGMXXDWPRXEMDB",
"access_code": null,
"review_code": "3CH7J3DNK7URYZC73LYZPAFSA8MMNWSN",
"anonymised_data": "{}",
"reviews": [],
"assigned_reviewers": [],
"is_anonymised": false,
"median_score": null,
"mean_score": null,
"created": "2026-01-09T15:11:09.036201+02:00",
"updated": "2026-01-09T15:11:19.239923+02:00",
"invitations": []
}
Python for Data Quality in 2025: Why tests alone are no longer enough
Speaker
Artsem
Data Engineer with a passion for Data Quality. Experience in Azure, AWS, Python, and Databricks. Head of the DQ department at EPAM Lithuania & Latvia with a 20-person headcount.
Abstract
In 2025, classic data tests in Python are no longer enough. In this 25-30 minute talk I will show how Python powers modern Data Quality: from real-time freshness checks to anomaly detection and orchestrator integration. No AI hype: starting with a quick Data Quality overview and a problem statement, I will show practical code, architecture, and hands-on engineering for resilient pipelines from a Data Engineer / Data Quality Engineer perspective.
Description
As data pipelines become distributed and schemas evolve rapidly, traditional checks (NOT NULL, uniqueness, value ranges) can no longer guarantee data quality. Tests may pass, yet data can be stale, anomalous, or misaligned with business needs, and the problem surfaces only in a Power BI report. And if a project uses streaming rather than batch data, the problems snowball.
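As a minimal illustration of that gap (the six-hour watermark, one-hour SLA, and names below are assumptions, not material from the talk), a freshness check in plain Python can fail a pipeline whose schema tests all pass:

from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at: datetime, max_age: timedelta) -> None:
    """Raise if the newest record is older than the agreed freshness SLA."""
    age = datetime.now(timezone.utc) - last_loaded_at
    if age > max_age:
        raise RuntimeError(f"Stale data: last load was {age} ago (SLA: {max_age})")

# A table can satisfy NOT NULL and uniqueness checks even if loads stopped hours ago.
check_freshness(
    last_loaded_at=datetime.now(timezone.utc) - timedelta(hours=6),  # simulated watermark
    max_age=timedelta(hours=1),
)  # raises RuntimeError: the data is six hours old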
This talk demonstrates how to build robust, production-ready Data Quality systems with Python using the following practices:
1 - Real-time freshness and completeness checks embedded in pipelines;
2 - Volume anomaly detection and distribution drift analysis;
3 - Versioned data contracts and schema evolution as Python objects;
4 - Seamless integration with orchestrators (Airflow, Dagster) or other systems such as Databricks or Stargate for automated incident response.
You will see real engineering patterns, code snippets, and actionable best practices ready for immediate use. The focus is on building resilient pipelines where Python glues together monitoring, validation, and orchestration. No marketing, no AI buzzwords: just engineering and practical solutions for modern data teams.
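As a rough sketch of practice 2 (the window, row counts, and 3-sigma threshold are illustrative assumptions, not values from the talk), a volume anomaly check needs only the standard library:

import statistics

def is_volume_anomaly(history: list, today: int, z_threshold: float = 3.0) -> bool:
    """Flag today's row count when it deviates sharply from recent history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return today != mean
    return abs(today - mean) / stdev > z_threshold

daily_row_counts = [10_120, 9_980, 10_340, 10_050, 10_210, 9_890, 10_400]
print(is_volume_anomaly(daily_row_counts, today=3_200))  # True: sharp volume drop

And one possible reading of practice 3 (the contract name, fields, and version are hypothetical): a versioned data contract can be an ordinary frozen dataclass that reports violations instead of silently passing:

from dataclasses import dataclass

@dataclass(frozen=True)
class OrdersContract:
    """A versioned data contract expressed as a plain Python object."""
    version: str
    required_columns: frozenset

    def violations(self, columns: set) -> list:
        """Return readable violations rather than a bare assert."""
        missing = self.required_columns - columns
        return [f"missing column: {name}" for name in sorted(missing)]

CONTRACT_V2 = OrdersContract(
    version="2.0",
    required_columns=frozenset({"order_id", "customer_id", "amount", "currency"}),
)
print(CONTRACT_V2.violations({"order_id", "amount"}))
# ['missing column: currency', 'missing column: customer_id']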