Hear more about the evolving landscape of SQL transformation tools and data lineage challenges. Explore how sqlglot enables powerful SQL parsing and transformation capabilities, and see practical demonstrations of sqlmesh as a modern alternative to dbt. Learn about open-source approaches to data lineage tracking and discover how these tools are shaping the future of data engineering workflows.
SQL, Python
This talk explores the intersection of SQL transformation frameworks and data lineage tracking, focusing on open-source solutions that are changing how we handle data transformations at scale. We'll begin by examining common pain points in data lineage tracking, particularly when dealing with complex SQL transformations across different dialects and platforms.
The first part will take a deep dive into sqlglot's architecture and demonstrate how it serves as a crucial building block for modern data tools by enabling dialect-agnostic SQL parsing, analysis, and transformation. We'll explore real-world use cases where sqlglot's capabilities open up new possibilities for data lineage tracking and SQL optimization.
Next, we'll contrast sqlmesh with dbt, highlighting key architectural differences and their implications for data engineering workflows. Through live demonstrations, we'll showcase sqlmesh's distinctive features, including time-travel capabilities, automated dependency management, and built-in data lineage tracking. We'll also address how these tools approach column-level lineage tracking, comparing open-source alternatives with proprietary solutions such as dbt Cloud.
The session will conclude with practical guidelines for implementing these tools in your data stack and a discussion of future trends in SQL transformation and lineage tracking. Whether you're a data engineer looking to optimize your workflow or an architect evaluating data transformation frameworks, you'll leave with actionable insights about modern SQL tooling.
A data practitioner with over a decade of experience building and optimizing data solutions, specializing in data pipeline development, SQL optimization, and data infrastructure architecture. He works extensively with technologies such as Snowflake, dbt, Airflow, and various SQL engines, and has hands-on experience with multiple large-scale data migrations and greenfield projects.
Passionate about open-source technologies and efficient data architectures, he actively contributes to the data community through the 'Uncle Data' podcast and blog, where he explores technical challenges and solutions in data engineering. His work focuses on developing scalable data solutions, implementing data quality frameworks, and designing robust data architectures.
A returning speaker at PyCon Lithuania (2023, 2024), he shares practical insights about data engineering and Python applications in the data world. When not working with data, he experiments with emerging technologies through side projects and advocates for knowledge sharing within the tech community.