Room: Room 203
April 5
11:30–11:55
Data is the new oil, and like oil it can leak and poison its surroundings. What happens if you pollute one of the datasets that feeds a decision-maker-facing dashboard? In this talk, I will explain the reemergence of the Write-Audit-Publish pattern and how you can implement it using Apache Iceberg and Apache Spark.
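To give a flavor of the pattern, here is a minimal sketch of one Write-Audit-Publish cycle using Iceberg branches from PySpark. It is an illustration under assumptions, not the speaker's implementation: it assumes a `SparkSession` with the Iceberg SQL extensions enabled and an Iceberg release that supports branches, and the function name, the branch name `audit`, the `staged_view` source, and the `bad_row_filter` check are all hypothetical.

```python
def write_audit_publish(spark, catalog, table, staged_view,
                        branch="audit", bad_row_filter="id IS NULL"):
    """Run one WAP cycle; return True if the data was published.

    All parameter names are illustrative. `table` is 'db.table' inside
    `catalog`, and `bad_row_filter` is a hypothetical audit rule.
    """
    full_name = f"{catalog}.{table}"

    # Enable WAP on the table and create a branch for unpublished writes.
    spark.sql(f"ALTER TABLE {full_name} "
              "SET TBLPROPERTIES ('write.wap.enabled'='true')")
    spark.sql(f"ALTER TABLE {full_name} CREATE BRANCH {branch}")

    # WRITE: route this session's writes to the branch instead of main.
    spark.conf.set("spark.wap.branch", branch)
    spark.sql(f"INSERT INTO {full_name} SELECT * FROM {staged_view}")

    # AUDIT: inspect the branch; readers of main still see the old data.
    bad_rows = spark.sql(
        f"SELECT count(*) AS n FROM {full_name} "
        f"VERSION AS OF '{branch}' WHERE {bad_row_filter}"
    ).collect()[0]["n"]
    spark.conf.unset("spark.wap.branch")

    if bad_rows > 0:
        # Audit failed: discard the branch, main is never touched.
        spark.sql(f"ALTER TABLE {full_name} DROP BRANCH {branch}")
        return False

    # PUBLISH: fast-forward main to the audited branch, then clean up.
    spark.sql(f"CALL {catalog}.system.fast_forward("
              f"'{table}', 'main', '{branch}')")
    spark.sql(f"ALTER TABLE {full_name} DROP BRANCH {branch}")
    return True
```

With a real session this might be called as `write_audit_publish(spark, "prod", "db.events", "staged_events")`, where `prod`, `db.events`, and `staged_events` are placeholder names. The point of the design is that consumers reading `main` never see the new rows until the audit has passed.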
General knowledge, Data Processing
One of the old patterns being adopted again is Write-Audit-Publish (WAP). Most data practitioners test the data they create only after it has been written, by which point bad data may already have ruined other data assets downstream. WAP prevents this by auditing the data before it is published. In this talk, I will present how newer technology (in this case, the Apache Iceberg table format) makes it possible to implement this pattern with Apache Spark. The presentation will cover topics like:
I'm a Data Engineer with a diverse background, transitioning from a Data Analyst to a Team Lead and Head of Data before returning to my roots. I have a knack for numbers and a passion for coding, constantly seeking optimal solutions and driving continuous improvement.
With expertise in data pipelines, orchestration, and SQL, along with strong communication skills, I excel in leading and mentoring teams. I've been fortunate to contribute to multiple data migrations and projects, including building some from scratch.
Outside of work, I thrive in fast-paced environments, embracing new challenges and staying updated with the latest technologies through side projects. I share my knowledge with the community through my podcast and blog, 'Uncle Data,' where I discuss all things data-related.