Our mission is simple but profound: to improve and extend lives by learning from the experience of every person with cancer. This talk explains how we transform sensitive data from heterogeneous environments into research-grade datasets, and how we shift insight generation left to iterate faster.
We expect participants to have some experience with data warehouses or lakehouses, as well as familiarity with Python for data processing.
In this talk, we’ll begin by introducing the concept of real-world evidence datasets and their transformative impact on cancer research. We’ll explore the significant challenges of building high-quality real-world evidence datasets, including the fragmented healthcare data landscape, the complexity of source data, and stringent regulatory constraints.
To address these challenges, we’ll introduce our privacy-enhancing data architecture and foundational technology stack, which includes AWS, Snowflake, Python, dlt, dbt, pandas, and DuckDB. We’ll explain how we use these tools to establish critical feedback loops and reach actionable insights sooner. Key topics include:
One of our most impactful innovations was shifting data investigations to the left. Previously, raw data inspections were delayed until pre-processed data was available in our data warehouse, causing prolonged feedback loops and inefficiencies. Disorganized file formats often required data scientists to manually inspect data, limiting their ability to draw meaningful, cohort-wide conclusions.
To overcome these challenges, we set up local databases directly on the compute instances where the raw data is stored, an approach we chose for its flexibility and transparency. Transformations developed in this setup transfer to the cloud data warehouse and remain portable across environments. This enabled our data scientists to explore and understand the data earlier in the process, removing the wait for pre-processed data in the warehouse. With immediate access to organized datasets, our tea
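As a rough illustration of the local-database idea, the sketch below loads a small raw extract into an embedded in-memory database and runs a cohort-wide query that is kept as plain SQL, so the same statement could later be pointed at another engine. Everything here is an assumption for illustration: the table and column names, the sample rows (synthetic), and the use of Python's built-in sqlite3 as a stand-in for DuckDB so the example runs without extra dependencies. It is not the pipeline described in the talk.

```python
import csv
import io
import sqlite3

# Synthetic stand-in for a raw extract; in practice this would be a file
# sitting on the compute instance. All names and values are hypothetical.
RAW_CSV = """patient_id,diagnosis_date,stage
p1,2021-03-01,II
p2,2020-11-15,IV
p3,2022-01-09,II
p4,2021-07-30,
"""

# The transformation is plain SQL, which keeps it independent of this script
# and easier to move to a different engine later.
STAGE_COUNTS_SQL = """
SELECT stage, COUNT(*) AS n_patients
FROM raw_diagnoses
WHERE stage <> ''
GROUP BY stage
ORDER BY stage
"""


def load_raw(con, raw_text):
    """Load the raw CSV text into a local table and return the row count."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    con.execute(
        "CREATE TABLE raw_diagnoses "
        "(patient_id TEXT, diagnosis_date TEXT, stage TEXT)"
    )
    con.executemany(
        "INSERT INTO raw_diagnoses VALUES (:patient_id, :diagnosis_date, :stage)",
        rows,
    )
    return len(rows)


def stage_counts(con):
    """Run the cohort-wide query against the local database."""
    return con.execute(STAGE_COUNTS_SQL).fetchall()


# sqlite3 is used here only as an embedded stand-in for a local DuckDB database.
con = sqlite3.connect(":memory:")
n_loaded = load_raw(con, RAW_CSV)
result = stage_counts(con)
print(n_loaded, result)
```

The point of the sketch is the shape of the workflow: raw data becomes queryable where it lives, and a question about the whole cohort is answered with one query instead of manual file-by-file inspection.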
Florian is based in Berlin and works as a Software Engineer for Flatiron Health. Before joining Flatiron Health’s mission to improve and extend lives by learning from the experience of every person with cancer, he worked for eBay and Immobilienscout24. Florian loves traveling with his family, uses his little son as an excuse to buy toys for himself, and is passionate about software engineering, software architecture, and punk rock.