Many organizations have migrated their data warehouses to data lake solutions in recent years. With the convergence of the data warehouse and the data lake, a new data management paradigm has emerged that combines the best of both approaches: the bottom-up approach of big data and the top-down approach of a classic data warehouse.
In this talk, I will explain the current challenges of a data lake and how we can approach a modern data architecture with the help of PySpark and table formats such as Apache Hudi, Delta Lake (delta.io), or Apache Iceberg. We will see how to organize data in a data lake to support real-time processing and analytics across all varieties of data sets, structured and unstructured, how such an architecture provides the scale needed to support enterprise-wide digital transformation, and how it creates a single source of data for multiple audiences.
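To give a flavour of what this looks like in practice, below is a minimal sketch of a lakehouse-style table using PySpark with Delta Lake. It assumes the delta-spark package is installed (`pip install delta-spark`); the table path, column names, and sample rows are illustrative, and Hudi and Iceberg offer equivalent capabilities through their own Spark integrations.

```python
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

# Enable the Delta Lake extension and catalog on a local SparkSession.
builder = (
    SparkSession.builder.appName("lakehouse-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write a small batch of structured records as a Delta table.
# Path and schema are illustrative assumptions.
events = spark.createDataFrame(
    [("2024-01-01", "signup", 42), ("2024-01-01", "login", 7)],
    ["event_date", "event_type", "count"],
)
events.write.format("delta").mode("overwrite").save("/tmp/lake/events")

# ACID transactions and time travel are what set a lakehouse table apart
# from plain Parquet files: read an earlier version of the table by number.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/lake/events")
v0.show()
```

The same DataFrame can be served to batch analytics, streaming jobs, and ad hoc SQL alike, which is the "single source of data for multiple audiences" idea in miniature.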
Mauro Pelucchi is a Senior Data Scientist and Big Data Engineer, responsible for the design of the “Real-Time Labour Market Information System on Skill Requirements” for CEDEFOP (European Centre for the Development of Vocational Training).
He currently works as Head of Global Data Science at Lightcast, with the goal of developing innovative models, methods, and deployments of labour market data and other data to meet customer requirements and prototype potential new solutions.
His main tasks involve advanced machine learning modelling, labour market analyses, and the design of big data pipelines to process large datasets of online job vacancies. In collaboration with the University of Milano-Bicocca, he has taken part in many research projects related to labour market intelligence systems. He also collaborates with the University of Milano-Bicocca as a Lecturer for the Master's in Business Intelligence and Big Data Analytics, and with the University of Bergamo as a Lecturer in Computer Engineering.