Batch processing makes continuous data migration with Dataproc challenging. In this talk, I introduce a new approach to continuous data pipelines built on PySpark. Participants will learn methods to maintain data consistency and preserve data completeness in a million-row-scale migration from a SQL database into a NoSQL database, MongoDB.
Beginner-level Python and beginner-level experience using cloud services.
In this talk, I'll present the challenges I faced in real-world use cases migrating millions of rows of data from a SQL database into a NoSQL database, MongoDB.
The talk consists of:
I've been working with databases for 20 years. Currently, I'm a senior consulting engineer at MongoDB Singapore and a member of the PyCon Thailand organizing team. I've spoken at several conferences, such as Global AI Conference 2023 and Javascript Bangkok 2.0.0. I also run the MongoDB User Group Thailand, a community of 3,000 developers.