Room: Room 111
April 5
14:00–14:25
If you are GPU-poor, you need to become data-rich. I will give an overview of what we learned from looking at Alpaca, LIMA, Dolly, UltraFeedback, and Zephyr, and how we applied those lessons to fine-tuning Notus and Notux, two state-of-the-art open-source LLMs, by becoming data-rich.
General knowledge about LLMs and NLP
GPUs are in high demand and short supply, but being GPU-poor can be overcome by focusing on data quality and becoming data-rich. Looking at efforts like Alpaca, LIMA, Dolly, UltraFeedback, and Zephyr, we see again and again that data quality rarely gets the attention it deserves.
Gabriel is a Machine Learning Engineer focused on NLP. Having moved from academia to industry, he now works at Argilla, where he has contributed to the backend of Argilla as well as to the development and design of distilabel, a library for generating synthetic data using LLMs.
Hi there 👋
From failing to study medicine ➡️ BSc industrial engineer ➡️ MSc computer scientist. Life can be strange, so better to enjoy it. I'm sure I do by: 👨🏽🍳 Cooking, 👨🏽💻 Coding, 🏆 Committing.