Good retrieval performance is key to an effective RAG system: it ensures that relevant information is selected, which directly determines the quality of augmentation and generation. My presentation focuses on indexing and retrieval in RAG, exploring methods to convert text into searchable formats, comparing retrieval techniques, analyzing their advantages and disadvantages, and evaluating their performance on an annotated dataset for retrieving documents relevant to user queries.
No special prerequisites are required: simply knowing that any information system needs to first index and then retrieve documents is enough to follow the talk.
Retrieval Augmented Generation (RAG) is a model architecture that combines information retrieval over large corpora with generative models to fulfill a user's information need. It is typically used for question answering, fact-checking, summarization, and information discovery.
The RAG process consists of indexing, which converts textual data into searchable formats; retrieval, which selects relevant documents for a query using different methods; and augmentation, which feeds retrieved information and the user's query into a Large Language Model (LLM) via a prompt for output generation.
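To make these three steps concrete, here is a deliberately minimal sketch in plain Python. The `tokenize`, `build_index`, `retrieve`, and `augment` functions are illustrative stand-ins (term-overlap scoring, a string prompt template), not Haystack APIs; a real system would use proper tokenization, a ranking function such as BM25 or dense embeddings, and an actual LLM call.

```python
import re

def tokenize(text):
    # Crude tokenizer: lowercase word sets, for illustration only.
    return set(re.findall(r"\w+", text.lower()))

def build_index(documents):
    # Indexing: turn each document into a searchable set of terms.
    return [(doc, tokenize(doc)) for doc in documents]

def retrieve(idx, query, top_k=2):
    # Retrieval: rank documents by term overlap with the query.
    terms = tokenize(query)
    ranked = sorted(idx, key=lambda pair: len(terms & pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def augment(query, retrieved):
    # Augmentation: place the retrieved context and the user's query
    # into a prompt that would then be sent to an LLM for generation.
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "The Rhine flows through Germany.",
]
query = "What is the capital of Germany?"
prompt = augment(query, retrieve(build_index(docs), query))
```

Even in this toy form, the separation of concerns is visible: the index format and the retrieval scoring can be swapped out independently, while the augmentation step only formats whatever the retriever returns.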
Typically, one has little control over the augmentation step besides what's provided to the LLM via the prompt and a few parameters, like the maximum length of the generated text or the temperature of the sampling process. On the other hand, the indexing and retrieval steps are more flexible and can be customized to the specific needs of the task or the data.
My talk will focus on RAG systems' indexing and retrieval techniques. Attendees will learn about various methods, starting with classical approaches rooted in the Information Retrieval community. While these methods have been around for decades, they remain widely used today due to their simplicity and efficiency.
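A representative classical method is Okapi BM25, which scores documents by term frequency, discounted by document frequency and document length. The following is a minimal sketch over a toy corpus, using common default parameters (k1 = 1.5, b = 0.75) and a Lucene-style smoothed IDF; production systems rely on optimized implementations rather than code like this.

```python
import math
import re

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def bm25_rank(docs, query, k1=1.5, b=0.75):
    """Rank docs against a query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for toks in tokenized:
        score = 0.0
        for term in set(tokenize(query)):
            tf = toks.count(term)
            df = sum(1 for t in tokenized if term in t)
            # Smoothed IDF (the "+1" keeps it non-negative, as in Lucene).
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            # Term frequency saturates via k1; b normalizes by doc length.
            score += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(toks) / avgdl)
            )
        scores.append(score)
    return sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)

docs = [
    "the cat sat on the mat",
    "dogs and cats living together",
    "the quick brown fox",
]
ranked = bm25_rank(docs, "cat on a mat")
```

Note that with no stemming, "cats" does not match "cat": exact-term matching is both the strength (speed, transparency) and the weakness (vocabulary mismatch) of these classical approaches.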
The session will then explore more modern techniques that leverage LLMs to enhance the indexing process or optimize user queries. These approaches aim to surface more relevant documents and, in turn, improve the overall performance of RAG systems.
Participants will gain insights into each technique's unique features, advantages, and limitations, along with guidance on selecting the most appropriate approach for specific tasks and datasets.
The talk will conclude with a performance analysis: all of these techniques are implemented in Python with Haystack and evaluated on an annotated dataset. Speed, accuracy, and efficiency will be compared, offering a clear picture of the trade-offs and practical takeaways.
I’m an experienced machine learning engineer and software developer with a strong background in Natural Language Processing. I hold a Ph.D. (2016) focused on semantic relationship extraction. I'm based in Berlin, where I work as an NLP Engineer and Software Developer at deepset, contributing to the development of Haystack, an open-source framework for building end-to-end, production-ready LLM-based applications.