Feedback Loops for Search: Rerankers, Evals & Retraining
Speaker
Paulius Dovidaitis
Software engineer, co-founder, and CTO at Wayless.ai, and founder of the AI consultancy Lanepeak.com. Works on ML-powered retrieval and ranking systems, with prior experience applying machine learning in the cybersecurity domain. Outside of work: FTC robotics mentor.
Abstract
Retrieval pipelines can improve from user feedback, but connecting signals like likes, dwell time, and clicks back to model retraining is tricky. This talk walks through an architecture for doing it: hybrid candidate generation (BM25 + embeddings), neural reranking, feedback collection, building evaluation sets, and the retraining loop. Concrete examples from production, with trade-offs between different approaches.
Description
Retrieval pipelines typically have three stages: generate candidates, rerank them, return results. This talk focuses on what comes after: collecting feedback, building evals, and closing the loop back to training.
I'll share the architecture I built and iterated on:
Candidate generation: Combining BM25, embedding similarity, and heuristic filters. When embeddings win, when keyword matching still beats them, and the latency/accuracy trade-offs I hit.
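As a point of reference, here is a minimal sketch of the hybrid blend: normalize BM25 and cosine scores to a common scale, then take a weighted sum. The rank_bm25 package is one way to get BM25 scores; embed() is a hypothetical stand-in for whatever embedding model is in use, and alpha is an illustrative blend weight, not the speaker's setup.

    # Hybrid candidate generation sketch: blend BM25 and embedding scores.
    # embed() is a hypothetical callable returning a (n, dim) numpy array.
    import numpy as np
    from rank_bm25 import BM25Okapi

    def hybrid_candidates(query, docs, embed, alpha=0.6, k=50):
        """Return indices of the top-k docs by blended score."""
        tokenized = [d.lower().split() for d in docs]
        bm25_scores = np.array(
            BM25Okapi(tokenized).get_scores(query.lower().split())
        )

        doc_vecs = embed(docs)              # shape (n_docs, dim)
        q_vec = embed([query])[0]           # shape (dim,)
        cosine = doc_vecs @ q_vec / (
            np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
        )

        def minmax(x):
            # Put both signals on [0, 1] so the blend weight is meaningful.
            span = x.max() - x.min()
            return (x - x.min()) / span if span > 0 else np.zeros_like(x)

        blended = alpha * minmax(bm25_scores) + (1 - alpha) * minmax(cosine)
        return np.argsort(blended)[::-1][:k]

Raising alpha favors exact keyword matches; lowering it favors semantic similarity, which is one knob behind the "when embeddings win" trade-off.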
Reranking: Neural rerankers vs. lightweight scoring. What features actually moved the needle. How I structured training data from user interactions.
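For the neural option, one common implementation is a cross-encoder that scores each (query, candidate) pair jointly; the checkpoint below is a public MS MARCO model picked for illustration, not necessarily what was used in production.

    # Cross-encoder reranking sketch using sentence-transformers.
    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def rerank(query, candidates, top_n=10):
        """Rescore candidate docs with the cross-encoder, return top_n."""
        scores = reranker.predict([(query, doc) for doc in candidates])
        ranked = sorted(
            zip(candidates, scores), key=lambda pair: pair[1], reverse=True
        )
        return [doc for doc, _ in ranked[:top_n]]

Cross-encoders are far slower than the scoring used for candidate generation, which is why they only ever see the short candidate list.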
Feedback signals: Likes/dislikes, dwell time, copy events, and user clicks. Which ones correlate with relevance, which are noise, and how to tell the difference.
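One plausible way to fold such events into graded labels for training and evals is a weighted aggregate per (query, doc) pair; the event schema, weights, and dwell threshold below are all illustrative assumptions.

    # Sketch: aggregate raw interaction events into graded relevance labels.
    # Weights and threshold are illustrative, not the speaker's values.
    from dataclasses import dataclass

    @dataclass
    class Event:
        query_id: str
        doc_id: str
        kind: str            # "like", "dislike", "click", "copy", "dwell"
        dwell_secs: float = 0.0

    WEIGHTS = {"like": 2.0, "copy": 1.5, "click": 0.5, "dislike": -2.0}
    MIN_DWELL_SECS = 10.0    # below this, a click is treated as a bounce

    def relevance_label(events):
        """Collapse one (query, doc) pair's events into a 0/1/2 grade."""
        score = 0.0
        for e in events:
            score += WEIGHTS.get(e.kind, 0.0)
            if e.kind == "dwell" and e.dwell_secs >= MIN_DWELL_SECS:
                score += 1.0
        return max(0, min(2, round(score)))

Explicit signals (likes, copies) get larger weights than noisy implicit ones (clicks), which is one way to encode the signal-vs-noise distinction the talk covers.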
Building evaluation sets: The hardest part. How to construct test sets when you don't have labeled data. Which metrics actually track with production performance.
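Two offline metrics commonly computed over eval sets like these are reciprocal rank (averaged across queries into MRR) and NDCG@k; a sketch, assuming graded 0/1/2 labels per doc:

    # Standard offline ranking metrics over an eval set.
    import math

    def reciprocal_rank(ranked_ids, relevant_ids):
        """1/rank of the first relevant doc; average over queries for MRR."""
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                return 1.0 / rank
        return 0.0

    def ndcg_at_k(ranked_ids, grades, k=10):
        """NDCG@k where grades maps doc_id -> graded relevance (0/1/2)."""
        def dcg(ids):
            return sum(
                (2 ** grades.get(d, 0) - 1) / math.log2(i + 2)
                for i, d in enumerate(ids[:k])
            )
        ideal = dcg(sorted(grades, key=grades.get, reverse=True))
        return dcg(ranked_ids) / ideal if ideal > 0 else 0.0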
The retraining loop: Model versioning, A/B testing changes, and rolling back when metrics drop.
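Such a loop typically ends with a promote-or-roll-back gate that compares a retrained candidate against the live model before it enters an A/B test; the metric names and thresholds here are illustrative assumptions.

    # Sketch: gate a retrained model version before promotion.
    def should_promote(candidate, live,
                       min_ndcg_gain=0.01, max_mrr_drop=0.005):
        """Promote only on a clear offline win over the live model."""
        ndcg_gain = candidate["ndcg@10"] - live["ndcg@10"]
        mrr_drop = live["mrr"] - candidate["mrr"]
        return ndcg_gain >= min_ndcg_gain and mrr_drop <= max_mrr_drop

If the gate fails, keep serving the pinned live version; if the A/B test later regresses, rolling back is just re-pinning that version.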