Stop Guessing: Build a Feedback Loop for Prompt Engineering
Speaker
Tadas Goberis
Dentist turned engineer with 4 years in software and nearly 3 years in AI. Currently Lead AI Engineer at Trimble, where I build production agentic systems.
Abstract
Most teams iterate on prompts by eyeballing outputs and hoping for the best. This talk presents a different approach: connect your prompt to a measurable downstream outcome, build a test set, and let metrics drive improvements. I'll walk through a complete feedback loop - from prompt to retrieval metrics to automated error analysis to LLM-powered iteration - and show how this turns prompt engineering from guesswork into something you can actually measure and improve systematically.
Description
You change a prompt. You look at a few examples. You think "yeah, that's better" and ship it. Sound familiar?
This presentation shows a different way: treat prompts like code by adding tests. The core idea is simple - find the downstream behavior you actually care about (retrieval quality, classification accuracy, task success), make it measurable, and use that signal to drive prompt improvements.
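To make that concrete, here is a minimal sketch of what "tests for a prompt" can look like. It assumes the talk's running example (extracting problem statements that feed a ticket search), a hypothetical recall@k proxy metric, and caller-supplied run_prompt and search functions; it is an illustration of the idea, not the exact implementation from the talk.

```python
# Sketch: score a prompt against a labeled test set instead of eyeballing outputs.
# run_prompt (LLM call) and search (downstream retrieval) are assumed stand-ins.
from dataclasses import dataclass

@dataclass
class Case:
    ticket: str              # raw support ticket the prompt sees
    relevant_ids: set[str]   # ticket IDs a good extraction should retrieve

def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of known-relevant tickets that appear in the top-k results."""
    if not relevant:
        return 1.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def evaluate(prompt: str, test_set: list[Case], run_prompt, search) -> float:
    """Average proxy metric for one prompt version over the whole test set."""
    scores = []
    for case in test_set:
        problem_statement = run_prompt(prompt, case.ticket)  # LLM extraction
        retrieved_ids = search(problem_statement)            # downstream system
        scores.append(recall_at_k(retrieved_ids, case.relevant_ids))
    return sum(scores) / len(scores)

# Two prompt versions now compare on a number, not a hunch:
# evaluate(prompt_v1, test_set, run_prompt, search) vs. evaluate(prompt_v2, ...)
```

The point of the sketch is the shape, not the specific metric: any downstream signal you can compute per test case works the same way.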
I'll walk through a real system that does this: extracting problem statements from support tickets, measuring how well they retrieve similar tickets, analyzing what breaks, and using an LLM to suggest fixes based on actual failure patterns and structured context.
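The loop itself can be sketched in a few lines, reusing evaluate and recall_at_k from above. Again this is a hedged illustration under assumptions, not the production code: llm is a stand-in for whatever client the real system uses, and the failure threshold is arbitrary.

```python
# Sketch of one turn of the feedback loop: evaluate, collect the worst failures,
# hand them to an LLM with structured context, and keep the suggested rewrite
# only if it actually scores better than the current prompt.
def failure_report(prompt, test_set, run_prompt, search, threshold=0.5):
    """Collect the cases where retrieval fell below an acceptable recall."""
    failures = []
    for case in test_set:
        statement = run_prompt(prompt, case.ticket)
        retrieved = search(statement)
        score = recall_at_k(retrieved, case.relevant_ids)
        if score < threshold:
            failures.append({"ticket": case.ticket,
                             "extracted": statement,
                             "recall": score})
    return failures

def iterate_once(prompt, test_set, run_prompt, search, llm):
    """Propose a prompt fix from failure patterns; keep it only if metrics improve."""
    baseline = evaluate(prompt, test_set, run_prompt, search)
    failures = failure_report(prompt, test_set, run_prompt, search)
    suggestion = llm(
        "Here is a prompt for extracting problem statements from support tickets, "
        "plus cases where its output retrieved the wrong tickets. "
        "Propose a revised prompt that addresses these failure patterns.\n\n"
        f"PROMPT:\n{prompt}\n\nFAILURES:\n{failures}"
    )
    candidate_score = evaluate(suggestion, test_set, run_prompt, search)
    if candidate_score > baseline:
        return suggestion, candidate_score
    return prompt, baseline
```

The guard at the end is what keeps the loop honest: an LLM-suggested rewrite is only adopted when the proxy metric says it helped.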
The pattern works for any prompt that feeds into a measurable system. You'll leave knowing exactly how to apply it: identify your proxy metric, build your test set, and stop guessing whether your prompts are getting better.