Engineering Complex AI Solutions: Observability and Testing of Multi-Agent Solutions
Speaker
Dmitri
Lead systems engineer with more than 20 years of experience in IT:
- 3 years in AI-based solution development
- 8 years of S-SDLC methodology implementation
- 7 years of security architecture development
- 4 years of experience developing a scalable non-functional testing solution using private clouds and AWS-based APIs
- solid knowledge of building secure and resilient architectures in multi-cloud environments
- 9 years of experience in consulting services for external and internal EPAM accounts
- 6 years of experience in project and team management
- 8 years of experience in solution architecture development and deployment
- 10 years of security concept and test development, including regular security audits
- 8 years of experience developing and deploying Continuous Delivery and Continuous Integration concepts
- Experience in process development
- 10 years of experience developing and deploying Unix/Linux-based infrastructures
Abstract
As AI agents evolve from simple chatbots to complex multi-agent systems utilizing the Model Context Protocol (MCP), manual validation becomes impossible. During this talk, I will demonstrate how to architect a quality assurance loop for these solutions. No theoretical fluff: I will focus on practical analysis of automated pipeline results, interpreting Langfuse reports for cost and performance, and ensuring reliability from an AI Architect/System Engineer perspective.
Description
Deploying agentic AI solutions is no longer just about prompt engineering; it involves orchestrating complex interactions between LLMs, multiple agents, and MCP servers. When an agent fails, did it hallucinate, fail to call a tool, or receive bad data? Traditional manual verification cannot scale to debug these multi-step workflows, and production regressions can lead to runaway costs or dangerous data mishandling.
This talk demonstrates how to establish a rigorous testing and observability strategy for complex AI solutions using Langfuse, covering the following practices:
1 - Automated Quality Analysis: Methodologies for scoring accuracy and relevance in multi-turn conversations using the LLM-as-a-Judge concept (a minimal scoring sketch follows this list);
2 - Validating Tool Usage: Analyzing traces to ensure MCP servers and external tools are invoked correctly;
3 - Cost & Performance Auditing: Using granular reports to detect latency spikes and token usage anomalies before they hit production;
4 - Root Cause Analysis: Interpreting execution traces to pinpoint exactly where logic broke down in the reasoning chain.
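To make the LLM-as-a-Judge idea in item 1 concrete, here is a minimal Python sketch. It is illustrative only: the judge prompt, the judge model name, and the score names are assumptions rather than the exact setup from the talk, and the scoring call uses the Langfuse Python SDK v2 score() method (newer SDK versions expose an equivalent call under a different name).

# Minimal LLM-as-a-Judge sketch (illustrative; prompt wording, model name,
# and score names are assumptions, not the exact setup from the talk).
import json
from openai import OpenAI
from langfuse import Langfuse

client = OpenAI()       # expects OPENAI_API_KEY in the environment
langfuse = Langfuse()   # expects LANGFUSE_* credentials in the environment

JUDGE_PROMPT = (
    "You are a strict evaluator. Given a user question and an agent answer, "
    'return JSON: {"relevance": <0..1>, "accuracy": <0..1>, "reason": "<short explanation>"}'
)

def judge_turn(question: str, answer: str, trace_id: str) -> dict:
    """Score one conversation turn and attach the scores to its Langfuse trace."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any judge-capable model will do
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
        response_format={"type": "json_object"},
    )
    verdict = json.loads(response.choices[0].message.content)

    # Push numeric scores back to the trace so they appear in Langfuse reports
    # (v2 SDK call; the method name differs in newer SDK versions).
    for metric in ("relevance", "accuracy"):
        langfuse.score(
            trace_id=trace_id,
            name=metric,
            value=float(verdict[metric]),
            comment=verdict.get("reason", ""),
        )
    return verdict

In an automated pipeline, a function like judge_turn would run over a curated dataset of conversations, and the resulting scores feed the reports analyzed during the talk.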
You will see architectural patterns, real-world examples of pipeline execution reports, and deep-dive analytics via Langfuse. The focus is on the engineering of the CI/CD process and methodology required to maintain stable, high-performing AI solutions in an enterprise environment.
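As one hedged example of such a CI/CD quality gate (the report fields, metric names, and threshold values below are hypothetical, not the talk's exact configuration), a pipeline step can read the aggregated evaluation report produced by the judging run and fail the build when quality, latency, or cost metrics regress:

# Hypothetical CI quality gate: fail the pipeline when aggregated evaluation
# metrics regress. Report fields and thresholds are assumptions; in practice
# the numbers come from the Langfuse-backed pipeline reports discussed above.
import json
import sys

THRESHOLDS = {
    "relevance": 0.80,         # minimum mean judge relevance score
    "accuracy": 0.85,          # minimum mean judge accuracy score
    "p95_latency_s": 8.0,      # maximum 95th-percentile end-to-end latency
    "cost_per_run_usd": 0.05,  # maximum average cost per pipeline run
}
MINIMUM_METRICS = {"relevance", "accuracy"}  # higher is better for these

def run_gate(report_path: str) -> int:
    with open(report_path) as f:
        report = json.load(f)  # e.g. {"relevance": 0.83, "accuracy": 0.87, ...}

    failures = []
    for metric, limit in THRESHOLDS.items():
        value = report.get(metric)
        if value is None:
            failures.append(f"missing metric: {metric}")
        elif metric in MINIMUM_METRICS and value < limit:
            failures.append(f"{metric}={value:.2f} is below the minimum {limit:.2f}")
        elif metric not in MINIMUM_METRICS and value > limit:
            failures.append(f"{metric}={value} exceeds the maximum {limit}")

    if failures:
        print("Quality gate FAILED:\n  " + "\n  ".join(failures))
        return 1
    print("Quality gate passed.")
    return 0

if __name__ == "__main__":
    sys.exit(run_gate(sys.argv[1] if len(sys.argv) > 1 else "eval_report.json"))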