Reading the Mind of an LLM
Speakers
Gabriele Orlandi
AI scientist at xtream
Luca Baggi
AI engineer at xtream and open source contributor
Abstract
What if you could watch an AI's thoughts take shape? For years, LLMs have been treated as impenetrable "black boxes," but we are finally beginning to find ways to see how the ghost in the machine actually works.
This talk explores mechanistic interpretability, a subfield of AI that aims to understand the internal workings of neural networks. Mapping these internal "circuits" is not just a philosophical curiosity: it is a high-stakes engineering necessity for safety, debugging, and trust.
Description
What if we could step inside an LLM and watch it think in real time?
This talk distills the latest research from Anthropic, DeepMind, and OpenAI to present the current state of the art in LLM interpretability.
We’ll start with the modern interpretation of embeddings as sparse, monosemantic features living in high-dimensional space. From there, we’ll explore emerging techniques such as circuit tracing and attribution graphs, and see how researchers reconstruct the computational pathways behind behaviors like multilingual reasoning, refusals, and hallucinations.
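To make the "sparse, monosemantic features" idea concrete, here is a minimal sketch of the kind of sparse-autoencoder decomposition this line of work builds on: a dense activation vector is rewritten as a sparse, non-negative combination of a much larger dictionary of candidate feature directions. The dimensions, architecture, and loss below are illustrative assumptions, not the exact setup used by any of the labs mentioned above.

```python
# Toy sparse autoencoder: decompose a dense activation into sparse "features".
# All sizes (D_MODEL, N_FEATURES) and the ReLU encoder / linear decoder choice
# are illustrative, not a specific lab's implementation.
import torch
import torch.nn as nn

D_MODEL, N_FEATURES = 512, 4096  # hidden size vs. overcomplete feature dictionary

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activation -> feature pre-activations
        self.decoder = nn.Linear(n_features, d_model)  # features -> reconstructed activation

    def forward(self, activation: torch.Tensor):
        features = torch.relu(self.encoder(activation))  # sparse, non-negative activations
        reconstruction = self.decoder(features)
        return features, reconstruction

sae = SparseAutoencoder(D_MODEL, N_FEATURES)
activation = torch.randn(1, D_MODEL)  # stand-in for a residual-stream activation
features, reconstruction = sae(activation)

# Training minimises reconstruction error plus an L1 penalty that drives most
# feature activations to zero -- the "sparse" part of the decomposition.
loss = nn.functional.mse_loss(reconstruction, activation) + 1e-3 * features.abs().sum()
print(features.shape, float(loss))
```

In practice, each learned feature direction is then inspected (e.g., by finding the inputs that activate it most strongly) to check whether it corresponds to a single, human-interpretable concept.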
We’ll also look at new evidence suggesting that models may have limited forms of introspection—clarifying what they can, and crucially cannot, reliably report about their internal processes.
Finally, we’ll connect these “microscopic” insights to real engineering practice: how feature-level understanding can improve debugging, safety, and robustness in deployed AI systems, and where current methods still fall short.