XAI - Explainable AI Tools and Techniques
Speaker
Viraj Sharma
I am a passionate technologist with a strong interest in Python, artificial intelligence, and edge computing. I am currently studying in Class 9 at Presidium School, Delhi, India. I have worked in areas such as torch.nn visualization, Anthropic technologies (MCP, Skills), large concept models, TensorCore/CUDA benchmarks, and Edge AI on a Raspberry Pi running small models with sensors. Recently I have been working on XAI (AI explainability) and publishing that work as a project on my AI lab, modelrecon.com. As an active member of the Python and AI communities, I enjoy learning from experienced developers and sharing my insights with others. I attend major tech events including PyCons, Linux Fests, OS Summits, GDG events, p99conf, and various AI conferences, where I actively present my projects and ideas.
Abstract
In this talk, I propose to discuss the problem of building explainable AI through two approaches: causal vs. correlational. I will explain what mechanistic interpretability ("mech interp") in LLMs is, as a way to understand how models answer questions by looking inside them and checking which neurons activate and when. I will discuss circuit-tracer, the Python module that Anthropic open sourced, and the Neuronpedia portal, and I will also talk about my own work with an "activation cube" data structure (this is not a standard; it is something I came up with).
Description
I will start with the basic problem of causal vs. correlational techniques and the limitations of the correlational approach.
Why Explainability Matters 2-3 mins
We need to understand why AI models make certain choices, not just what answers they give. Without this, the model feels like a black box. In this section I will include an example from human behavior.
What Transformers Hide 2-3 mins
I will walk through a transformer's basic internal steps and the features that are hard to see, highlighting that most tools only show the final output, not the thinking process. In fact, I will highlight that it comes as a surprise to most people that we don't know how models "actually" arrive at specific answers.
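To make the hidden part concrete, here is a minimal sketch (plain PyTorch forward hooks on a Hugging Face model, not Circuit Tracer; "gpt2" is only an example model) of capturing the intermediate activations that normal generation never shows:

```python
# Minimal sketch: capture the hidden activations a transformer normally never shows.
# Assumes the torch and transformers packages; "gpt2" is only an example model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

captured = {}  # layer index -> hidden state tensor

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # For GPT-2 blocks, output[0] is the hidden state: (batch, tokens, hidden_dim)
        captured[layer_idx] = output[0].detach()
    return hook

# Register a forward hook on every transformer block.
handles = [block.register_forward_hook(make_hook(i))
           for i, block in enumerate(model.transformer.h)]

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

for handle in handles:
    handle.remove()

# Users only ever see the predicted next token...
print(tokenizer.decode(logits[0, -1].argmax().item()))
# ...but every layer produced activations that are normally thrown away.
for i, act in captured.items():
    print(f"layer {i}: hidden states of shape {tuple(act.shape)}")
```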
How Circuit Tracer Helps 3-5 mins
I will talk about how Anthropic's Circuit Tracer shows the inside connections of the model. It turns hidden activations into easy-to-understand features and shows how they link together. It is not that easy, but we can get used to the graphs (like Link, the operator in the Matrix movies, who could understand what was happening just by looking at the raw Matrix code). I will show some graphs and walk through the reasoning path on Colab.
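As a preview, a rough sketch of what driving circuit-tracer from Python looks like; the names below (ReplacementModel, attribute) and their arguments are written from memory of the public repo and are assumptions to check against its README:

```python
# Rough sketch of driving circuit-tracer from Python. The names ReplacementModel
# and attribute, and their arguments, are recalled from the public repo and are
# assumptions to verify against the README before running.
from circuit_tracer import ReplacementModel, attribute

# Load a supported model together with its pretrained transcoders (the learned
# feature dictionaries that make raw activations human-readable).
model = ReplacementModel.from_pretrained("google/gemma-2-2b", "gemma")

# Build an attribution graph for a prompt: nodes are features, edges record which
# feature influenced which, ending at the model's answer.
graph = attribute("The capital of France is", model)

# The graph can then be exported and explored visually, for example on Neuronpedia.
```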
Seeing the Reasoning Path 10 minutes or more
The tool draws a clear path from input → inner features → final output. This lets everyone see which parts of the model caused the answer. This part should be fun, because the paths a model takes are sometimes weird; a toy example of walking such a path is sketched below.
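To give a feel for what walking such a path means, here is a toy illustration (the mini-graph and its weights are invented for the example, not real circuit-tracer output):

```python
# Toy illustration of tracing a reasoning path: input token -> inner features -> output.
# The graph and its weights are invented for this example, not real circuit-tracer output.
toy_graph = {
    "input: 'Dallas'":         [("feature: Texas-related", 0.9), ("feature: city names", 0.4)],
    "feature: Texas-related":  [("feature: state capitals", 0.8)],
    "feature: city names":     [("feature: state capitals", 0.2)],
    "feature: state capitals": [("output: 'Austin'", 0.95)],
    "output: 'Austin'":        [],
}

def strongest_path(graph, start):
    """Greedily follow the highest-weight edge until a node has no outgoing edges."""
    path = [start]
    node = start
    while graph[node]:
        node, _ = max(graph[node], key=lambda edge: edge[1])
        path.append(node)
    return path

print(" -> ".join(strongest_path(toy_graph, "input: 'Dallas'")))
# input: 'Dallas' -> feature: Texas-related -> feature: state capitals -> output: 'Austin'
```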
Why This Is Important 2 minutes
With this method, we can:
- check if the model is behaving safely
- fix mistakes inside the model
- build trust by seeing how it thinks
I will finish the description with some of my own work, including the "activation cube" data structure mentioned in the abstract.
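As a small teaser for that part, a minimal sketch of one way an "activation cube" can be laid out, assuming a 3-D tensor indexed by (layer, token position, hidden unit); the actual layout in my project may differ in detail:

```python
# Minimal sketch of one way an "activation cube" could be assembled: a 3-D tensor
# indexed by (layer, token position, hidden unit). This is an illustrative layout,
# not necessarily the exact structure used in the project.
import torch

num_layers, num_tokens, hidden_dim = 12, 6, 768  # e.g. GPT-2 small sized

# Pretend these are per-layer hidden states captured with forward hooks (see earlier sketch).
per_layer_states = [torch.randn(num_tokens, hidden_dim) for _ in range(num_layers)]

# Stack them into one cube of shape (layers, tokens, hidden_dim).
activation_cube = torch.stack(per_layer_states)
print(activation_cube.shape)  # torch.Size([12, 6, 768])

# Slicing the cube answers simple "where did this fire?" questions:
layer_view = activation_cube[5]         # every token's activations at layer 5
token_view = activation_cube[:, -1]     # how the last token evolves across layers
unit_trace = activation_cube[:, :, 42]  # one hidden unit across all layers and tokens
```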