Mapped full input-to-output computational pathways in Claude 3.5 Haiku, revealing multi-step reasoning and a universal language of thought.
Research Paper: Two companion papers. 'Circuit Tracing' linked interpretable features into computational circuits, revealing input-to-output pathways. 'On the Biology of a Large Language Model' applied this method to Claude 3.5 Haiku and found multi-step reasoning, poetry planning, and a universal 'language of thought' shared across languages. The tools were open-sourced.
Attribution graphs are computational maps showing how features (interpretable model components) connect and influence one another across the model's layers. Rather than treating features in isolation, attribution graphs reveal circuits: how a specific feature in one layer drives the features that depend on it in later layers, tracing a pathway from input to output.
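As a rough illustration of the idea, an attribution graph can be treated as a weighted directed acyclic graph and searched for its strongest input-to-output path. The sketch below is a toy, not the papers' method or the released tooling: the node names, the edge weights, and the helpers `build_graph` and `strongest_path` are all assumptions made up for this example, and scoring a path by the product of its edge weights is just one simple choice.

```python
from collections import defaultdict

# Hypothetical toy graph: (source, target) -> attribution weight.
# Node names and weights are invented for illustration only.
EDGES = {
    ("token: Dallas", "feature: Texas"): 0.8,
    ("token: capital", "feature: say a capital"): 0.7,
    ("feature: Texas", "output: Austin"): 0.6,
    ("feature: say a capital", "output: Austin"): 0.5,
    ("token: Dallas", "output: Austin"): 0.1,
}


def build_graph(edges):
    """Turn the edge dictionary into an adjacency list."""
    graph = defaultdict(list)
    for (src, dst), weight in edges.items():
        graph[src].append((dst, weight))
    return graph


def strongest_path(graph, start, goal):
    """Return (score, path) for the path from start to goal with the
    largest product of edge weights. Assumes the graph is acyclic.
    Returns (0.0, []) when no path exists."""
    if start == goal:
        return 1.0, [start]
    best = (0.0, [])
    for nxt, weight in graph.get(start, []):
        score, path = strongest_path(graph, nxt, goal)
        if path and score * weight > best[0]:
            best = (score * weight, [start] + path)
    return best


if __name__ == "__main__":
    graph = build_graph(EDGES)
    score, path = strongest_path(graph, "token: Dallas", "output: Austin")
    print(" -> ".join(path))
```

In this toy graph the indirect route through the intermediate feature outweighs the direct token-to-output edge, which is the kind of structure an attribution graph is meant to surface.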
Identifiable computational structures in which the model chains intermediate concepts together to solve multi-step problems. The model internally represents a concept (such as "Texas") that never appears in the output, then uses that representation in later layers to reach the final answer. This is evidence of multi-step reasoning rather than purely surface-level pattern matching.
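The two-hop structure can be sketched with a toy lookup in which the intermediate concept is computed, used, and never emitted. This is only an analogy under stated assumptions: the dictionaries, the function `two_hop`, and its `patch_state` parameter are hypothetical stand-ins, not the model's actual mechanism. Overwriting the intermediate value mimics an intervention on an internal feature.

```python
# Toy lookup tables standing in for learned associations (illustrative only).
CITY_TO_STATE = {"Dallas": "Texas", "Oakland": "California"}
STATE_TO_CAPITAL = {"Texas": "Austin", "California": "Sacramento"}


def two_hop(city, patch_state=None):
    """Answer 'capital of the state containing <city>' via an intermediate state.

    If patch_state is given, it overwrites the intermediate representation,
    mimicking an intervention on an internal feature.
    """
    state = CITY_TO_STATE[city]       # hop 1: intermediate concept, never output
    if patch_state is not None:
        state = patch_state           # intervention on the intermediate concept
    return STATE_TO_CAPITAL[state]    # hop 2: final answer


if __name__ == "__main__":
    print(two_hop("Dallas"))
    print(two_hop("Dallas", patch_state="California"))
```

If the answer changes when only the intermediate value is replaced, the output depends on that hidden step and was not memorized as a direct input-to-output shortcut.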
The finding that Claude plans ahead when writing poetry: it identifies candidate rhyming words before composing a line, then constructs the line specifically to end on that word. This contradicts the narrative that language models "just predict the next token" and instead shows forward planning and goal-directed behavior.
Evidence that the model maintains a universal "language of thought" that operates independently of the language being used. When the same concept is expressed in different languages, the model routes it through a shared conceptual space rather than translating word by word, suggesting that concepts are represented at a level deeper than any single language's surface form.
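A minimal sketch of the shared-space idea, assuming a toy design in which language-specific tables sit only at the input and output while the operation itself runs on language-neutral concepts. All names here (`TO_CONCEPT`, `ANTONYM`, `FROM_CONCEPT`, `opposite`) and the vocabulary are hypothetical and chosen for illustration.

```python
# Language-specific input mapping: (language, word) -> shared concept.
TO_CONCEPT = {
    ("en", "small"): "SMALL",
    ("fr", "petit"): "SMALL",
    ("zh", "小"): "SMALL",
}

# Language-neutral operation on concepts.
ANTONYM = {"SMALL": "LARGE"}

# Language-specific output mapping: (language, concept) -> word.
FROM_CONCEPT = {
    ("en", "LARGE"): "big",
    ("fr", "LARGE"): "grand",
    ("zh", "LARGE"): "大",
}


def opposite(lang, word):
    """Find the opposite of a word by routing through the shared concept space."""
    concept = TO_CONCEPT[(lang, word)]   # language-specific input step
    result = ANTONYM[concept]            # shared, language-independent step
    return FROM_CONCEPT[(lang, result)]  # language-specific output step


if __name__ == "__main__":
    for lang, word in [("en", "small"), ("fr", "petit"), ("zh", "小")]:
        print(lang, opposite(lang, word))
```

The single `ANTONYM` table serves every language, so anything it encodes is available in all of them, while adding a language only requires new input and output entries.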
The research team released computational tools that let other researchers trace circuits in their own models. This moves mechanistic interpretability from proprietary lab work toward a community capability, which should accelerate progress across the field.