Models · Anthropic · Feb 2026

45. Introducing Claude Opus 4.6

Agent teams, 1M-token context, and GDPval-AA dominance

Product Announcement
Summary

Anthropic released Claude Opus 4.6, its most capable model to date, featuring 'agent teams' (multiple agents splitting a large task into parallel subtasks), a 1M-token context window in beta, and leading performance on Terminal-Bench 2.0 (agentic coding), Humanity's Last Exam (multidisciplinary reasoning), and GDPval-AA (economically valuable knowledge work, where it outperformed GPT-5.2 by 144 Elo). Pricing held steady at $5/$25 per million input/output tokens.

Key Concepts

Agent teams: multiple Claude instances collaborating on segmented tasks

The headline feature of Opus 4.6 is 'agent teams' — orchestrated groups of Claude agents that split larger tasks into parallel subtasks, each handled by a specialized agent instance. This moves beyond single-agent workflows to multi-agent coordination, enabling complex projects like codebase-wide refactors, multi-document analysis, and end-to-end research workflows.
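Below is a minimal sketch of the fan-out/fan-in pattern this implies, written against the Anthropic Python SDK's Messages API. The model id, the subtask split, and the merge step are illustrative assumptions, not Anthropic's actual agent-teams interface.

```python
# Hypothetical fan-out/fan-in "agent team": subtasks run as parallel Claude
# calls, then a lead call merges the partial results into one deliverable.
from concurrent.futures import ThreadPoolExecutor

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-6"       # assumed model id, not confirmed by the announcement


def run_agent(role: str, task: str) -> str:
    """One worker agent handling a single segmented subtask."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=2048,
        system=f"You are the {role} agent on a team. Handle only your assigned subtask.",
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text


subtasks = {
    "refactor": "Refactor module A to the new interface.",
    "tests": "Write regression tests for module A's public API.",
    "docs": "Update the documentation for the new module A interface.",
}

# Fan out: each subtask runs in parallel on its own agent instance.
with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    results = dict(zip(subtasks, pool.map(run_agent, subtasks, subtasks.values())))

# Fan in: a lead agent combines the partial results.
merged = run_agent(
    "lead",
    "Combine these subtask outputs into one report:\n\n"
    + "\n\n".join(f"[{name}]\n{text}" for name, text in results.items()),
)
print(merged)
```

The point of the pattern is that each worker carries only its own slice of context, while the lead call sees only the finished slices, which is what lets the team cover work larger than any single agent's working set.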

1M-token context window expands long-horizon agentic work

Opus 4.6 expands the context window from 200K to 1M tokens (in beta), a fivefold increase that matches Gemini's context length. For agentic workflows, context length is a critical bottleneck: a longer window lets an agent maintain state across more complex, multi-step tasks without losing track of earlier information.
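For reference, here is a sketch of how a long-context beta is typically opted into via the Anthropic Python SDK. The beta flag shown is the one that accompanied Sonnet 4's 1M-token context; both it and the model id are assumptions for Opus 4.6 and may differ in practice.

```python
import anthropic

client = anthropic.Anthropic()

# A large prompt, e.g. a concatenated repository dump (illustrative file name).
long_codebase_dump = open("repo_dump.txt").read()

# The beta flag below shipped with Sonnet 4's 1M-token context; the flag and
# the model id for Opus 4.6 are assumptions here.
response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    betas=["context-1m-2025-08-07"],
    messages=[{"role": "user", "content": "Summarize this codebase:\n" + long_codebase_dump}],
)
print(response.usage.input_tokens)  # confirm the request stayed under the 1M-token limit
```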

GDPval-AA: 144 Elo points above GPT-5.2 on economically valuable work

On GDPval-AA — an evaluation of performance on real-world knowledge work tasks in finance, legal, and other domains — Opus 4.6 outperformed GPT-5.2 by 144 Elo points and its predecessor Opus 4.5 by 190 points. This is a meaningful gap on tasks that directly correspond to professional economic value.
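For intuition, assuming GDPval-AA uses the standard 400-point logistic Elo scale (an assumption, not stated in the announcement), those gaps translate into pairwise win rates as follows:

```python
# Expected head-to-head win rate implied by an Elo gap, assuming the standard
# 400-point logistic Elo scale.
def elo_win_prob(elo_gap: float) -> float:
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))

print(round(elo_win_prob(144), 3))  # ~0.696, Opus 4.6 vs GPT-5.2
print(round(elo_win_prob(190), 3))  # ~0.749, Opus 4.6 vs Opus 4.5
```

On that reading, Opus 4.6's output would be preferred roughly seven times out of ten against GPT-5.2 and three times out of four against Opus 4.5.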

Terminal-Bench 2.0 and Humanity's Last Exam leadership

Opus 4.6 achieved the highest score on Terminal-Bench 2.0 (an agentic coding evaluation) and led all frontier models on Humanity's Last Exam (a multidisciplinary reasoning test designed so that current AI systems cannot reliably answer its questions). Together these results demonstrate capability across both applied engineering and abstract reasoning.

Connections

Influenced by
28. Claude 4 Family Launch (Opus 4 & Sonnet 4) · May 2025
36. Claude Sonnet 4 — 1M Token Context · Oct 2025

Influences
46. Introducing Claude Sonnet 4.6 · Feb 2026