Agent teams, 1M-token context, and GDPval-AA dominance
Product Announcement: Anthropic released Claude Opus 4.6, its most capable model, featuring 'agent teams' (multiple agents splitting larger tasks into segmented jobs), a 1M-token context window in beta, and leading performance on Terminal-Bench 2.0 (agentic coding), Humanity's Last Exam (multidisciplinary reasoning), and GDPval-AA (economically valuable knowledge work, where it outperformed GPT-5.2 by 144 Elo). Pricing held at $5/$25 per million tokens.
The headline feature of Opus 4.6 is 'agent teams' — orchestrated groups of Claude agents that split larger tasks into parallel subtasks, each handled by a specialized agent instance. This moves beyond single-agent workflows to multi-agent coordination, enabling complex projects like codebase-wide refactors, multi-document analysis, and end-to-end research workflows.
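The fan-out pattern described above can be illustrated with a minimal sketch. This is not Anthropic's implementation or API: `split_task`, `run_agent`, and `run_team` are hypothetical names, and `run_agent` is a placeholder standing in for a real model call. The sketch only shows the general shape of an orchestrator dividing a task into subtasks, running them in parallel, and collecting the results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(task: str, parts: list[str]) -> list[str]:
    """Hypothetical splitter: derive one subtask description per part."""
    return [f"{task}: {part}" for part in parts]


def run_agent(subtask: str) -> str:
    """Placeholder for a single agent instance.

    A real system would call a model here; this stub just echoes the
    subtask so the example stays self-contained and runnable.
    """
    return f"done({subtask})"


def run_team(task: str, parts: list[str], max_workers: int = 4) -> dict[str, str]:
    """Orchestrate a 'team': fan subtasks out in parallel, then gather results."""
    subtasks = split_task(task, parts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with subtasks.
        results = list(pool.map(run_agent, subtasks))
    return dict(zip(subtasks, results))


if __name__ == "__main__":
    report = run_team("refactor", ["parser module", "test suite", "docs"])
    for subtask, result in report.items():
        print(subtask, "->", result)
```

Threads are used here because agent calls are typically network-bound; a real orchestrator would also need to handle failures, retries, and merging of conflicting outputs, which this sketch omits.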
Opus 4.6 expands the context window fivefold, from 200K to 1M tokens (in beta), matching Gemini's context length. For agentic workflows, context length is a critical bottleneck: a longer context lets agents maintain state across more complex, multi-step tasks without losing track of earlier information.
On GDPval-AA — an evaluation of performance on real-world knowledge work tasks in finance, legal, and other domains — Opus 4.6 outperformed GPT-5.2 by 144 Elo points and its predecessor Opus 4.5 by 190 points. This is a meaningful gap on tasks that directly correspond to professional economic value.
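To give the Elo gaps quoted above a more concrete reading, they can be converted into an expected head-to-head score. The sketch below assumes the conventional logistic Elo formula with a 400-point scale; whether GDPval-AA uses exactly this convention is an assumption, not something stated in the announcement. The function name `elo_expected_score` is illustrative.

```python
def elo_expected_score(gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side for a given rating gap.

    Uses the standard logistic Elo formula E = 1 / (1 + 10^(-gap / scale)).
    The 400-point scale is the common convention and is assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


if __name__ == "__main__":
    for label, gap in [("vs GPT-5.2", 144), ("vs Opus 4.5", 190)]:
        print(f"{label}: gap {gap} Elo, expected score {elo_expected_score(gap):.3f}")
```

Under that assumption, a 144-point gap corresponds to an expected score of roughly 0.70 and a 190-point gap to roughly 0.75, meaning the higher-rated model would be preferred in about 70% and 75% of pairwise comparisons respectively.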
Opus 4.6 achieved the highest score on Terminal-Bench 2.0 (an agentic coding evaluation) and led all frontier models on Humanity's Last Exam (a multidisciplinary reasoning test designed to be extremely difficult for AI systems). Together, these results point to strength in both applied engineering and abstract reasoning.