Models · Anthropic · Feb 2026

45. Introducing Claude Opus 4.6

Agent teams, 1M-token context, and GDPval-AA dominance

Product Announcement
Summary

Anthropic released Claude Opus 4.6, its most capable model to date, featuring 'agent teams' (multiple agents splitting a large task into parallel subtasks), a 1M-token context window in beta, and leading performance on Terminal-Bench 2.0 (agentic coding), Humanity's Last Exam (multidisciplinary reasoning), and GDPval-AA (economically valuable knowledge work, where it outperformed GPT-5.2 by 144 Elo). Pricing held steady at $5/$25 per million input/output tokens.

Key Concepts

Agent teams: multiple Claude instances collaborating on segmented tasks

The headline feature of Opus 4.6 is 'agent teams' — orchestrated groups of Claude agents that split larger tasks into parallel subtasks, each handled by a specialized agent instance. This moves beyond single-agent workflows to multi-agent coordination, enabling complex projects like codebase-wide refactors, multi-document analysis, and end-to-end research workflows.
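Below is a minimal sketch of the fan-out/fan-in pattern this implies, written against the Anthropic Python SDK's Messages API. The model id, the subtask split, and the merge step are illustrative assumptions, not Anthropic's actual agent-teams interface.

```python
# Hypothetical fan-out/fan-in "agent team": subtasks run as parallel Claude
# calls, then a lead call merges the partial results into one deliverable.
from concurrent.futures import ThreadPoolExecutor

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-6"       # assumed model id, not confirmed by the announcement


def run_agent(role: str, task: str) -> str:
    """One worker agent handling a single segmented subtask."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=2048,
        system=f"You are the {role} agent on a team. Handle only your assigned subtask.",
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text


subtasks = {
    "refactor": "Refactor module A to the new interface.",
    "tests": "Write regression tests for module A's public API.",
    "docs": "Update the documentation for the new module A interface.",
}

# Fan out: each subtask runs in parallel on its own agent instance.
with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    results = dict(zip(subtasks, pool.map(run_agent, subtasks, subtasks.values())))

# Fan in: a lead agent combines the partial results.
merged = run_agent(
    "lead",
    "Combine these subtask outputs into one report:\n\n"
    + "\n\n".join(f"[{name}]\n{text}" for name, text in results.items()),
)
print(merged)
```

The point of the pattern is that each worker carries only its own slice of context, while the lead call sees only the finished slices, which is what lets the team cover work larger than any single agent's working set.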

1M-token context window expands long-horizon agentic work

Opus 4.6 expands the context window from 200K to 1M tokens (in beta), a fivefold increase that matches Gemini's context length. For agentic workflows, context length is a critical bottleneck: a longer window lets an agent maintain state across more complex, multi-step tasks without losing track of earlier information.
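For reference, here is a sketch of how a long-context beta is typically opted into via the Anthropic Python SDK. The beta flag shown is the one that accompanied Sonnet 4's 1M-token context; both it and the model id are assumptions for Opus 4.6 and may differ in practice.

```python
import anthropic

client = anthropic.Anthropic()

# A large prompt, e.g. a concatenated repository dump (illustrative file name).
long_codebase_dump = open("repo_dump.txt").read()

# The beta flag below shipped with Sonnet 4's 1M-token context; the flag and
# the model id for Opus 4.6 are assumptions here.
response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    betas=["context-1m-2025-08-07"],
    messages=[{"role": "user", "content": "Summarize this codebase:\n" + long_codebase_dump}],
)
print(response.usage.input_tokens)  # confirm the request stayed under the 1M-token limit
```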

GDPval-AA: 144 Elo points above GPT-5.2 on economically valuable work

On GDPval-AA — an evaluation of performance on real-world knowledge work tasks in finance, legal, and other domains — Opus 4.6 outperformed GPT-5.2 by 144 Elo points and its predecessor Opus 4.5 by 190 points. This is a meaningful gap on tasks that directly correspond to professional economic value.
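For intuition, assuming GDPval-AA uses the standard 400-point logistic Elo scale (an assumption, not stated in the announcement), those gaps translate into pairwise win rates as follows:

```python
# Expected head-to-head win rate implied by an Elo gap, assuming the standard
# 400-point logistic Elo scale.
def elo_win_prob(elo_gap: float) -> float:
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))

print(round(elo_win_prob(144), 3))  # ~0.696, Opus 4.6 vs GPT-5.2
print(round(elo_win_prob(190), 3))  # ~0.749, Opus 4.6 vs Opus 4.5
```

On that reading, Opus 4.6's output would be preferred roughly seven times out of ten against GPT-5.2 and three times out of four against Opus 4.5.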

Terminal-Bench 2.0 and Humanity's Last Exam leadership

Opus 4.6 achieved the highest score on Terminal-Bench 2.0 (an agentic coding evaluation) and led all frontier models on Humanity's Last Exam (a multidisciplinary reasoning test designed so that current AI systems cannot reliably answer its questions). Together these results demonstrate capability across both applied engineering and abstract reasoning.

Connections

Influenced by
28. Claude 4 Family Launch (Opus 4 & Sonnet 4) · May 2025
36. Claude Sonnet 4 — 1M Token Context · Oct 2025

Influences
46. Introducing Claude Sonnet 4.6 · Feb 2026