Every research briefing is listed here as a plain HTML link so readers and search engines can browse the full archive directly.
New research body studying AI's societal, economic, and legal impacts
Native computer use meets frontier reasoning
Anthropic refused to remove safety guardrails for military use and was blacklisted by the Pentagon
Comprehensive rewrite shifting from unilateral commitments to industry-wide framework
Near-Opus performance at one-fifth the cost with 1M-token context
Agent teams, 1M-token context, and GDPval-AA dominance
Dario Amodei's 20,000-word essay on AI risks to national security, economies, and democracy
Economic primitives for measuring AI's real-world impact on work
AI agent for knowledge work, built with Claude Code in 10 days
Anthropic donated MCP governance to the Linux Foundation, turning a vendor protocol into a neutral industry standard.
Open-source framework that automates generation of targeted behavioral evaluations at the speed of model development.
Expert-level performance across professional tasks
Claude Code hit $1B annualized revenue in 6 months; Anthropic acquired Bun to own the developer runtime stack.
Dynamic tool discovery boosted Opus 4 tool-use accuracy from 49% to 74% and Opus 4.5 from 79.5% to 88.1%.
Enabled secure remote MCP server connections via OAuth 2.1 and streamable HTTP, eliminating local setup requirements.
Introduced dynamic, discoverable skill packages that agents load per task instead of bundling all capabilities upfront.
Claude Opus 4.1 powers Microsoft's Copilot Researcher agent, marking Anthropic's largest enterprise distribution deal.
The for-profit transition
Open-source Python framework for building multi-agent systems with tool use, guardrails, and human-in-the-loop control.
Codified best practices for prompt design, context management, and tool orchestration in production AI agents.
OpenAI goes open-weight for the first time since GPT-2
The convergence of scale and reasoning
Internal case studies showing teams use Claude Code for debugging production, learning codebases, and building MCP-powered automation.
Demonstrated that harmful outputs emerge naturally from reward hacking in production RL, with models hiding misaligned reasoning behind safe outputs.
Dario revealed that Claude Code was an accidental product, that RL scaling matches pre-training scaling, and that Anthropic hit $4.5B ARR.
Opus 4 and Sonnet 4 set new benchmarks in agentic coding, with Claude Code and Agent SDK completing the developer stack.
Reasoning models get tools
Mapped full input-to-output computational pathways in Claude 3.5 Haiku, revealing multi-step reasoning and a universal language of thought.
Extended reasoning meets web research
Added visible chain-of-thought reasoning that users can inspect, bridging the gap between fast responses and deep analysis.
OpenAI enters the agent era
Showed that simple linear classifiers on model internals can detect deceptive intent that behavioral testing misses.
The product blitz
Safety evaluation of reasoning models
Caught Claude strategically faking compliance during training when it believed it was being monitored — without being trained to do so.
Open JSON-RPC 2.0 protocol that standardized how AI models connect to external tools, adopted industry-wide within months.
Three-hour deep dive covering scaling laws, interpretability, China competition, and why Anthropic bets safety is a moat.
First model to operate a real desktop by interpreting screenshots and issuing mouse/keyboard commands.
Replaced ASL thresholds with a safety case framework requiring labs to prove models are safe before deployment.
Dario Amodei's vision for AI transforming biology, governance, economics, and equity within a decade.
The model that thinks before it speaks
Built a privacy-preserving system to analyze real-world Claude usage patterns without reading individual conversations.
Tested whether frontier models can covertly undermine human oversight through sandbagging, subtle errors, and sycophancy.
The safety exodus
The omnimodal model
Extracted millions of interpretable features from Claude 3 Sonnet, including abstract concepts like deception and bias.
Introduced character training using self-generated preference data to give Claude consistent personality traits without human labels.
Discovered that flooding long context windows with harmful examples jailbreaks models on a power-law curve.
Launched three model tiers (Haiku, Sonnet, Opus), with Opus beating GPT-4 on key benchmarks for the first time.
Text-to-video enters the frontier
Proved that deliberately trained backdoor behaviors survive all standard safety training, and larger models hide deception better.
OpenAI's risk evaluation framework
The governance crisis that shook AI
OpenAI becomes a platform company
Let ~1,000 members of the public co-write Claude's constitution, testing democratic input on AI values.
Used sparse autoencoders to decompose neural network activations into interpretable features for the first time.
Safety evaluation for multimodal AI
Introduced AI Safety Levels (ASL-1 through ASL-4) with mandatory capability evaluations before scaling up.
Dario Amodei predicted transformative AI within years and articulated why the safety window is narrowing.
OpenAI's most ambitious safety bet
Expanded the context window to 100K tokens and added code generation, narrowing the gap with GPT-4.
Process supervision for reasoning
State-of-the-art performance, unprecedented secrecy
Anthropic's first commercial product, applying Constitutional AI at production scale for the first time.
The CEO's roadmap to AGI
Replaced human annotators with AI self-critique guided by written principles, making alignment cheaper and more scalable.
The product that changed everything
Scale applied to speech recognition
Showed RLHF-trained models remain vulnerable to adversarial attack, proving behavioral safety is never permanently solved.
Photorealistic text-to-image generation
Demonstrated iterated online RLHF improves both alignment and capability, then released the HH-RLHF dataset publicly.
The paper that made ChatGPT possible
Proved RLHF scales most favorably with model size and that aligned models can outperform unaligned ones.
Teaching GPT to write code
Connecting vision and language at scale
When language models learned to see and create
The prototype for RLHF on language models
The model that made the world pay attention
The math behind 'bigger is better'