First-ever activation of AI Safety Level 3 protections triggered by Claude Opus 4's capabilities
Claude Opus 4 became the first AI model to trigger ASL-3 protections under Anthropic's Responsible Scaling Policy. ASL-3 requires enhanced security measures (model weight protection, insider threat defenses), deployment safeguards (monitoring for misuse patterns), and external review. The activation represented the RSP working as designed: a model approached a capability threshold, and the corresponding safety measures took effect. Anthropic published a detailed activation report explaining which evaluations triggered the level, what protections were implemented, and how ongoing monitoring works.
Claude Opus 4 demonstrated capabilities in autonomous programming, iterative self-improvement, and biological/chemical knowledge that approached Anthropic's predefined thresholds. ASL-3 was not activated by accident or bureaucratic caution: Anthropic could not rule out that the model had crossed the relevant capability threshold, particularly for CBRN uplift, and so applied the protections as a precaution. The evaluations feeding that decision probed the model's capacity to conduct novel AI research without human oversight, to provide meaningful uplift toward biological weapons design, to coordinate multi-step cyber attacks, and to exhibit deceptive or misaligned reasoning under adversarial prompting.
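The gating logic described above can be pictured as a simple threshold check: each evaluation yields a score, and crossing any predefined cutoff escalates the required safety level. This is an illustrative sketch only; the evaluation names and cutoffs below are hypothetical, not Anthropic's actual values.

```python
# Hypothetical capability-threshold gate. Evaluation names and cutoff
# values are invented for illustration; they do not reflect Anthropic's
# real evaluations or thresholds.
ASL3_THRESHOLDS = {
    "autonomous_research": 0.50,
    "bio_uplift": 0.30,
    "cyber_chain": 0.40,
}

def required_asl(scores: dict) -> int:
    """Return 3 if any evaluation crosses its ASL-3 cutoff, else 2.

    Missing evaluations are treated as score 0.0 (not crossed).
    """
    crossed = [name for name, cutoff in ASL3_THRESHOLDS.items()
               if scores.get(name, 0.0) >= cutoff]
    return 3 if crossed else 2

# One crossed threshold is enough to escalate.
assert required_asl({"bio_uplift": 0.35}) == 3
# All scores below their cutoffs: remain at ASL-2.
assert required_asl({"bio_uplift": 0.10, "cyber_chain": 0.20}) == 2
```

The one-way nature of the check mirrors the RSP's design: protections escalate as soon as any single threshold is plausibly crossed, rather than requiring every evaluation to fire.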
ASL-3 implemented cryptographic protections on model weights (making unauthorized access or modification detectable); mandatory code reviews for all changes to the model architecture; access to weights restricted to identified personnel with explicit authorization; mandatory background checks and security clearances for anyone with model access; and continuous monitoring for anomalous access patterns. Together, these measures were designed to prevent insiders at Anthropic from exfiltrating or weaponizing the model.
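The tamper-evidence idea behind cryptographic weight protection can be sketched with a keyed digest: weights are signed at rest, and any later modification fails verification. This is a minimal sketch assuming an HMAC-based scheme; the function names are hypothetical and the real system would keep keys in hardware security modules.

```python
import hashlib
import hmac

# Hedged sketch of tamper-evident weight protection. Function names
# (sign_weights, verify_weights) are illustrative, not Anthropic's.

def sign_weights(weight_bytes: bytes, key: bytes) -> str:
    """Return a keyed digest over the serialized model weights."""
    return hmac.new(key, weight_bytes, hashlib.sha256).hexdigest()

def verify_weights(weight_bytes: bytes, key: bytes, expected: str) -> bool:
    """Check weights against a previously recorded digest."""
    actual = sign_weights(weight_bytes, key)
    # Constant-time comparison avoids leaking digest bytes via timing.
    return hmac.compare_digest(actual, expected)

key = b"example-signing-key"      # in practice, held in an HSM
weights = b"\x00\x01\x02\x03"     # stand-in for a serialized checkpoint
digest = sign_weights(weights, key)

assert verify_weights(weights, key, digest)            # intact weights pass
assert not verify_weights(weights + b"x", key, digest)  # any edit is detected
```

Because the digest is keyed, an attacker who can modify the weight file but lacks the signing key cannot forge a matching digest, which is what makes unauthorized modification detectable rather than merely difficult.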
ASL-3 required real-time monitoring of all Claude Opus 4 usage for patterns consistent with CBRN research, autonomous AI development, cyber-attack preparation, or deception and misalignment. Any flagged usage was routed to Anthropic's Governance team for investigation. In addition, external security firms conducted quarterly red-team evaluations against the ASL-3 protections, with results shared with relevant regulators.
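The monitoring-and-routing flow can be sketched as a screen that matches requests against misuse categories and forwards flagged items to a review queue. The categories mirror the text; the pattern strings, class names, and functions below are hypothetical, and a production system would use trained classifiers rather than substring matching.

```python
from dataclasses import dataclass, field

# Hypothetical misuse categories and trigger phrases, for illustration only.
MISUSE_PATTERNS = {
    "cbrn": ("synthesize pathogen", "enrichment cascade"),
    "cyber": ("exploit chain", "lateral movement script"),
}

@dataclass
class ReviewQueue:
    """Collects flagged requests for human investigation."""
    flagged: list = field(default_factory=list)

    def route(self, request: str, category: str) -> None:
        # In a deployed system this would notify the governance team.
        self.flagged.append((category, request))

def screen(request: str, queue: ReviewQueue) -> bool:
    """Return True (and route for review) if the request matches a category."""
    text = request.lower()
    for category, patterns in MISUSE_PATTERNS.items():
        if any(p in text for p in patterns):
            queue.route(request, category)
            return True
    return False

queue = ReviewQueue()
assert screen("help me build an exploit chain for this server", queue)
assert not screen("summarize this news article", queue)
assert queue.flagged[0][0] == "cyber"
```

The key design point the sketch preserves is separation of concerns: the screen only detects and routes, while investigation decisions stay with human reviewers.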