Comprehensive rewrite shifting from unilateral commitments to industry-wide framework
Anthropic released RSP v3.0, a comprehensive rewrite of its safety framework. The major changes: a shift from unilateral safety commitments to industry-wide recommendations, new Frontier Safety Roadmaps and Risk Reports for transparency, and the controversial removal of the 'pause commitment', the hard limit barring the training of more capable models without proven safety measures. Critics called the change a weakening; Anthropic argued that the collective-action framing is more realistic.
The most significant philosophical change: RSP v3.0 argues that catastrophic AI risk depends on the actions of all frontier developers, not just one. A unilateral pause by Anthropic would not reduce global risk if competitors continued training. The policy therefore separates what Anthropic commits to doing unilaterally (its own safety practices) from what it recommends the industry adopt collectively.
RSP v1.0 and v2.0 included a commitment to pause model training if safety evaluations could not keep pace with capability advances. RSP v3.0 removes this hard limit. Anthropic argues that the commitment was unrealistic in a competitive market and that continuous safety investment is more effective than a binary pause-or-go decision. Critics called the removal a retreat from Anthropic's founding safety principles.
RSP v3.0 introduces two new transparency mechanisms: Frontier Safety Roadmaps (detailed plans for upcoming safety goals and evaluations) and Risk Reports (quantified risk assessments across all deployed models). These are intended to provide external accountability even with the pause commitment removed.