Updated risk assessment framework with continuous monitoring and expanded threat categories
OpenAI updated its Preparedness Framework from the 2023 Beta to v2, incorporating lessons from GPT-5 evaluations and the o1 system card process. Major changes: an expanded threat taxonomy (adding autonomous replication, cyber offense, and persuasion categories), a shift from pre-deployment-only assessment to continuous monitoring, quantitative risk scoring in place of the qualitative Low/Medium/High/Critical tiers, and formalized authority for the Preparedness Team to delay or halt deployments. The update reflected both internal pressure in the aftermath of the board crisis and external regulatory expectations.
The v2 framework expanded from four to seven threat categories, adding autonomous replication and resource acquisition (ARRA), sophisticated cyber offense capabilities, and large-scale persuasion/manipulation. Each category has specific evaluation protocols and quantitative thresholds. The ARRA category was particularly significant: it directly addressed concerns about models that could self-replicate or acquire resources without human oversight.
The v1 framework evaluated models only before deployment. v2 introduced continuous post-deployment monitoring, with automated evaluations running against production models. If a deployed model's risk score crosses a threshold because of advances in capability elicitation or new attack vectors, the framework triggers re-evaluation and potential deployment restrictions. This closed a critical gap: a deployed model could become more dangerous over time as external techniques improved.
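The monitoring-and-trigger loop can be sketched as follows. This is a minimal illustration, not the framework's actual implementation: the category names, the stand-in evaluation function, and the 75-point trigger value are all assumptions for the example.

```python
# Hypothetical sketch of continuous post-deployment monitoring.
# Category names, scores, and the threshold are illustrative assumptions.
RE_EVALUATION_THRESHOLD = 75  # assumed score that triggers re-evaluation


def run_automated_evals(model_id: str) -> dict[str, int]:
    """Stand-in for the automated evaluation suite: returns a 0-100
    risk score per threat category for a deployed production model."""
    # In practice this would run capability-elicitation benchmarks
    # reflecting the latest external techniques and attack vectors.
    return {"cyber_offense": 48, "arra": 31, "persuasion": 77}


def monitor(model_id: str) -> list[str]:
    """Return the threat categories whose latest automated score
    crosses the threshold, each triggering re-evaluation and
    potential deployment restrictions."""
    scores = run_automated_evals(model_id)
    return [cat for cat, score in scores.items()
            if score >= RE_EVALUATION_THRESHOLD]


print(monitor("prod-model"))  # → ['persuasion']
```

Because the evaluations re-run against the deployed model rather than a pre-release snapshot, a score can rise after launch, which is exactly the gap v2 closes.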
The qualitative Low/Medium/High/Critical tiers were replaced with numerical risk scores (0-100) across each threat category. Specific score thresholds trigger mandatory actions: scores above 60 require executive review, above 75 require board notification, and above 90 halt deployment. This quantification reduced subjectivity in risk assessment decisions.
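The score-to-action mapping above can be expressed as a simple threshold function. The function name is hypothetical; the thresholds (60, 75, 90) and actions are the ones stated in the text.

```python
# Hypothetical sketch of the v2 threshold logic; the scoring
# internals are not described in the framework text.
def required_actions(score: int) -> list[str]:
    """Map a 0-100 category risk score to the mandatory actions
    it triggers (actions accumulate as thresholds are crossed)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    actions = []
    if score > 60:
        actions.append("executive review")
    if score > 75:
        actions.append("board notification")
    if score > 90:
        actions.append("halt deployment")
    return actions


print(required_actions(80))  # → ['executive review', 'board notification']
```

Note that the actions are cumulative rather than exclusive: a score of 92 would require executive review and board notification in addition to halting deployment.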