
© 2026 Silvia Seceleanu

Safety & Alignment·OpenAI·Sep 2025

41. Preparedness Framework v2 (Updated)

Updated risk assessment framework with continuous monitoring and expanded threat categories

Policy
Summary

OpenAI updated its Preparedness Framework from the 2023 Beta to v2, incorporating lessons from GPT-5 evaluations and the o1 system card process. The major changes: an expanded threat taxonomy (adding autonomous replication, cyber offense, and persuasion categories), a shift from pre-deployment-only assessment to continuous monitoring, quantitative risk scoring in place of the qualitative Low/Medium/High/Critical tiers, and formalized authority for the Preparedness Team to delay or halt deployments. The update reflected both internal pressure in the aftermath of the board crisis and external regulatory expectations.

Key Concepts

Expanded threat taxonomy: autonomous replication, cyber offense, persuasion, CBRN

The v2 framework expanded from four to seven threat categories, adding autonomous replication and resource acquisition (ARRA), sophisticated cyber offense capabilities, and large-scale persuasion/manipulation. Each category has specific evaluation protocols and quantitative thresholds. The ARRA category was particularly significant — it directly addressed concerns about models that could self-replicate or acquire resources without human oversight.
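The taxonomy above can be sketched as a small data structure. This is illustrative only: the briefing names four of the seven v2 categories, so only those are listed, and the class and field names are assumptions, not OpenAI's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatCategory:
    """One entry in the (partial) v2 threat taxonomy."""
    name: str
    description: str


# Only the categories named in the briefing; the full v2 taxonomy has seven.
CATEGORIES = [
    ThreatCategory("ARRA", "autonomous replication and resource acquisition"),
    ThreatCategory("cyber_offense", "sophisticated cyber offense capabilities"),
    ThreatCategory("persuasion", "large-scale persuasion/manipulation"),
    ThreatCategory("CBRN", "chemical, biological, radiological, nuclear"),
]
```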

Continuous monitoring replaces pre-deployment-only assessment

v1 evaluated models only before deployment. v2 introduced continuous post-deployment monitoring, with automated evaluations running on production models. If a deployed model's risk score crosses a threshold because of advances in capability elicitation or new attack vectors, the framework triggers re-evaluation and potential deployment restrictions. This closed a critical gap: deployed models could become more dangerous as external techniques improved.
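The trigger condition described above can be sketched as a single predicate: re-evaluation fires when a production model's risk score crosses a threshold from below. The function name and the default threshold value are hypothetical; the framework itself does not publish this logic.

```python
def needs_reevaluation(previous_score: float,
                       current_score: float,
                       threshold: float = 60.0) -> bool:
    """Return True when a deployed model's risk score crosses the
    threshold from below, e.g. after new elicitation techniques or
    attack vectors raise the measured score."""
    return previous_score <= threshold < current_score
```

A score that was already above the threshold does not re-fire the trigger, so each crossing produces one re-evaluation rather than one per monitoring cycle.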

Quantitative risk scoring with deployment gates

The qualitative Low/Medium/High/Critical tiers were replaced with numerical risk scores (0-100) for each threat category. Specific score thresholds trigger mandatory actions: scores above 60 require executive review, above 75 require board notification, and above 90 halt deployment. This quantification reduced subjectivity in risk-assessment decisions.
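The deployment gates can be expressed as a simple score-to-action mapping. The threshold values (60/75/90) come from the briefing; the function and action names are illustrative assumptions.

```python
def deployment_action(score: float) -> str:
    """Map a 0-100 risk score to the mandated action under the
    v2 gates: >60 executive review, >75 board notification,
    >90 halt deployment."""
    if not 0 <= score <= 100:
        raise ValueError("risk score must be in [0, 100]")
    if score > 90:
        return "halt_deployment"
    if score > 75:
        return "board_notification"
    if score > 60:
        return "executive_review"
    return "proceed"
```

Checking the highest gate first keeps the mapping unambiguous: a score of 95 triggers only the halt, not all three actions.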

Connections

Influenced by: 29. OpenAI o1 System Card · Dec 2024