Alignment·OpenAI·Dec 2025

46. OpenAI Model Spec

Public specification of how OpenAI shapes model behavior, values, and refusal boundaries

Policy

Summary

OpenAI published its Model Spec — a comprehensive public document defining how models should behave, what values they should express, and where refusal boundaries lie. Covers helpfulness vs. safety tradeoffs, instruction hierarchy, content policy implementation, and persona guidelines. Continuously updated based on user feedback. Parallels Anthropic's Claude Model Spec but with different philosophical emphases — OpenAI's spec is more permissive on content and emphasizes user autonomy.

Key Concepts

Comprehensive specification of behavior, values, and refusal boundaries

The Model Spec is a living document that explicitly states how OpenAI models should behave across different contexts. It covers what values models should express (helpfulness, honesty, harmlessness), the hierarchy of conflicting instructions, and where refusal boundaries are drawn. Unlike implicit alignment through RLHF, this specification makes choices explicit.

Permissiveness philosophy: emphasizes user autonomy over paternalism

OpenAI's spec is notably more permissive than Anthropic's Claude Spec on content policies. It emphasizes user autonomy and avoids paternalistic refusals, reflecting OpenAI's philosophy that users should have control over model behavior. This contrasts with Anthropic's more protective approach to certain content categories.

Continuous iteration based on user feedback and real-world use

Unlike static documents, the Model Spec is continuously updated based on user feedback, emerging risks, and real-world usage patterns. This creates a feedback loop where user experience shapes official policy, making the spec a living document rather than a fixed specification.

Connections

Influenced by

24. Preparedness Framework (Beta)

Dec 2023