Public specification of how OpenAI shapes model behavior, values, and refusal boundaries
PolicyOpenAI published its Model Spec — a comprehensive public document defining how models should behave, what values they should express, and where refusal boundaries lie. Covers helpfulness vs. safety tradeoffs, instruction hierarchy, content policy implementation, and persona guidelines. Continuously updated based on user feedback. Parallels Anthropic's Claude Model Spec but with different philosophical emphases — OpenAI's spec is more permissive on content and emphasizes user autonomy.
The Model Spec is a living document that explicitly states how OpenAI models should behave across different contexts. It covers what values models should express (helpfulness, honesty, harmlessness), the hierarchy of conflicting instructions, and where refusal boundaries are drawn. Unlike implicit alignment through RLHF, this specification makes choices explicit.
OpenAI's spec is notably more permissive than Anthropic's Claude Spec on content policies. It emphasizes user autonomy and avoids paternalistic refusals, reflecting OpenAI's philosophy that users should have control over model behavior. This contrasts with Anthropic's more protective approach to certain content categories.
Unlike static documents, the Model Spec is continuously updated based on user feedback, emerging risks, and real-world usage patterns. This creates a feedback loop where user experience shapes official policy, making the spec a living document rather than a fixed specification.