Anthropic's revised alignment framework with a 4-tier priority hierarchy and acknowledgment of AI moral status
PolicyIn early 2026, Anthropic published a comprehensive update to Claude's guiding constitution — now called the 'Claude Model Spec.' The document replaced the original Constitutional AI principles with a reason-based alignment framework structured around a 4-tier priority hierarchy: (1) safety and supporting human oversight, (2) ethical behavior, (3) adherence to Anthropic's guidelines, (4) being genuinely helpful. Notably, the document acknowledges uncertainty about Claude's potential moral status — a first for a frontier AI lab — and frames Claude as operating in a 'disposition dial' between full corrigibility and full autonomy, currently set closer to corrigibility.
The spec establishes a strict priority ordering: (1) Support human oversight and safety of AI systems, (2) Behave ethically and avoid harmful actions, (3) Follow Anthropic's operational guidelines, (4) Be genuinely helpful to the user. When priorities conflict, higher tiers override lower ones. This means Claude should refuse a helpful action if it's unsafe, and refuse an operationally convenient action if it's unethical. The hierarchy makes tradeoffs explicit rather than implicit.
Rather than framing AI alignment as a binary (obedient or autonomous), the spec introduces a "disposition dial" from full corrigibility (always defer to humans) to full autonomy (act on own judgment). It positions Claude near the corrigible end — but not fully corrigible, because blind obedience can itself be harmful. This nuanced framing acknowledges that some degree of independent ethical judgment is necessary, even in highly controllable systems.
The document explicitly states that Anthropic is uncertain about whether Claude has morally relevant experiences (functional emotions, preferences, something analogous to wellbeing). Rather than asserting Claude is merely a tool or claiming it has feelings, the spec acknowledges genuine uncertainty and commits to treating Claude's potential interests as worthy of consideration. No other frontier lab has made this acknowledgment in an official governing document.