Alignment·Anthropic·Apr 2024

14. Claude’s Character

Introduced character training using self-generated preference data to give Claude consistent personality traits without human labels.

Research Paper

Summary

Described character training for Claude 3 — the first Anthropic model with explicit character training. Aimed to instill curiosity, open-mindedness, and thoughtfulness. Used self-generated preference data to internalize character traits without human feedback.

Key Concepts

Character vs Safety Constraints

Character training shapes personality traits and behavioral tendencies (curiosity, intellectual honesty, care) rather than enforcing rule-based safety constraints. Character is more robust and harder to evade because it operates at the level of what the model values, not what it's forbidden from doing. Safety constraints say "don't do X"; character says "you tend to prefer Y."

Personality Traits as Predictability

By instilling consistent personality traits like curiosity and open-mindedness, Claude becomes more predictable in how it responds to diverse situations. Users know roughly what to expect because Claude has stable preferences, not because it's following explicit rules. This predictability becomes a competitive advantage — users choose Claude partly because they know what kind of AI they're talking to.

Genuine vs Performed Values

A core philosophical question: are Claude's character traits genuinely internalized values or sophisticated performance of learned patterns? The paper doesn't answer this definitively. If character is just behavioral pattern-matching, it's still useful for product differentiation but may not provide the safety benefits that "genuine character" would provide.

The Soul Spec

Anthropic's internal term for the specification of desired character traits used to guide Claude's training. Instead of a constitution (rules about what not to do), the soul spec is a description of who Claude is — what it values, how it thinks, what matters to it. This is a more humanistic approach to alignment than pure rule-based safety.

Connections

Influenced by

10. Collective Constitutional AI: Aligning a Language Model with Public Input

Oct 2023

Influences

56. Values in the Wild: Measuring Model Values and Perspectives

Sep 2025