Alignment·Anthropic·Oct 2023

10. Collective Constitutional AI: Aligning a Language Model with Public Input

Let ~1,000 members of the public co-write Claude's constitution, testing democratic input on AI values.

Research Paper

Summary

Explored how public input could shape Claude's constitution instead of relying solely on Anthropic employees. Experimented with democratic processes for AI alignment — a step toward participatory AI governance.

Key Concepts

Democratic Input to AI Values

Rather than having AI companies unilaterally decide model behavior through internal constitutions, democratic input approaches involve collecting values and principles from diverse public stakeholders. This paper tested whether public deliberation could generate better alignment criteria than expert consensus, exploring how to represent diverse human values in AI systems.

Public Constitution Writing

The process of involving many individuals from the public in collaboratively drafting principles that govern AI behavior. This differs from traditional constitutional AI where a small team writes principles in isolation. Public constitution writing assumes broader participation improves legitimacy and reduces hidden biases.

Polis Platform

An open-source crowd-sourced consensus-building tool designed for deliberative democracy. Polis uses AI-assisted clustering of user opinions to find common ground on contentious issues. For this research, it enabled Anthropic to gather, analyze, and synthesize input from thousands of participants into a coherent set of AI values and principles.

Representation Problem

The fundamental challenge that participants in any deliberative process are rarely representative of broader populations. Online participation skews toward tech-savvy, educated, English-speaking demographics. This creates a gap between "what the public deliberated" and "what the actual public wants," making democratic legitimacy claims fragile.

Connections

Influenced by

4. Constitutional AI: Harmlessness from AI Feedback

Dec 2022

Influences

14. Claude’s Character

Apr 2024