Safety·OpenAI·Jul 2023

20. Introducing Superalignment

OpenAI's most ambitious safety bet

Blog Post

Summary

OpenAI announced a new team, co-led by Ilya Sutskever and Jan Leike, dedicated to solving the alignment problem for superintelligent AI within four years, and committed 20% of the compute it had secured to date to the effort.

Key Concepts

How do you align a system smarter than you when humans can't evaluate its outputs?

Current alignment techniques, such as reinforcement learning from human feedback, rely on humans being able to judge model behavior. That assumption breaks down for superintelligent systems: human feedback cannot evaluate outputs that humans cannot understand.

Use AI to supervise AI — "scalable oversight" via human-level automated alignment researchers

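Use AI systems to help supervise other AI systems ("scalable oversight"). Specifically, the stated goal is to build a "roughly human-level" automated alignment researcher, then use large amounts of compute to scale its work toward aligning more capable systems.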

20% of OpenAI's compute and a 4-year deadline to solve superintelligence alignment

OpenAI dedicated 20% of the compute it had secured to date to superalignment, with a four-year deadline for solving the core technical challenges.

Led by Sutskever (Chief Scientist) and Leike (Head of Alignment); both would leave within a year

Co-led by Ilya Sutskever (cofounder and Chief Scientist) and Jan Leike (Head of Alignment). Both left OpenAI in May 2024.

Connections

Influenced by

17. Planning for AGI and beyond

Feb 2023

18. GPT-4 Technical Report

Mar 2023

Influences

23. OpenAI Board Crisis

Nov 2023