OpenAI's most ambitious safety bet
Blog Post
Announced a new team, co-led by Ilya Sutskever and Jan Leike, dedicated to solving the alignment problem for superintelligent AI within four years, with 20% of the compute OpenAI had secured to date allocated to the effort.
How do you align a system that is smarter than you? You can't use human feedback to evaluate outputs you can't understand. Current alignment techniques, such as reinforcement learning from human feedback, rely on humans being able to judge model behavior, and that assumption breaks down for superintelligent systems.
Use AI systems to help supervise other AI systems ("scalable oversight"). Specifically, build a "roughly human-level" automated alignment researcher that can help evaluate and align more capable AI systems.
20% of the compute OpenAI had secured to date, dedicated to superalignment over four years, the same window the team set itself for solving the problem.
Co-led by Ilya Sutskever (co-founder and Chief Scientist) and Jan Leike (Head of Alignment).