Former OpenAI Superalignment co-lead joins Anthropic after public departure over safety concerns
In May 2024, Jan Leike resigned as co-lead of OpenAI's Superalignment team, publicly stating that "safety culture and processes have taken a back seat to shiny products." Days later, Anthropic announced that Leike would join to lead a new Alignment Science team. The hire was among the most significant safety-researcher transfers between frontier labs, bringing expertise in scalable oversight, reward modeling, and alignment evaluation. It simultaneously weakened OpenAI's safety apparatus and strengthened Anthropic's, crystallizing the narrative that Anthropic was the "safety-first" lab.
Leike's departure statement was specific: the Superalignment team had been promised 20% of compute but struggled to get the resources, safety processes were not keeping pace with capability advances, and "building shiny products" was taking priority. This was not a vague concern; it was a senior insider's account of institutional failure on safety. Coming from the person who co-led the team created specifically to solve alignment, it carried exceptional weight.
At Anthropic, Leike leads the Alignment Science team, focused on scalable oversight: developing techniques that let humans supervise AI systems that may be more capable than the humans overseeing them. This includes debate, recursive reward modeling, critique generation, and extensions of constitutional AI. The team bridges Leike's OpenAI experience with Anthropic's existing interpretability and alignment infrastructure.
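To make scalable oversight concrete, here is a minimal sketch of one pattern named above, critique generation, under the simplifying assumption that models are just callables from text to text. Every name in it (Review, assist_human_review, the stub models) is hypothetical, invented for illustration; this is not Anthropic's or OpenAI's actual code.

```python
# Hypothetical sketch of critique-assisted oversight. None of these names
# come from Anthropic or OpenAI code; they exist only to illustrate the pattern.
from dataclasses import dataclass


@dataclass
class Review:
    """An answer bundled with machine-generated critiques for a human judge."""
    answer: str
    critiques: list[str]


def assist_human_review(question: str, answerer, critic, n_critiques: int = 3) -> Review:
    """Generate an answer, then have a critic model flag possible flaws.

    The human reviewer judges the answer alongside the critiques, so their
    task shifts from verifying everything to checking whether any critique
    lands, which is the leverage scalable oversight is after.
    """
    answer = answerer(question)
    # A real critic model would be sampled several times to get diverse
    # critiques; these deterministic stubs return the same string each call.
    critiques = [critic(question, answer) for _ in range(n_critiques)]
    return Review(answer=answer, critiques=critiques)


if __name__ == "__main__":
    # Trivial stand-ins where real systems would call language models.
    answerer = lambda q: f"Proposed answer to: {q}"
    critic = lambda q, a: f"Possible flaw: does {a!r} actually address {q!r}?"
    review = assist_human_review("Is this proof of Lemma 2 correct?", answerer, critic)
    print(review.answer)
    for c in review.critiques:
        print(" -", c)
```

Debate and recursive reward modeling follow the same basic logic: use model output to make the human's judgment easier than producing or fully verifying the answer unaided.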
In the AI industry, where researchers have extraordinary leverage and options, departures are signals. Leike didn't leave for more money or a better title — he left because he believed Anthropic's institutional structure better supported safety research. Whether or not this assessment is correct, the signal it sent to the safety research community was powerful: Anthropic is where serious alignment researchers want to work.