The staged release that changed AI safety discourse
Research Paper
Demonstrated that a 1.5B-parameter language model trained on web text could perform diverse tasks without task-specific training, and sparked the first major AI safety debate when OpenAI initially withheld the full model over misuse concerns.
GPT-2 could perform tasks it was never explicitly trained for, including summarization, translation, and question answering, simply by conditioning on task-describing prompts. This was an early, clear demonstration of what would later be called in-context learning; a sketch of the mechanism follows.
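To make the mechanism concrete, here is a minimal sketch of the paper's prompt-induced summarization trick (appending "TL;DR:" to an article), written against the Hugging Face `transformers` library and its `gpt2` mirror as an assumption for illustration; OpenAI's original release shipped its own TensorFlow code.

```python
# Minimal sketch of prompt-conditioned zero-shot summarization, assuming
# the Hugging Face `transformers` library and its "gpt2" (124M) mirror.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "Scientists announced today that ..."  # hypothetical input text

# The paper induced summarization by appending "TL;DR:" to the article:
# no fine-tuning, just conditioning on a task-describing prompt.
prompt = article + "\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_k=2,  # the paper sampled summaries with top-k = 2
    pad_token_id=tokenizer.eos_token_id,
)

# Keep only the continuation past the prompt.
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```

The same conditioning pattern carries over to other tasks: prefixing translated sentence pairs induces translation, and appending a question after a passage induces question answering.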
At 1.5B parameters, the model was more than ten times larger than GPT-1's 117M and showed qualitative capability improvements, not just quantitative benchmark gains. It could generate coherent multi-paragraph text that readers often found hard to distinguish from human writing.
OpenAI initially released only the 124M-parameter version, withholding the 355M, 774M, and 1.5B checkpoints; the stated concern was potential misuse for generating disinformation. The larger models followed in stages over 2019, with the full 1.5B version released that November.
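For a sense of scale across the staged release, a short parameter-counting sketch follows; the model IDs assume the standard Hugging Face mirrors of OpenAI's weights, and downloading `gpt2-xl` pulls several gigabytes.

```python
# Sketch: parameter counts across the four staged-release checkpoints,
# assuming the standard Hugging Face mirrors of OpenAI's weights.
from transformers import GPT2LMHeadModel

checkpoints = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]

for name in checkpoints:
    model = GPT2LMHeadModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")

# Expected output (roughly): 124M, 355M, 774M, 1558M. These are OpenAI's
# corrected counts; the paper itself reported 117M, 345M, 762M, and 1542M.
```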