The staged release that changed AI safety discourse
Research Paper
Demonstrated that a 1.5B-parameter language model trained on web text could perform diverse tasks without task-specific training, and sparked the first major AI safety debate when OpenAI initially withheld the full model over misuse concerns.
GPT-2 could perform tasks it was never explicitly trained for, including summarization, translation, and question answering, simply by conditioning on task-describing prompts. This was an early, clear demonstration of what would later be called in-context learning; a sketch of the mechanism follows.
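To make the mechanism concrete, here is a minimal sketch of the paper's prompt-induced summarization trick (appending "TL;DR:" to an article), written against the Hugging Face `transformers` library and its `gpt2` mirror as an assumption for illustration; OpenAI's original release shipped its own TensorFlow code.

```python
# Minimal sketch of prompt-conditioned zero-shot summarization, assuming
# the Hugging Face `transformers` library and its "gpt2" (124M) mirror.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "Scientists announced today that ..."  # hypothetical input text

# The paper induced summarization by appending "TL;DR:" to the article:
# no fine-tuning, just conditioning on a task-describing prompt.
prompt = article + "\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_k=2,  # the paper sampled summaries with top-k = 2
    pad_token_id=tokenizer.eos_token_id,
)

# Keep only the continuation past the prompt.
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```

The same conditioning pattern carries over to other tasks: prefixing translated sentence pairs induces translation, and appending a question after a passage induces question answering.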
At 1.5B parameters, the model was more than ten times larger than GPT-1's 117M and showed qualitative capability improvements, not just quantitative benchmark gains. It could generate coherent multi-paragraph text that readers often found hard to distinguish from human writing.
OpenAI initially released only the 124M-parameter version, withholding the 355M, 774M, and 1.5B checkpoints; the stated concern was potential misuse for generating disinformation. The larger models followed in stages over 2019, with the full 1.5B version released that November.
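For a sense of scale across the staged release, a short parameter-counting sketch follows; the model IDs assume the standard Hugging Face mirrors of OpenAI's weights, and downloading `gpt2-xl` pulls several gigabytes.

```python
# Sketch: parameter counts across the four staged-release checkpoints,
# assuming the standard Hugging Face mirrors of OpenAI's weights.
from transformers import GPT2LMHeadModel

checkpoints = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]

for name in checkpoints:
    model = GPT2LMHeadModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")

# Expected output (roughly): 124M, 355M, 774M, 1558M. These are OpenAI's
# corrected counts; the paper itself reported 117M, 345M, 762M, and 1542M.
```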