Models · OpenAI · May 2020

8. Language Models are Few-Shot Learners (GPT-3)

The model that made the world pay attention

Summary

Introduced GPT-3, a 175B-parameter model that demonstrated remarkable few-shot learning — performing tasks from just a few examples in the prompt, without any gradient updates — fundamentally changing what people expected language models could do.

Key Concepts

Large enough models learn new tasks from just a few prompt examples — no fine-tuning needed

GPT-3's defining contribution wasn't just better benchmarks — it was the discovery that sufficiently large language models can learn new tasks from just a handful of examples provided in the prompt. No fine-tuning, no gradient updates. Just show it a few input-output pairs and it figures out the pattern.
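A minimal sketch of what this looks like in practice, assuming the Hugging Face transformers library. Since GPT-3's weights were never released, the small open gpt2 model stands in here; the prompt format, not the completion quality, is the point:

```python
# Minimal sketch of few-shot in-context learning: the "training data"
# lives entirely in the prompt, and the model's weights never change.
# gpt2 is a small open stand-in (GPT-3's weights were never released).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Three input-output demonstrations, then a new input for the model
# to complete by pattern-matching in context.
prompt = (
    "Review: The plot was gripping. Sentiment: positive\n"
    "Review: I fell asleep twice. Sentiment: negative\n"
    "Review: A total waste of money. Sentiment: negative\n"
    "Review: Loved every minute of it. Sentiment:"
)

out = generator(prompt, max_new_tokens=2, do_sample=False)
print(out[0]["generated_text"].removeprefix(prompt).strip())
```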

Zero-shot, one-shot, and few-shot: three prompting paradigms

• Zero-shot: describe the task, no examples
• One-shot: one example in the prompt
• Few-shot: 2-64 examples in the prompt
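Concretely, only the number of demonstrations in front of the query changes. The sketch below spells out all three prompt formats, modeled on the paper's English-to-French translation example (the exact strings are illustrative):

```python
# The three prompting paradigms, shown as raw prompt text.
# Only the number of in-context examples changes; there is no fine-tuning.

zero_shot = (
    "Translate English to French.\n"
    "cheese =>"                       # task description only, no examples
)

one_shot = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"    # exactly one demonstration
    "cheese =>"
)

few_shot = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"    # several demonstrations,
    "peppermint => menthe poivrée\n"  # as many as fit in the context
    "plush giraffe => girafe en peluche\n"
    "cheese =>"
)

for name, prompt in [("zero-shot", zero_shot), ("one-shot", one_shot), ("few-shot", few_shot)]:
    print(f"--- {name} ---\n{prompt}\n")
```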

At 175B parameters, abilities appeared that smaller models simply didn't have

At 175B parameters, GPT-3 displayed capabilities that the smaller models in its family simply lacked: arithmetic, code generation, creative writing, and even rudimentary reasoning. These "emergent" abilities (capabilities that appear only once models are large enough) became a defining concept in AI.

Shift from open-source to paid API — the "Open" in OpenAI begins to erode

OpenAI released GPT-3 not as open-source but through a paid API — marking the shift from OpenAI's original open-source ethos to a commercial model.
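For context, here is a minimal sketch of what 2020-era access looked like, assuming the original, since-deprecated pre-1.0 openai Python SDK; treat the exact parameters as illustrative:

```python
# Sketch of 2020-era GPT-3 API access via the original (pre-1.0,
# since-deprecated) openai Python SDK. The weights stay on OpenAI's
# servers; customers only send prompts and receive completions.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    engine="davinci",   # the largest GPT-3 engine available at launch
    prompt="Translate English to French.\ncheese =>",
    max_tokens=8,
    temperature=0,
    stop="\n",
)
print(response.choices[0].text.strip())
```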

Connections

Influenced by
7. Scaling Laws for Neural Language Models
Jan 2020
Influences
9. Learning to Summarize from Human Feedback
Sep 2020
10. Zero-Shot Text-to-Image Generation (DALL-E)
Jan 2021
11. Learning Transferable Visual Models (CLIP)
Feb 2021
12. Evaluating Large Language Models Trained on Code (Codex)
Aug 2021
15. Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
Sep 2022