The paper that started the GPT paradigm
Research paper that introduced the Generative Pre-trained Transformer (GPT), demonstrating that unsupervised pre-training on a large text corpus followed by supervised fine-tuning produces strong performance across diverse NLP tasks.
A 12-layer transformer decoder with 117 million parameters, trained on BookCorpus (~7,000 unpublished books).
The key insight: next-token prediction as a pre-training objective captures deep linguistic structure that transfers across tasks.
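The next-token objective can be illustrated with a minimal sketch. This is not the paper's implementation, just a hypothetical NumPy illustration of the idea: given per-position logits over a vocabulary, the loss is the average negative log-likelihood of each *following* token.

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Average negative log-likelihood of predicting token t+1 from position t.

    logits: (seq_len, vocab_size) array of unnormalized scores
    tokens: (seq_len,) array of integer token ids
    """
    # Position t predicts token t+1, so drop the last logit row
    # and the first token.
    logits, targets = logits[:-1], tokens[1:]
    # Numerically stable log-softmax over the vocabulary.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy example: vocabulary of 5 tokens, sequence of 4 tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
tokens = np.array([1, 3, 0, 2])
loss = next_token_loss(logits, tokens)
```

Minimizing this loss over a large corpus is what forces the model to internalize syntax, semantics, and world knowledge, since all of them help predict the next token.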
GPT-1 achieved state-of-the-art results on 9 of the 12 NLP benchmarks studied, with absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI). It did this with a single architecture that required only task-specific output layers.
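The "task-specific output layers" point can be sketched as follows. This is an assumption-laden toy, not the paper's code: a shared backbone representation is reused across tasks, and each task contributes only a small linear head (the weight names below are hypothetical).

```python
import numpy as np

def linear_head(features, W, b):
    """Task-specific linear projection on top of shared backbone features."""
    return features @ W + b

hidden = 8
# Stand-in for the transformer's final hidden state for one input.
features = np.ones(hidden)

rng = np.random.default_rng(1)
# Two illustrative tasks sharing the same backbone:
# a 3-way entailment head and a 2-way sentiment head.
W_entail, b_entail = rng.normal(size=(hidden, 3)), np.zeros(3)
W_sent, b_sent = rng.normal(size=(hidden, 2)), np.zeros(2)

entail_logits = linear_head(features, W_entail, b_entail)  # shape (3,)
sent_logits = linear_head(features, W_sent, b_sent)        # shape (2,)
```

During fine-tuning, the backbone weights and the single new head are trained together on the target task, so almost all learned parameters transfer unchanged between tasks.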