© 2026 Silvia Seceleanu

Models·OpenAI·Jun 2018

4. Improving Language Understanding by Generative Pre-Training (GPT-1)

The paper that started the GPT paradigm

Research Paper
Summary

Introduced the Generative Pre-trained Transformer (GPT), demonstrating that unsupervised pre-training on a large text corpus followed by supervised fine-tuning produces strong performance across diverse NLP tasks.

Key Concepts

12-layer transformer decoder (117M params) trained on BookCorpus

A 12-layer transformer decoder with 117 million parameters, trained on BookCorpus (~7,000 unpublished books).
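The 117M figure can be roughly reproduced from the model's shape. The hyperparameters beyond those stated above (hidden size 768, feed-forward size 3072, context length 512, BPE vocabulary of ~40,478 tokens) are the commonly cited values from the paper's setup, so treat this as a back-of-the-envelope sketch rather than an exact accounting:

```python
# Rough parameter count for GPT-1 from commonly cited hyperparameters.
# Assumed values (not stated in the summary above): d_model=768,
# FFN dim 3072, context 512, BPE vocab ~40,478.

d_model, n_layers, d_ffn = 768, 12, 3072
vocab, context = 40_478, 512

# Token embeddings plus learned position embeddings
embeddings = vocab * d_model + context * d_model

# Per decoder block: fused QKV projection, attention output projection,
# two FFN layers, and two LayerNorms (all with biases).
attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
norms = 2 * 2 * d_model
per_block = attn + ffn + norms

total = embeddings + n_layers * per_block
print(f"{total / 1e6:.1f}M parameters")  # lands near the cited 117M
```

Note that the embedding matrix alone contributes roughly a quarter of the total, which is typical for models of this size.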

Pre-train on next-token prediction, then fine-tune on downstream tasks

The key insight: next-token prediction as a pre-training objective captures deep linguistic structure that transfers across tasks.
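Concretely, the pre-training objective maximizes log p(token_t | tokens_<t) over the corpus, i.e. it minimizes the cross-entropy between the model's logits at each position and the following token. A minimal numpy sketch of that loss (illustrative only, not the paper's implementation):

```python
# Next-token-prediction loss: logits at position t are scored against
# the actual token at position t+1.
import numpy as np

def next_token_loss(logits, tokens):
    """logits: (T, V) model scores at each position; tokens: (T,) token ids."""
    shifted_logits = logits[:-1]   # predictions for positions 1..T-1
    targets = tokens[1:]           # the tokens those positions must predict
    # log-softmax computed stably
    z = shifted_logits - shifted_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 50))      # toy model outputs: 8 positions, 50-token vocab
tokens = rng.integers(0, 50, size=8)
print(next_token_loss(logits, tokens))
```

With all-zero (uniform) logits the loss equals log V, the entropy of a uniform guess, which is the baseline any trained model must beat.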

SOTA on 9 of 12 NLP benchmarks with a single unified architecture

GPT-1 achieved state-of-the-art on 9 of 12 NLP benchmarks, including absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI), and it did this with a single architecture that required only task-specific output layers.
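"Task-specific output layers" means that, at fine-tuning time, the only newly initialized weights are a small linear head applied to the transformer's final hidden state; everything beneath it is the pre-trained model. A hypothetical sketch (all names and sizes here are illustrative):

```python
# Illustrative task head: a single linear layer over the transformer's
# last hidden state. d_model=768 is GPT-1's hidden size; the 3-way label
# set (e.g. entailment classes) is an assumed example task.
import numpy as np

d_model, n_classes = 768, 3

rng = np.random.default_rng(1)
W_task = rng.normal(scale=0.02, size=(d_model, n_classes))  # the only new weights
b_task = np.zeros(n_classes)

h_final = rng.normal(size=(d_model,))  # stand-in for the pre-trained model's output
logits = h_final @ W_task + b_task
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over task labels
print(probs)
```

Swapping tasks means swapping only `W_task` and `b_task` (plus a simple input-formatting scheme), which is what made the single unified architecture practical.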

Connections

Influenced by
3. OpenAI Charter
Apr 2018
Influences
6. Language Models are Unsupervised Multitask Learners (GPT-2)
Feb 2019