The math behind 'bigger is better'
Research Paper: Established precise mathematical relationships (power laws) predicting how language model performance improves with model size, dataset size, and compute, providing the theoretical foundation for the race to scale.
Cross-entropy loss follows smooth power laws as a function of model parameters (N), dataset size (D), and compute budget (C).
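Schematically, each power law takes the form below. The exponents are the approximate fitted values reported in the original scaling-laws paper (Kaplan et al., 2020), quoted here for illustration; the constants N_c, D_c, and C_c are fitted scale parameters:

```latex
\begin{align*}
L(N) &= \left(\frac{N_c}{N}\right)^{\alpha_N}, & \alpha_N &\approx 0.076 \\
L(D) &= \left(\frac{D_c}{D}\right)^{\alpha_D}, & \alpha_D &\approx 0.095 \\
L(C) &= \left(\frac{C_c}{C}\right)^{\alpha_C}, & \alpha_C &\approx 0.050
\end{align*}
```

The key property is smoothness: loss falls predictably over many orders of magnitude in N, D, and C, which is what makes extrapolation to larger models credible.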
If you have $X to spend on compute, the scaling laws tell you how big to make your model and how much data to train it on. This turned AI development, at least in part, from an art into an engineering discipline.
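A minimal sketch of that allocation logic, using the common approximation that training cost is C ≈ 6·N·D FLOPs and a power-law rule N ∝ C^a for the optimal model size. The exponent and coefficient here are illustrative assumptions, not fitted values from any paper (a = 0.5 corresponds to scaling parameters and tokens equally with compute):

```python
def optimal_allocation(compute_flops, a=0.5, k_n=0.1):
    """Split a FLOP budget between parameters (N) and tokens (D).

    Assumes C ~= 6 * N * D and an optimal-model-size rule N = k_n * C**a.
    Both `a` and `k_n` are hypothetical constants for illustration.
    """
    n_params = k_n * compute_flops ** a        # model size grows as C^a
    n_tokens = compute_flops / (6 * n_params)  # remaining budget buys data
    return n_params, n_tokens

# Example: a 1e21-FLOP budget.
params, tokens = optimal_allocation(1e21)
print(f"params: {params:.2e}, tokens: {tokens:.2e}")
```

Doubling the compute budget then prescribes a specific larger model and a specific larger dataset, rather than leaving those choices to trial and error.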