Models · OpenAI · Jan 2020

7. Scaling Laws for Neural Language Models

The math behind 'bigger is better'

Research Paper
Summary

Established precise power laws predicting how language model performance improves with model size, dataset size, and compute, providing the quantitative foundation for the race to scale.

Key Concepts

Loss follows smooth power laws across model size, data, and compute

Cross-entropy loss follows smooth power laws as a function of model parameters (N), dataset size (D), and compute budget (C).
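
For reference, the approximate single-variable forms reported in the paper (constants rounded; N counts non-embedding parameters, D is tokens, C_min is compute in PF-days):

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076,\ N_c \approx 8.8 \times 10^{13}
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad \alpha_D \approx 0.095,\ D_c \approx 5.4 \times 10^{13}
L(C_{\min}) = \left(\frac{C_c^{\min}}{C_{\min}}\right)^{\alpha_C^{\min}}, \qquad \alpha_C^{\min} \approx 0.050,\ C_c^{\min} \approx 3.1 \times 10^{8}
```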

Scale matters more than shape; larger models are more sample-efficient

Performance depends primarily on overall scale (N, D, C) and only weakly on architectural shape such as depth versus width; larger models also reach any given loss with fewer training tokens, as the joint law below makes precise.
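
The paper combines the model and data terms into a single joint law, which is the sample-efficiency claim in quantitative form: to keep the data term from dominating, D need only grow sublinearly with N (roughly D ∝ N^0.74 in the paper's fit).

```latex
L(N, D) = \left[\left(\frac{N_c}{N}\right)^{\alpha_N/\alpha_D} + \frac{D_c}{D}\right]^{\alpha_D}
```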
Scaling laws turned AI development from art into engineering

If you have $X to spend on compute, the scaling laws tell you roughly how big to make your model and how much data to use. This turned AI development from an art into (partially) an engineering discipline.
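
A minimal sketch of that recipe, using the paper's compute-optimal exponent N ∝ C^0.73 and its C ≈ 6·N·D cost estimate. The reference point used to calibrate the absolute scale is hypothetical, not a value from the paper:

```python
# Sketch of the compute-optimal recipe from Kaplan et al. (2020).
# The exponent 0.73 (N ∝ C^0.73) and the cost estimate C ≈ 6·N·D are
# from the paper; C_REF and N_REF below are a HYPOTHETICAL calibration
# point, chosen only to make the script runnable.

C_REF = 1.0     # reference compute budget in PF-days (hypothetical)
N_REF = 1.0e9   # params assumed compute-optimal at C_REF (hypothetical)

def optimal_params(compute_pf_days: float) -> float:
    """Compute-optimal parameter count, scaled from the reference point."""
    return N_REF * (compute_pf_days / C_REF) ** 0.73

def implied_tokens(compute_pf_days: float, params: float) -> float:
    """Training tokens implied by C ≈ 6·N·D FLOPs."""
    flops = compute_pf_days * 1e15 * 86_400  # 1 PF-day = 1e15 FLOP/s * 86400 s
    return flops / (6 * params)

if __name__ == "__main__":
    for c in (1.0, 10.0, 100.0):
        n = optimal_params(c)
        d = implied_tokens(c, n)
        print(f"C = {c:6.1f} PF-days -> N ≈ {n:.2e} params, D ≈ {d:.2e} tokens")
```

Note that spending a 100x larger budget mostly buys a bigger model (100^0.73 ≈ 29x more parameters) rather than proportionally more data, which is the "scale the model first" lesson practitioners took from the paper.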

Connections

Influenced by
6. Language Models are Unsupervised Multitask Learners (GPT-2)
Feb 2019
Influences
8. Language Models are Few-Shot Learners (GPT-3)
May 2020