
© 2026 Silvia Seceleanu

Models·OpenAI·Sep 2024

★28. Learning to Reason with LLMs (o1)

The model that thinks before it speaks

Product Announcement
Summary

OpenAI launched o1 (codenamed 'Strawberry'), a model trained with reinforcement learning to reason through problems step by step before answering. It achieved breakthrough performance on math, coding, and science benchmarks through 'test-time compute scaling'.

Key Concepts

Trained with RL to produce extended reasoning chains — thinking is the objective, not a prompt hack

Unlike GPT-4, which can be prompted to think step by step, o1 is trained with RL to produce extended chains of thought on its own, so it spends more time "thinking" before responding.

New scaling paradigm: more thinking time = better answers, instead of bigger models

This is a new scaling paradigm: instead of making the model bigger (parameter scaling), you let it think longer at inference time (inference compute scaling). OpenAI reports that accuracy improves as the model is given more time to think.
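The idea that more inference compute can buy accuracy can be illustrated with a toy simulation. The sketch below is not o1's mechanism (o1 lengthens a single chain of thought). It uses majority voting over repeated samples, a simpler and well-known way to spend extra compute at inference time. The solver `noisy_solver`, its 40% per-sample accuracy, and the ten-answer setup are all hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def noisy_solver(rng, correct_answer, p_correct, n_options=10):
    """Hypothetical stand-in for one sampled model answer.

    Returns the correct answer with probability p_correct,
    otherwise a uniformly random wrong answer.
    """
    if rng.random() < p_correct:
        return correct_answer
    wrong = [a for a in range(n_options) if a != correct_answer]
    return rng.choice(wrong)


def majority_vote(answers):
    """Return the most frequent answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


def consensus_accuracy(n_samples, p_correct=0.4, n_problems=200, seed=0):
    """Fraction of simulated problems solved when voting over n_samples."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(n_problems):
        correct = rng.randrange(10)
        answers = [noisy_solver(rng, correct, p_correct) for _ in range(n_samples)]
        hits += majority_vote(answers) == correct
    return hits / n_problems


if __name__ == "__main__":
    # More samples per problem means more inference compute per answer.
    for n in (1, 4, 16, 64):
        print(f"samples={n:3d}  accuracy={consensus_accuracy(n):.2f}")
```

In this toy setting, accuracy rises with the number of samples because wrong answers are scattered across many options while correct answers agree with each other. Real models do not behave this cleanly, so the numbers only show the shape of the trade-off.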

Chain-of-thought hidden from users to prevent strategy extraction and model distillation

The model's raw chain of thought is hidden from users, who see only a summary. This makes it harder to extract the reasoning strategy or to distill the model from its reasoning traces.

83% on an IMO qualifying exam (vs. GPT-4o's 13%), 89th percentile on Codeforces, PhD-level science

o1 scored 83% on AIME 2024, a qualifying exam for the International Mathematics Olympiad, using consensus over 64 samples (GPT-4o solved about 13%). It ranked in the 89th percentile on Codeforces and exceeded PhD-level accuracy on a benchmark of physics, biology, and chemistry questions (GPQA Diamond).

Model reasons about safety policies in its chain-of-thought for nuanced refusals

o1 reasons about safety policies in its chain-of-thought, leading to more nuanced safety behavior than hardcoded refusals.

Connections

Influenced by
19. Let's Verify Step by Step
May 2023
Influences
30. 12 Days of OpenAI: o3, Sora, and More
Dec 2024
29. OpenAI o1 System Card
Dec 2024
31. Introducing Operator
Jan 2025