© 2026 Silvia Seceleanu

Products·OpenAI·Aug 2021

12. Evaluating Large Language Models Trained on Code (Codex)

Teaching GPT to write code

Research Paper
Summary

Introduced Codex, a GPT model fine-tuned on code from GitHub that could generate functional programs from natural language descriptions, powering GitHub Copilot and launching the AI-assisted coding revolution.

Key Concepts

GPT-3 fine-tuned on 159GB of GitHub code, solving 28.8% of programming challenges

A GPT-3-family model fine-tuned on 159 GB of Python code from GitHub. It could solve 28.8% of HumanEval problems (a benchmark of programming challenges) on the first attempt (pass@1), rising to 72.31% when any one of 100 generated samples may pass (pass@100).
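Those pass@1 and pass@100 numbers use the paper's pass@k metric: generate n samples per problem, count how many pass the unit tests, and estimate the probability that at least one of k samples would pass. A minimal sketch of the unbiased estimator the paper proposes, 1 − C(n−c, k)/C(n, k):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper.

    n: total samples generated per problem
    c: number of those samples that pass the unit tests
    k: sampling budget being evaluated
    """
    if n - c < k:
        # Fewer failing samples than k: every size-k subset
        # contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 2 samples, 1 passes -> pass@1 = 0.5
print(pass_at_k(2, 1, 1))
```

Averaging this quantity over all 164 problems gives the reported benchmark score; the estimator avoids the bias of simply sampling k completions and checking them directly.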

164 hand-crafted coding problems that became the standard code generation benchmark

The paper introduced HumanEval — 164 hand-crafted programming problems that test code generation from docstrings — which became the standard benchmark for code generation.
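Each HumanEval task supplies a function signature and a docstring (usually with example calls); the model must complete the body, and hidden unit tests then judge functional correctness. A hypothetical task in that style, with a reference solution filled in (this is an illustration, not an actual HumanEval problem):

```python
def is_palindrome(s: str) -> bool:
    """Return True if the string reads the same forwards and
    backwards, ignoring case.

    >>> is_palindrome("Level")
    True
    >>> is_palindrome("hello")
    False
    """
    # Reference solution: the model would be asked to produce
    # a body like this from the docstring alone.
    t = s.lower()
    return t == t[::-1]
```

Because correctness is checked by executing tests rather than by string matching, HumanEval rewards working code over code that merely resembles a reference answer.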

Codex powered Copilot — AI-assisted coding for millions of developers

Codex became the backbone of GitHub Copilot, launched in partnership with Microsoft, bringing AI-assisted coding to millions of developers.

Connections

Influenced by
8. Language Models are Few-Shot Learners (GPT-3)
May 2020
Influences
16. ChatGPT: Optimizing Language Models for Dialogue
Nov 2022