© 2026 Silvia Seceleanu

Products·OpenAI·Aug 2021

12. Evaluating Large Language Models Trained on Code (Codex)

Teaching GPT to write code

Research Paper
Summary

Introduced Codex, a GPT model fine-tuned on code from GitHub that could generate functional programs from natural language descriptions, powering GitHub Copilot and launching the AI-assisted coding revolution.

Key Concepts

GPT-3 fine-tuned on 159GB of GitHub code, solving 28.8% of programming challenges

A GPT-3-family model fine-tuned on 159 GB of Python code from GitHub. It could solve 28.8% of HumanEval problems (a benchmark of programming challenges) on the first attempt (pass@1), rising to 72.31% when any one of 100 generated samples may pass (pass@100).
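Those pass@1 and pass@100 numbers use the paper's pass@k metric: generate n samples per problem, count how many pass the unit tests, and estimate the probability that at least one of k samples would pass. A minimal sketch of the unbiased estimator the paper proposes, 1 − C(n−c, k)/C(n, k):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper.

    n: total samples generated per problem
    c: number of those samples that pass the unit tests
    k: sampling budget being evaluated
    """
    if n - c < k:
        # Fewer failing samples than k: every size-k subset
        # contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 2 samples, 1 passes -> pass@1 = 0.5
print(pass_at_k(2, 1, 1))
```

Averaging this quantity over all 164 problems gives the reported benchmark score; the estimator avoids the bias of simply sampling k completions and checking them directly.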

164 hand-crafted coding problems that became the standard code generation benchmark

The paper introduced HumanEval — 164 hand-crafted programming problems that test code generation from docstrings — which became the standard benchmark for code generation.
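Each HumanEval task supplies a function signature and a docstring (usually with example calls); the model must complete the body, and hidden unit tests then judge functional correctness. A hypothetical task in that style, with a reference solution filled in (this is an illustration, not an actual HumanEval problem):

```python
def is_palindrome(s: str) -> bool:
    """Return True if the string reads the same forwards and
    backwards, ignoring case.

    >>> is_palindrome("Level")
    True
    >>> is_palindrome("hello")
    False
    """
    # Reference solution: the model would be asked to produce
    # a body like this from the docstring alone.
    t = s.lower()
    return t == t[::-1]
```

Because correctness is checked by executing tests rather than by string matching, HumanEval rewards working code over code that merely resembles a reference answer.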

Codex powered Copilot — AI-assisted coding for millions of developers

Codex became the backbone of GitHub Copilot, launched in partnership with Microsoft, bringing AI-assisted coding to millions of developers.

Connections

Influenced by
8. Language Models are Few-Shot Learners (GPT-3)
May 2020
Influences
16. ChatGPT: Optimizing Language Models for Dialogue
Nov 2022