Models · OpenAI · Mar 2026

★38. GPT-5.4

Native computer use meets frontier reasoning

Product Announcement

Summary

OpenAI released GPT-5.4 with native computer-use capabilities, a 1M-token context window, and state-of-the-art agentic performance. It is the first general-purpose model to surpass human performance on OSWorld-Verified (75.0% vs 72.4%), and it matches or exceeds industry professionals in 83% of GDPval occupational comparisons.

Key Concepts

First general-purpose model with native computer-use capabilities for agentic workflows

In Codex and the API, GPT-5.4 is the first general-purpose model released with native, state-of-the-art computer-use capabilities. Agents can operate computers, navigate applications, fill in forms, and carry out complex workflows across software environments. These capabilities were previously limited to specialized models.

Surpasses human performance on OSWorld-Verified (75.0% vs 72.4%) and matches or exceeds professionals in 83% of GDPval comparisons

On OSWorld-Verified, which tests real-world computer interaction, GPT-5.4 achieves a 75.0% success rate, surpassing the human baseline of 72.4%. On GDPval, which evaluates professional knowledge work across 44 occupations, it matches or exceeds industry professionals in 83.0% of comparisons.
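To put the OSWorld-Verified gap in perspective, the short sketch below computes the margin over the human baseline both in percentage points and relative to the baseline. The two scores are the ones quoted above; the function and variable names are illustrative only.

```python
# Figures quoted in the summary above (both in percent).
MODEL_OSWORLD = 75.0   # GPT-5.4 success rate on OSWorld-Verified
HUMAN_OSWORLD = 72.4   # human baseline on OSWorld-Verified


def margin_points(model: float, baseline: float) -> float:
    """Absolute gap between two scores, in percentage points."""
    return model - baseline


def margin_relative(model: float, baseline: float) -> float:
    """Gap expressed as a percentage of the baseline score."""
    return (model - baseline) / baseline * 100.0


points = margin_points(MODEL_OSWORLD, HUMAN_OSWORLD)
relative = margin_relative(MODEL_OSWORLD, HUMAN_OSWORLD)
print(f"{points:.1f} percentage points, {relative:.1f}% relative")
```

The margin is 2.6 percentage points, or roughly 3.6% above the human baseline, so the result is a narrow lead rather than a wide one.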

1M-token context with improved token efficiency and an upfront thinking plan

Supports up to 1 million tokens of context, allowing agents to plan, execute, and verify tasks across long horizons. The model also presents an upfront plan of its thinking, which allows midcourse adjustments, and it solves problems with significantly fewer tokens than GPT-5.2.

Most factual model yet: 33% fewer false claims and 18% fewer erroneous responses vs GPT-5.2

GPT-5.4 is OpenAI's most factual model to date: compared to GPT-5.2, individual claims are 33% less likely to be false and full responses are 18% less likely to contain errors. Both figures are relative reductions, and they represent a meaningful step toward reducing hallucinations in production use.
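Because the 33% and 18% figures are relative to GPT-5.2, the absolute change depends on the baseline rate, which this summary does not report. The sketch below applies the two reductions to hypothetical baseline rates, chosen purely for illustration, to show how the arithmetic works.

```python
def apply_relative_reduction(baseline_rate: float, reduction: float) -> float:
    """Return the rate left after cutting baseline_rate by a relative reduction.

    Both arguments are fractions, e.g. 0.33 for a 33% reduction.
    """
    return baseline_rate * (1.0 - reduction)


# Hypothetical GPT-5.2 baselines: NOT from the announcement, illustration only.
baseline_false_claim_rate = 0.06      # assume 6% of individual claims are false
baseline_error_response_rate = 0.20   # assume 20% of responses contain an error

# Relative reductions quoted in the summary above.
new_false_claim_rate = apply_relative_reduction(baseline_false_claim_rate, 0.33)
new_error_response_rate = apply_relative_reduction(baseline_error_response_rate, 0.18)

print(f"false claims:     {baseline_false_claim_rate:.4f} -> {new_false_claim_rate:.4f}")
print(f"erroneous output: {baseline_error_response_rate:.4f} -> {new_error_response_rate:.4f}")
```

Under these assumed baselines, the false-claim rate would fall from 6% to about 4%, and the share of responses containing an error from 20% to about 16.4%. Different baselines would give different absolute numbers for the same relative reductions.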

Connections

Influenced by

31. Introducing Operator

Jan 2025

37. GPT-5.2 / Codex

Dec 2025