Native computer use meets frontier reasoning
Product Announcement
OpenAI released GPT-5.4 with native computer-use capabilities, a 1M-token context window, and state-of-the-art agentic performance. It is the first general-purpose model to surpass human performance on OSWorld-Verified (75.0% vs. 72.4%), and it matches or exceeds professionals in 83% of GDPval occupational comparisons.
In Codex and the API, GPT-5.4 is the first general-purpose model released with native, state-of-the-art computer-use capabilities. Agents can operate computers, navigate applications, fill forms, and carry out complex workflows across software environments, a capability previously limited to specialized models.
On OSWorld-Verified, which tests real-world computer interaction, GPT-5.4 achieves a 75.0% success rate, 2.6 percentage points above the human baseline of 72.4%. On GDPval, which evaluates professional knowledge work across 44 occupations, it matches or exceeds industry professionals in 83.0% of comparisons.
GPT-5.4 supports up to 1 million tokens of context, allowing agents to plan, execute, and verify tasks across long horizons. It also adds an upfront thinking plan that allows midcourse adjustments, and it solves problems with significantly fewer tokens than GPT-5.2.
GPT-5.4 is OpenAI's most factual model to date: compared to GPT-5.2, its individual claims are 33% less likely to be false and its full responses are 18% less likely to contain errors. This represents a meaningful step toward reducing hallucinations in production use.
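The factuality figures above read most naturally as relative reductions, so their absolute effect depends on the baseline error rate, which the announcement does not state. The sketch below shows the arithmetic in Python. The baseline rates in it are hypothetical placeholders chosen only for illustration, not published GPT-5.2 figures, and the helper name `apply_relative_reduction` is likewise an assumption.

```python
def apply_relative_reduction(baseline_rate: float, relative_reduction: float) -> float:
    """Return the error rate left after cutting baseline_rate by relative_reduction."""
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical baselines for illustration only (NOT published figures).
HYPOTHETICAL_CLAIM_ERROR_RATE = 0.10      # assume 10% of individual claims are false
HYPOTHETICAL_RESPONSE_ERROR_RATE = 0.30   # assume 30% of full responses contain an error

# Relative reductions quoted in the announcement.
CLAIM_REDUCTION = 0.33
RESPONSE_REDUCTION = 0.18

new_claim_rate = apply_relative_reduction(HYPOTHETICAL_CLAIM_ERROR_RATE, CLAIM_REDUCTION)
new_response_rate = apply_relative_reduction(HYPOTHETICAL_RESPONSE_ERROR_RATE, RESPONSE_REDUCTION)

print(f"claim error rate:    {HYPOTHETICAL_CLAIM_ERROR_RATE:.1%} -> {new_claim_rate:.1%}")
print(f"response error rate: {HYPOTHETICAL_RESPONSE_ERROR_RATE:.1%} -> {new_response_rate:.1%}")
```

Under these assumed baselines, a 33% relative reduction takes the claim error rate from 10.0% to 6.7%, and an 18% relative reduction takes the response error rate from 30.0% to 24.6%. With different baselines the absolute improvement would scale proportionally.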