
© 2026 Silvia Seceleanu

Models·OpenAI·May 2024

26. Hello GPT-4o

The omnimodal model

Product Announcement
Summary

OpenAI launched GPT-4o ('omni'), a natively multimodal model that processes text, audio, and vision in a single end-to-end network, enabling real-time, emotionally expressive voice conversation with audio response times as low as 232 ms (320 ms on average).

Key Concepts

Single neural network processes text, audio, image, and video — no pipeline needed

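A single neural network accepts any combination of text, audio, image, and video as input and generates text, audio, and image outputs. Audio generation includes tone, emotion, and singing. The earlier Voice Mode chained three separate models (speech-to-text, a text-only GPT model, and text-to-speech), so tone and other non-verbal information were lost between stages.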

Audio responses in as little as 232 ms, comparable to human conversational latency

GPT-4o responds to audio input in as little as 232 ms, with an average of 320 ms, which is similar to human response time in conversation.

GPT-4o made available free, massively expanding access to frontier AI

GPT-4o rolled out to free-tier ChatGPT users (with usage limits), giving people previously limited to GPT-3.5 access to a GPT-4-class model.

50% cheaper than GPT-4 Turbo with 2x faster throughput on the API

On the API, GPT-4o is priced at half the cost of GPT-4 Turbo and runs twice as fast, with 5x higher rate limits.

Connections

Influenced by
15. Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
Sep 2022
21. GPT-4V(ision) System Card
Sep 2023
Influences
30. 12 Days of OpenAI: o3, Sora, and More
Dec 2024
31. Introducing Operator
Jan 2025