
© 2026 Silvia Seceleanu

Products·OpenAI·Feb 2024

25. Sora: Creating video from text

Text-to-video enters the frontier

Product Announcement
Summary

OpenAI announced Sora, a text-to-video diffusion model that generates up to 60 seconds of high-fidelity video with complex scenes, multiple characters, and camera motion, a step change in generative video quality.

Key Concepts

Diffusion transformer on spacetime patches — video as sequences of visual tokens

Sora is a diffusion transformer that operates on spacetime patches of video and images. It treats a video as a sequence of these patches, which play the role that tokens play in language models.
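
To make the idea of spacetime patches concrete, here is a minimal NumPy sketch of cutting a video array into a sequence of flattened patches. It is an illustration only, not OpenAI's implementation: the function name `to_spacetime_patches`, the patch sizes (`pt`, `ph`, `pw`), and the choice to patch a raw `(T, H, W, C)` pixel array are all assumptions made for this example, since the briefing does not specify them.

```python
import numpy as np

def to_spacetime_patches(video, pt=2, ph=4, pw=4):
    """Split a (T, H, W, C) video into flattened spacetime patches.

    pt, ph, pw are hypothetical patch sizes along time, height, and width.
    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Split each axis into (number of patches, size within a patch).
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the patch-grid axes to the front and the within-patch axes to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch, ordered by time, then height, then width.
    return x.reshape(-1, pt * ph * pw * C)

# Toy video: 8 frames of 16x16 RGB.
video = np.arange(8 * 16 * 16 * 3, dtype=np.float32).reshape(8, 16, 16, 3)
tokens = to_spacetime_patches(video)
print(tokens.shape)  # (64, 96)
```

Each row of `tokens` covers a small block of pixels across a few consecutive frames, so a transformer can consume the rows in the same way a language model consumes a token sequence.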

Up to 60s of coherent video with complex camera motion and persistent characters

Sora generates up to 60 seconds of video with complex camera motion, multiple characters that keep a persistent identity, and physically plausible (though not perfect) interactions.

Framed as a "world simulator" that learns physics and causality from video data

OpenAI framed Sora not just as a video generator but as a "world simulator" — a model that understands physics, causality, and 3D consistency by learning from video data.

Connections

Influenced by
14. DALL-E 2: Hierarchical Text-Conditional Image Generation with CLIP Latents (Apr 2022)
18. GPT-4 Technical Report (Mar 2023)

Influences
30. 12 Days of OpenAI: o3, Sora, and More (Dec 2024)