Lightweight model matching Claude 3 Opus performance at a fraction of the cost
Product Announcement

Anthropic released Claude 3.5 Haiku on October 22, 2024 — a lightweight model that matched or exceeded Claude 3 Opus (the previous flagship) across most benchmarks while being significantly faster and cheaper. Priced at $1/MTok input and $5/MTok output, it demonstrated that capability improvements compound at every model tier, not just at the frontier. Claude 3.5 Haiku's efficiency made Claude accessible for high-volume applications — API integrations, real-time assistants, and batch processing — that were previously cost-prohibitive with larger models.
Claude 3.5 Haiku exceeded Claude 3 Opus on most coding, reasoning, and instruction-following benchmarks while being roughly 15x cheaper per token (Claude 3 Opus was priced at $15/MTok input and $75/MTok output). This pattern — where the efficiency-tier model of generation N+1 matches the flagship of generation N — demonstrated that capability gains propagate through the entire model lineup, making frontier-class performance progressively more accessible.
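As a rough illustration of that price gap, the list prices can be compared directly. This is a minimal sketch: the per-MTok rates are the published launch prices, and the 2,000-in / 500-out request size is an assumed "typical" request, not a measured workload:

```python
# Published per-million-token list prices (USD).
OPUS_INPUT, OPUS_OUTPUT = 15.00, 75.00    # Claude 3 Opus
HAIKU_INPUT, HAIKU_OUTPUT = 1.00, 5.00    # Claude 3.5 Haiku

def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Cost in USD of one request at the given per-MTok prices."""
    return (input_tokens / 1_000_000) * price_in + (output_tokens / 1_000_000) * price_out

# Assumed typical request: 2,000 input tokens, 500 output tokens.
opus = request_cost(2_000, 500, OPUS_INPUT, OPUS_OUTPUT)
haiku = request_cost(2_000, 500, HAIKU_INPUT, HAIKU_OUTPUT)
print(f"Opus:  ${opus:.4f}")              # $0.0675
print(f"Haiku: ${haiku:.4f}")             # $0.0045
print(f"ratio: {opus / haiku:.0f}x")      # 15x
```

Because both input and output rates drop by the same factor, the ratio holds regardless of the input/output mix.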
With response times under one second for typical queries and token costs an order of magnitude lower than Opus, Claude 3.5 Haiku enabled use cases that frontier models couldn't economically serve: real-time chat assistants, high-volume document processing, embedded AI features, and agentic workflows requiring many sequential LLM calls. The model's efficiency-to-capability ratio was specifically optimized for production deployment.
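To see why per-call efficiency dominates in agentic workflows, note that an agent making N sequential calls multiplies both cost and latency by N. A back-of-the-envelope sketch, using the published Claude 3.5 Haiku prices; the call count, token counts, and per-call latency below are illustrative assumptions, not benchmarks:

```python
# Back-of-the-envelope economics of an agentic workflow.
PRICE_IN, PRICE_OUT = 1.00, 5.00     # published Claude 3.5 Haiku USD/MTok rates
CALLS = 25                           # assumed sequential LLM calls per agent task
TOKENS_IN, TOKENS_OUT = 3_000, 400   # assumed average tokens per call
LATENCY_S = 0.8                      # assumed per-call latency, seconds

per_call = (TOKENS_IN / 1e6) * PRICE_IN + (TOKENS_OUT / 1e6) * PRICE_OUT
cost_per_task = CALLS * per_call
time_per_task = CALLS * LATENCY_S

print(f"${cost_per_task:.3f} per task, ~{time_per_task:.0f}s end to end")
# $0.125 per task, ~20s end to end
```

At Opus rates ($15/$75 per MTok) the same 25-call task would cost 15x as much, which is what previously made multi-call agent loops cost-prohibitive at scale.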
Despite its compact size, Claude 3.5 Haiku maintained a 200K token context window — the same as the larger Claude 3.5 Sonnet. This meant developers didn't have to sacrifice context capacity for speed and cost, making Haiku viable for long-document analysis, multi-file code review, and RAG applications that require large context.
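For RAG pipelines, that 200K window typically gets checked before a request is sent. A minimal pre-flight budget check, assuming a rough 4-characters-per-token heuristic (the real tokenizer differs, and the hypothetical `fits_in_context` helper below is illustrative, not part of any SDK):

```python
CONTEXT_WINDOW = 200_000   # tokens, Claude 3.5 Haiku
CHARS_PER_TOKEN = 4        # rough heuristic, not the actual tokenizer

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(documents: list[str], reserved_for_output: int = 4_096) -> tuple[bool, int]:
    """Check whether retrieved documents fit the window,
    leaving headroom for the model's reply."""
    budget = CONTEXT_WINDOW - reserved_for_output
    used = sum(estimate_tokens(d) for d in documents)
    return used <= budget, used

# Two retrieved documents: ~75K and ~25K estimated tokens.
docs = ["x" * 300_000, "y" * 100_000]
ok, used = fits_in_context(docs)
print(ok, used)   # True 100000
```

In production one would use exact token counts from the API rather than a character heuristic, but the budget logic is the same.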