OpenAI Introduces GPT-5.3-Codex-Spark with Cerebras Hardware
OpenAI unveils GPT-5.3-Codex-Spark, a real-time coding model running on Cerebras hardware that generates code up to 15x faster.

OpenAI has unveiled GPT-5.3-Codex-Spark, a lightweight, real-time coding model that generates over 1,000 tokens per second on Cerebras' Wafer Scale Engine 3 (WSE-3) chips. This marks OpenAI's first major shift away from Nvidia hardware for inference (OpenAI, ServeTheHome).
Now available in research preview for ChatGPT Pro users via the Codex app, CLI, VS Code extension, and limited API access, the model promises 15x faster generation speeds with a 128k context window, optimized for interactive developer workflows (Firstpost).
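For developers on the preview, the model would presumably be selected by id like any other. The sketch below is a guess at what that looks like via OpenAI's Python SDK: the model id `gpt-5.3-codex-spark`, the prompt text, and the payload shape are assumptions, not details confirmed by the announcement.

```python
import os

# Hypothetical request payload; the id "gpt-5.3-codex-spark" is assumed
# from the announced model name, not taken from API documentation.
payload = {
    "model": "gpt-5.3-codex-spark",
    "input": "Rename the variable `cnt` to `count` across this file.",
}

if os.environ.get("OPENAI_API_KEY"):
    # Only attempt a live call when a key is configured.
    from openai import OpenAI  # requires the `openai` package
    client = OpenAI()
    response = client.responses.create(**payload)
    print(response.output_text)
else:
    print("payload ready:", payload["model"])
```

For short interactive edits like this, a sub-second round trip is where a >1,000 tokens/s serving tier would actually be felt.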
Key Features and Performance Breakthroughs
Described as a "smaller version" of GPT-5.3-Codex, Spark prioritizes speed over raw intelligence, enabling near-instantaneous code generation for tasks like real-time edits, logic adjustments, and interface tweaks (Simon Willison).
- Performance: Completed a "build a Snake game" prompt in 9 seconds, compared to 43 seconds for the standard GPT-5.3-Codex medium variant.
- Accuracy: 77.3% on Terminal-Bench 2.0, outperforming GPT-5.2-Codex's 64%.
- Integration: Its text-only design allows seamless use alongside frontier models, which remain the choice for long-running tasks (TechZine).
Historical Context: OpenAI's Codex Evolution
OpenAI's Codex lineage traces back to 2021's GPT-3-based Codex, which powered early GitHub Copilot and focused on code generation from natural language (The Zvi).
- Iterations: GPT-5.2-Codex improved token efficiency and agentic capabilities.
- Current Model: GPT-5.3-Codex, launched alongside Spark, claims state-of-the-art SWE-Bench Pro scores.
Strategic Partnership with Cerebras
Announced January 14, 2026, OpenAI's collaboration with Cerebras signals a deliberate diversification away from Nvidia. Cerebras' WSE-3 enables 10-20x speed gains over Nvidia H100 clusters (ServeTheHome).
Why now? Surging demand for real-time AI agents coincides with maturing non-Nvidia hardware ecosystems. This move hedges against Nvidia supply constraints and positions OpenAI for scalable, low-latency serving tiers amid a 2026 AI hardware arms race.
Competitor Landscape
| Model/Provider | Key Strengths | Speed (Tokens/sec) | Benchmarks | Hardware |
|---|---|---|---|---|
| GPT-5.3-Codex-Spark (OpenAI) | Real-time coding, 128k context | >1,000 | 77.3% Terminal-Bench 2.0 | Cerebras WSE-3 |
| GPT-5.3-Codex (OpenAI) | General agentic coding | ~200-300 (est.) | SOTA SWE-Bench Pro (claimed) | Nvidia/Cerebras hybrid |
| Claude Opus 4.6 (Anthropic) | Reasoning + coding | Moderate | Competitive SWE-Bench Pro | Custom (Nvidia-based) |
| Llama 3.1 70B on Cerebras (Meta/Open) | Open-source speed | Up to 2,000 | N/A (inference-focused) | Cerebras WSE-3 |
Implications and Skeptical Notes
For developers, Spark transforms AI from a "slow consultant" into a live collaborator, accelerating prototyping and debugging (Firstpost).
- Critiques: Spark is not a full GPT-5.3-Codex replacement; it lacks multimodal capabilities.
- Scalability: Cerebras' demos impress, but its scalability beyond coding workloads is unproven (Simon Willison).
This launch underscores OpenAI's push toward hybrid intelligence: massive models for depth, Spark-like speedsters for breadth.


