OpenAI Introduces GPT-5.3-Codex-Spark with Cerebras Hardware
OpenAI unveils GPT-5.3-Codex-Spark, a real-time coding model running on Cerebras hardware that generates code up to 15x faster.

OpenAI has unveiled GPT-5.3-Codex-Spark, a lightweight, real-time coding model that generates over 1,000 tokens per second on Cerebras' Wafer Scale Engine 3 (WSE-3) chips. This marks OpenAI's first major shift away from Nvidia hardware for inference (OpenAI, ServeTheHome).
Now available in research preview for ChatGPT Pro users via the Codex app, CLI, VS Code extension, and limited API access, the model promises 15x faster generation speeds with a 128k context window, optimized for interactive developer workflows (Firstpost).
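For developers on the preview, the model would presumably be selected by id like any other. The sketch below is a guess at what that looks like via OpenAI's Python SDK: the model id `gpt-5.3-codex-spark`, the prompt text, and the payload shape are assumptions, not details confirmed by the announcement.

```python
import os

# Hypothetical request payload; the id "gpt-5.3-codex-spark" is assumed
# from the announced model name, not taken from API documentation.
payload = {
    "model": "gpt-5.3-codex-spark",
    "input": "Rename the variable `cnt` to `count` across this file.",
}

if os.environ.get("OPENAI_API_KEY"):
    # Only attempt a live call when a key is configured.
    from openai import OpenAI  # requires the `openai` package
    client = OpenAI()
    response = client.responses.create(**payload)
    print(response.output_text)
else:
    print("payload ready:", payload["model"])
```

For short interactive edits like this, a sub-second round trip is where a >1,000 tokens/s serving tier would actually be felt.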
Key Features and Performance Breakthroughs
Described as a "smaller version" of GPT-5.3-Codex, Spark prioritizes speed over raw intelligence, enabling near-instantaneous code generation for tasks like real-time edits, logic adjustments, and interface tweaks (Simon Willison).
- Performance: Completed a "build a Snake game" prompt in 9 seconds, compared to 43 seconds for the standard GPT-5.3-Codex medium variant.
- Accuracy: 77.3% on Terminal-Bench 2.0, outperforming GPT-5.2-Codex's 64%.
- Integration: Its text-only design allows seamless use alongside frontier models, which remain the choice for long-running tasks (TechZine).
Historical Context: OpenAI's Codex Evolution
OpenAI's Codex lineage traces back to 2021's GPT-3-based Codex, which powered early GitHub Copilot and focused on code generation from natural language (The Zvi).
- Iterations: GPT-5.2-Codex improved token efficiency and agentic capabilities.
- Current Model: GPT-5.3-Codex, launched alongside Spark, claims state-of-the-art SWE-Bench Pro scores.
Strategic Partnership with Cerebras
Announced January 14, 2026, OpenAI's collaboration with Cerebras signals a deliberate diversification away from Nvidia. Cerebras' WSE-3 enables 10-20x speed gains over Nvidia H100 clusters (ServeTheHome).
Why now? Surging demand for real-time AI agents coincides with maturing non-Nvidia hardware ecosystems. This move hedges against Nvidia supply constraints and positions OpenAI for scalable, low-latency serving tiers amid a 2026 AI hardware arms race.
Competitor Landscape
| Model/Provider | Key Strengths | Speed (Tokens/sec) | Benchmarks | Hardware |
|---|---|---|---|---|
| GPT-5.3-Codex-Spark (OpenAI) | Real-time coding, 128k context | >1,000 | 77.3% Terminal-Bench 2.0 | Cerebras WSE-3 |
| GPT-5.3-Codex (OpenAI) | General agentic coding | ~200-300 (est.) | SOTA SWE-Bench Pro (claimed) | Nvidia/Cerebras hybrid |
| Claude Opus 4.6 (Anthropic) | Reasoning + coding | Moderate | Competitive SWE-Bench Pro | Custom (Nvidia-based) |
| Llama 3.1 70B on Cerebras (Meta/Open) | Open-source speed | Up to 2,000 | N/A (inference-focused) | Cerebras WSE-3 |
Implications and Skeptical Notes
For developers, Spark transforms AI from a "slow consultant" into a live collaborator, accelerating prototyping and debugging (Firstpost).
- Critiques: Spark is not a full GPT-5.3-Codex replacement; it lacks multimodal capabilities.
- Scalability: Cerebras' demos impress, but its scalability beyond coding workloads is unproven (Simon Willison).
This launch underscores OpenAI's push toward hybrid intelligence: massive models for depth, Spark-like speedsters for breadth.


