Google Introduces Eighth-Gen TPUs at Cloud Next 2026
Google introduces its eighth-generation TPUs, TPU 8t and TPU 8i, at Cloud Next 2026, targeting AI agents with up to 2.8x performance gains.

Google Unveils Eighth-Generation TPUs: TPU 8t and TPU 8i
Google has launched its eighth-generation Tensor Processing Units (TPUs), unveiling two specialized chips: TPU 8t for large-scale model training and TPU 8i for high-speed inference. Both are aimed at powering the emerging era of autonomous AI agents. Announced at the Google Cloud Next 2026 conference, they promise up to 2.8x performance gains over the previous generation while improving energy efficiency. General availability is expected later this year, and early access requests are open now (Google Blog).
TPU 8t: Pre-Training Powerhouse
The TPU 8t is optimized for massive-scale model training, scaling to 9,600 chips in a single superpod over a 3D torus network topology. It supports native FP4 (4-bit floating point), doubling matrix multiply unit (MXU) throughput while easing memory bandwidth bottlenecks; the compact 4-bit weights also let larger models fit in local buffers, minimizing the energy spent on data movement (Google Cloud).
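To make the pod-scale parallelism concrete, here is a minimal JAX sketch of sharding a matmul across a 3D device mesh, the shape of parallelism a 3D torus superpod exposes. The mesh axis names, toy dimensions, and CPU-emulation flag are illustrative assumptions, not details from Google's announcement.

```python
# Hedged sketch: sharding a training matmul over a 3D mesh that mirrors
# a 3D-torus pod layout. Axis names and sizes are illustrative only.
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"  # emulate 8 devices on CPU

import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# A 2x2x2 mesh stands in for a (much larger) 9,600-chip superpod.
devices = mesh_utils.create_device_mesh((2, 2, 2))
mesh = Mesh(devices, axis_names=("x", "y", "z"))

# Activations sharded along "x" (data parallel); weights split along
# "y" and "z" (model parallel). XLA inserts the needed collectives,
# which on real hardware travel over the torus links.
acts = jax.device_put(jnp.ones((8, 1024)), NamedSharding(mesh, P("x", None)))
weights = jax.device_put(jnp.ones((1024, 4096)), NamedSharding(mesh, P("y", "z")))

@jax.jit
def layer(a, w):
    return jnp.dot(a, w)  # compiled as a sharded matmul

out = layer(acts, weights)
print(out.shape, out.sharding)  # (8, 4096), distributed across the mesh
```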
TPU 8i: Inference and Agent Execution
The TPU 8i targets post-training inference and serving with 3x more on-chip SRAM (up to 384 MB, alongside 288 GB of high-bandwidth memory). It pairs a Collectives Acceleration Engine (CAE) with a new Boardfly network topology for low-latency AI agent execution. This design reduces core idle time during long-context decoding, which suits the multi-step workflows of agentic systems (Google Cloud).
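The bandwidth pressure behind this design is easy to quantify: each decoded token must stream the model weights plus the KV cache through memory, so per-chip decode speed is capped by memory bandwidth, and anything held resident in SRAM is traffic saved. The back-of-envelope sketch below runs that arithmetic; every number in it is an illustrative assumption, not a figure from the announcement.

```python
# Hedged back-of-envelope: why long-context decoding is bandwidth-bound.
# All model and hardware numbers below are assumptions for illustration.
GB = 1e9

params          = 70e9       # assumed dense model parameter count
bytes_per_param = 0.5        # FP4 weights: 4 bits = 0.5 bytes
kv_cache_bytes  = 40 * GB    # assumed KV cache for a long context
hbm_bw          = 7000 * GB  # assumed HBM bandwidth, bytes/s

# Each decoded token streams the full weights and the KV cache once.
bytes_per_token = params * bytes_per_param + kv_cache_bytes
tokens_per_s = hbm_bw / bytes_per_token
print(f"{bytes_per_token / GB:.0f} GB moved per token -> "
      f"~{tokens_per_s:.0f} tokens/s ceiling per chip")
# Weights or KV blocks kept resident in on-chip SRAM don't count against
# this HBM budget, which is how a 3x larger SRAM raises the ceiling.
```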
Integration and Architecture
Both TPUs pair with Arm-based Axion host CPUs to eliminate host-side bottlenecks from data-preparation latency, and both form part of Google Cloud's AI Hypercomputer, an architecture that blends hardware, software, and networking with an emphasis on custom 3D-stacked designs (Google Blog).
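The host bottleneck being addressed is easy to picture: if the CPU prepares batches strictly in series with accelerator steps, the accelerator idles between steps. Below is a minimal JAX sketch of the standard mitigation, overlapping host-side preparation with asynchronously dispatched device work; `host_prepare`, the toy shapes, and the loop are hypothetical stand-ins, not Google's pipeline.

```python
# Hedged sketch (assumed pattern, not Google's API): hide host-side data
# preparation behind asynchronously dispatched accelerator work.
import jax
import jax.numpy as jnp
import numpy as np

@jax.jit
def train_step(batch):
    return jnp.tanh(batch).sum()  # stand-in for a real training step

def host_prepare(rng):
    # Hypothetical stand-in for CPU work: decoding, tokenizing, batching.
    return rng.standard_normal((256, 1024), dtype=np.float32)

rng = np.random.default_rng(0)
next_batch = jax.device_put(host_prepare(rng))  # prefetch the first batch
for step in range(4):
    loss = train_step(next_batch)                   # async dispatch: device starts now
    next_batch = jax.device_put(host_prepare(rng))  # host preps the next batch meanwhile
    print(step, float(loss))                        # sync point: device work completes
```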
Historical Context and Evolution
Google's TPU lineage dates back to 2016, with iterative advancements powering models like Gemini. The seventh-generation TPU (Ironwood), released in November 2025, delivered breakthroughs in scale but treated training and inference uniformly. Earlier generations, such as TPU v5p (2023), scaled to 8,960 chips per pod (Futunn).
Competitor Comparison
Google's dual-chip strategy challenges NVIDIA's Blackwell platform, which unifies training and inference on GB200 GPUs but faces memory-wall issues in agentic workloads. Google claims the TPU 8i's 3x SRAM boost and Boardfly topology deliver lower latency for high-concurrency serving than Blackwell's NVLink-based designs (Google Cloud).
| Feature | TPU 8t | TPU 8i | NVIDIA Blackwell (GB200) | AMD MI300X |
|---|---|---|---|---|
| Primary Use | Training | Inference/Agents | Unified | Unified |
| Key Innovation | FP4, 9,600-chip pod | 3x SRAM, Boardfly | NVLink 5.0 | 192 GB HBM3 |
| Perf vs. Prev Gen | 2.8x (vs. Ironwood) | Low-latency serving | 4x inference (claimed) | 1.3x vs. H100 |
Strategic Timing and Implications
The split reflects AI's training-inference divergence: agentic systems demand fast iterative reasoning even as pre-training runs keep growing. Google's move counters NVIDIA's market dominance, timed with Cloud Next 2026 to capture enterprise migrations (Google Blog).
These TPUs could democratize agentic AI, enabling responsive tools for enterprises via Google Cloud. Skeptics, however, question ecosystem maturity: Google's JAX framework trails PyTorch and CUDA in adoption, which could limit uptake (Google Cloud).