Google Introduces Eighth-Gen TPUs at Cloud Next 2026
Google introduces its eighth-generation TPUs, TPU 8t and TPU 8i, at Cloud Next 2026, targeting AI agents with up to 2.8x performance gains.

Google Unveils Eighth-Generation TPUs: TPU 8t and TPU 8i
Google has launched its eighth-generation Tensor Processing Units (TPUs), unveiling two specialized chips: TPU 8t for large-scale model training and TPU 8i for high-speed inference. Both are aimed at powering the emerging era of autonomous AI agents. Announced at the Google Cloud Next 2026 conference, they promise up to 2.8x performance gains over the previous generation while improving energy efficiency. General availability is expected later this year, and early access requests are open now (Google Blog).
TPU 8t: Pre-Training Powerhouse
The TPU 8t is optimized for massive-scale model training, scaling to 9,600 chips in a single superpod over a 3D torus network topology. It supports native FP4 (4-bit floating point), doubling matrix multiply unit (MXU) throughput while easing memory bandwidth bottlenecks; the compact 4-bit weights also let larger models fit in local buffers, minimizing the energy spent on data movement (Google Cloud).
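To make the pod-scale parallelism concrete, here is a minimal JAX sketch of sharding a matmul across a 3D device mesh, the shape of parallelism a 3D torus superpod exposes. The mesh axis names, toy dimensions, and CPU-emulation flag are illustrative assumptions, not details from Google's announcement.

```python
# Hedged sketch: sharding a training matmul over a 3D mesh that mirrors
# a 3D-torus pod layout. Axis names and sizes are illustrative only.
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"  # emulate 8 devices on CPU

import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# A 2x2x2 mesh stands in for a (much larger) 9,600-chip superpod.
devices = mesh_utils.create_device_mesh((2, 2, 2))
mesh = Mesh(devices, axis_names=("x", "y", "z"))

# Activations sharded along "x" (data parallel); weights split along
# "y" and "z" (model parallel). XLA inserts the needed collectives,
# which on real hardware travel over the torus links.
acts = jax.device_put(jnp.ones((8, 1024)), NamedSharding(mesh, P("x", None)))
weights = jax.device_put(jnp.ones((1024, 4096)), NamedSharding(mesh, P("y", "z")))

@jax.jit
def layer(a, w):
    return jnp.dot(a, w)  # compiled as a sharded matmul

out = layer(acts, weights)
print(out.shape, out.sharding)  # (8, 4096), distributed across the mesh
```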
TPU 8i: Inference and Agent Execution
The TPU 8i targets post-training inference and serving with 3x more on-chip SRAM (up to 384 MB, alongside 288 GB of high-bandwidth memory). It pairs a Collectives Acceleration Engine (CAE) with a new Boardfly network topology for low-latency AI agent execution. This design reduces core idle time during long-context decoding, which suits the multi-step workflows of agentic systems (Google Cloud).
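The bandwidth pressure behind this design is easy to quantify: each decoded token must stream the model weights plus the KV cache through memory, so per-chip decode speed is capped by memory bandwidth, and anything held resident in SRAM is traffic saved. The back-of-envelope sketch below runs that arithmetic; every number in it is an illustrative assumption, not a figure from the announcement.

```python
# Hedged back-of-envelope: why long-context decoding is bandwidth-bound.
# All model and hardware numbers below are assumptions for illustration.
GB = 1e9

params          = 70e9       # assumed dense model parameter count
bytes_per_param = 0.5        # FP4 weights: 4 bits = 0.5 bytes
kv_cache_bytes  = 40 * GB    # assumed KV cache for a long context
hbm_bw          = 7000 * GB  # assumed HBM bandwidth, bytes/s

# Each decoded token streams the full weights and the KV cache once.
bytes_per_token = params * bytes_per_param + kv_cache_bytes
tokens_per_s = hbm_bw / bytes_per_token
print(f"{bytes_per_token / GB:.0f} GB moved per token -> "
      f"~{tokens_per_s:.0f} tokens/s ceiling per chip")
# Weights or KV blocks kept resident in on-chip SRAM don't count against
# this HBM budget, which is how a 3x larger SRAM raises the ceiling.
```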
Integration and Architecture
Both TPUs pair with Arm-based Axion host CPUs to eliminate host-side bottlenecks from data-preparation latency, and both form part of Google Cloud's AI Hypercomputer, an architecture that blends hardware, software, and networking with an emphasis on custom 3D-stacked designs (Google Blog).
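The host bottleneck being addressed is easy to picture: if the CPU prepares batches strictly in series with accelerator steps, the accelerator idles between steps. Below is a minimal JAX sketch of the standard mitigation, overlapping host-side preparation with asynchronously dispatched device work; `host_prepare`, the toy shapes, and the loop are hypothetical stand-ins, not Google's pipeline.

```python
# Hedged sketch (assumed pattern, not Google's API): hide host-side data
# preparation behind asynchronously dispatched accelerator work.
import jax
import jax.numpy as jnp
import numpy as np

@jax.jit
def train_step(batch):
    return jnp.tanh(batch).sum()  # stand-in for a real training step

def host_prepare(rng):
    # Hypothetical stand-in for CPU work: decoding, tokenizing, batching.
    return rng.standard_normal((256, 1024), dtype=np.float32)

rng = np.random.default_rng(0)
next_batch = jax.device_put(host_prepare(rng))  # prefetch the first batch
for step in range(4):
    loss = train_step(next_batch)                   # async dispatch: device starts now
    next_batch = jax.device_put(host_prepare(rng))  # host preps the next batch meanwhile
    print(step, float(loss))                        # sync point: device work completes
```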
Historical Context and Evolution
Google's TPU lineage dates back to 2016, with iterative advancements powering models like Gemini. The seventh-generation TPU (Ironwood), released in November 2025, delivered breakthroughs in scale but treated training and inference uniformly. Earlier generations, such as TPU v5p (2023), scaled to 8,960 chips per pod (Futunn).
Competitor Comparison
Google's dual-chip strategy challenges NVIDIA's Blackwell platform, which unifies training and inference on GB200 GPUs but faces memory-wall issues in agentic workloads. Google claims the TPU 8i's 3x SRAM boost and Boardfly topology deliver lower latency for high-concurrency serving than Blackwell's NVLink-based designs (Google Cloud).
| Feature | TPU 8t | TPU 8i | NVIDIA Blackwell (GB200) | AMD MI300X |
|---|---|---|---|---|
| Primary Use | Training | Inference/Agents | Unified | Unified |
| Key Innovation | FP4, 9,600-chip pod | 3x SRAM, Boardfly | NVLink 5.0 | 192 GB HBM3 |
| Perf vs. Prev Gen | 2.8x (vs. Ironwood) | Low-latency serving | 4x inference (claimed) | 1.3x vs. H100 |
Strategic Timing and Implications
The split reflects AI's training-inference divergence: agentic systems demand fast iterative reasoning even as pre-training runs keep growing. Google's move counters NVIDIA's market dominance, timed with Cloud Next 2026 to capture enterprise migrations (Google Blog).
These TPUs could democratize agentic AI, enabling responsive tools for enterprises via Google Cloud. Skeptics, however, question ecosystem maturity: Google's JAX framework trails PyTorch and CUDA in adoption, which could limit uptake (Google Cloud).