Google Details Eight Years of TPU Evolution: From v2 to Ironwood Supercomputers
Key Takeaways
- Google's TPU architecture maintained surprising stability across five generations despite accommodating rapid shifts in deep learning workloads, particularly the transition to Transformers
- Performance improvements were dramatic: 10x increase in HBM capacity and bandwidth per node, 100x increase in peak node performance, and 3600x increase in supercomputer performance over eight years
- Major focus on sustainability: significant improvements in performance per Watt and carbon emissions per floating-point operation, addressing the growing environmental concerns around AI infrastructure
- Infrastructure resilience enhanced through optical circuit switches, built-in self-test, and hardware replay mechanisms for operating large-scale training clusters reliably
- Paper identifies six features likely to characterize successful training accelerators in the coming years
Summary
Google has published a major research paper detailing the evolution of its Tensor Processing Units (TPUs) across five generations, from TPU v2 to the latest Ironwood platform. The paper, set to appear in the July/August 2026 issue of IEEE Micro, chronicles how Google's training supercomputers have scaled to meet the demands of modern deep learning workloads, particularly the rise of Transformer models. Over eight years, the paper reports a 100x increase in peak node performance and a 3600x improvement in overall supercomputer performance, achieved while the core architecture remained remarkably stable.
Beyond raw performance gains, the paper emphasizes Google's progress in power efficiency and sustainability. HBM capacity and bandwidth per node increased 10x, while the company made substantial improvements in performance per Watt and carbon emissions per floating-point operation. The work highlights key infrastructure innovations including optical circuit switches, built-in self-test mechanisms, and hardware replay capabilities that enhance system resilience at scale.
The research identifies six key characteristics that may define successful training accelerators throughout this decade, offering insights into the hardware engineering challenges facing AI infrastructure providers as model sizes and training demands continue to grow.
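To put the headline multipliers reported above in perspective, the short Python sketch below converts each eight-year gain into an implied average yearly factor. It assumes smooth compounding over the whole period, which is purely illustrative because only the overall figures are given here; the function and variable names are hypothetical. It also computes the ratio of the supercomputer gain to the per-node gain, which is the portion of the system-level improvement that the node gain alone does not account for. What produced that remainder is not stated in this summary.

```python
# Illustrative arithmetic only: the overall multipliers come from the summary,
# while the assumption of uniform year-over-year compounding is ours.

def annualized_factor(total_factor: float, years: float) -> float:
    """Return the constant yearly multiplier that compounds to total_factor."""
    return total_factor ** (1.0 / years)

YEARS = 8  # period covered by the reported gains

gains = {
    "HBM capacity and bandwidth per node": 10.0,
    "peak node performance": 100.0,
    "supercomputer performance": 3600.0,
}

for name, total in gains.items():
    yearly = annualized_factor(total, YEARS)
    print(f"{name}: {total:g}x overall, roughly {yearly:.2f}x per year")

# Share of the supercomputer gain not explained by the per-node gain.
residual_factor = gains["supercomputer performance"] / gains["peak node performance"]
print(f"residual system-level factor: {residual_factor:g}x")
```

Under these assumptions, peak node performance grows by a little under 1.8x per year on average, and the system-level gain exceeds the per-node gain by a factor of 36.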
Editorial Opinion
This paper signals Google's maturation as an AI infrastructure provider. The fact that the TPU architecture remained stable while accommodating a 100x increase in peak node performance suggests thoughtful long-term design, neither chasing every fleeting trend nor locked into obsolescence. Most significantly, the explicit focus on efficiency and sustainability metrics reflects a recognition that raw performance gains alone aren't sufficient in an era of climate concerns and rising energy costs. If the six identified characteristics become industry standard, Google will have effectively shared its roadmap with competitors while also validating its own approach.

