AI Agents Autonomously Design Real 7nm GPU: From Verilog to GDSII
Key Takeaways
- AI agents successfully completed an autonomous end-to-end chip design flow from Verilog to GDSII for a real 7nm GPU accelerator, demonstrating that autonomous silicon design is feasible without human hand-editing of hardware designs
- The agents coordinate work, file root-cause analyses, review each other's pull requests, and solve real physical constraints (clock-tree insertion delay, hold violations, routing congestion, IR drop) inherent to 7nm silicon manufacturing
- Humans now write Markdown 'org code' to define agent workflows and invariants rather than directly editing hardware designs; 30,000 lines of chip design code were produced from 7,000 lines of Markdown guidance
Summary
A new open-source project called AutoGPU demonstrates Claude-powered AI agents autonomously designing a real 7nm GPU, from a hardware description language (Verilog) to a GDSII manufacturing layout. The agents design a 32×32 floating-point matrix multiply (matmul) accelerator organized as a systolic array, complete with distributed tensor memory, that takes fp8 inputs and produces fp32 results that match NumPy exactly.
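As a rough illustration of how a systolic array of this kind computes a matmul, here is a minimal pure-Python sketch. It is not AutoGPU's implementation: the output-stationary dataflow, the names `systolic_matmul`, `reference_matmul`, and `make_inputs`, and the use of Python floats in place of fp8 inputs and fp32 accumulators are all assumptions made for illustration.

```python
def reference_matmul(a, b):
    """Plain triple-loop matmul, accumulating over k in ascending order."""
    n, k_dim, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(k_dim):
                out[i][j] += a[i][k] * b[k][j]
    return out


def systolic_matmul(a, b):
    """Cycle-by-cycle model of an output-stationary systolic array (assumed dataflow).

    A values enter at the left edge and move right; B values enter at the top
    edge and move down. Row i of A and column j of B are delayed by i and j
    cycles, so PE (i, j) sees a[i][k] and b[k][j] together at cycle k + i + j.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0.0] * m for _ in range(n)]    # one accumulator per PE
    a_reg = [[0.0] * m for _ in range(n)]  # A operand currently held by each PE
    b_reg = [[0.0] * m for _ in range(n)]  # B operand currently held by each PE
    total_cycles = k_dim + n + m - 2       # last PE sees its last operands at the final cycle
    for t in range(total_cycles):
        # Walk from bottom-right to top-left so each PE reads its neighbour's
        # value from the previous cycle before that neighbour is overwritten.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    k = t - i
                    a_in = a[i][k] if 0 <= k < k_dim else 0.0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j
                    b_in = b[k][j] if 0 <= k < k_dim else 0.0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += a_in * b_in
    return acc


def make_inputs(n):
    """Hypothetical test data: small multiples of 0.25, exactly representable."""
    a = [[((3 * i + k) % 7 - 3) * 0.25 for k in range(n)] for i in range(n)]
    b = [[((5 * k + j) % 5 - 2) * 0.25 for j in range(n)] for k in range(n)]
    return a, b


if __name__ == "__main__":
    a, b = make_inputs(32)
    print(systolic_matmul(a, b) == reference_matmul(a, b))
```

Each processing element in this sketch accumulates its products in the same order as the reference loop, which is why the two results can be compared for exact equality here; the project's own check compares the hardware results against NumPy.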
The project represents a fundamental shift in chip design methodology. Rather than individual hardware engineers hand-editing Verilog and spending months on design reviews and timing closure, AI agents now collaborate via GitHub issues and pull requests to autonomously handle synthesis, place-and-route, and sign-off. The human role has shifted to writing "org code" in Markdown — defining workflows, root-cause analysis processes, invariants, and debugging strategies that guide the agents' behavior.
The full 1089-macro systolic array hardened to a clean GDSII layout on a 7nm process in approximately 40 minutes, with zero design-rule check (DRC) violations and timing closed. The project includes a complete self-authored sign-off toolchain with DRC, LVS (layout vs. schematic), IR-drop, antenna, and density checks, plus interactive 2D and 3D web viewers for inspecting the die layout down to individual metal wires.
The resulting fp8 matmul accelerator passes full behavioral and cycle-accurate test suites end-to-end and closes timing on an actual 7nm process design kit, indicating that the design is realistic and manufacturable.
Editorial Opinion
If autonomous silicon design becomes routine, it could fundamentally reshape the hardware innovation cycle. Rather than 12-month design iterations, teams could run chip design agents continuously to explore new architectures and iterate on designs overnight. This would compress the timeline for hardware-software co-optimization and potentially erode the design-cadence advantages that currently favor incumbent chipmakers. However, the project remains early-stage, relies on acknowledged workarounds, and does not yet achieve full-chip timing closure. The technology is promising, but it is not yet mature enough to replace traditional design flows at scale.


