Building the First 8-Node NVIDIA GB10 Cluster: Scaling Beyond Official Specs
Key Takeaways
- Eight-node NVIDIA GB10 cluster successfully deployed with 1TB of memory, 160 ARM cores, and 400GbE networking, exceeding official support limits
- MikroTik's affordable CRS804 DDQ 400GbE switch enabled RDMA clustering and NCCL scaling, a cost-effective alternative to enterprise networking
- Large language models such as Kimi K2.5 and K2.6 can run entirely locally on the cluster, with no cloud dependency
Summary
A technical deep-dive into constructing an 8-node NVIDIA GB10 cluster with 1TB of memory and 160 ARM cores, well beyond NVIDIA's officially supported configurations at launch. The article details how the author worked around hardware limits using the MikroTik CRS804 DDQ switch and 400GbE networking to enable RDMA clustering, then deployed large language models including Kimi K2.5 and K2.6 entirely locally. Each GB10 unit features NVIDIA's Grace Blackwell SoC, 128GB of LPDDR5X memory, and ConnectX-7 200GbE networking.
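The aggregate numbers follow directly from the per-unit specs. A minimal sanity check, assuming the per-node core count implied by the article (160 cores across 8 nodes, i.e. 20 Grace CPU cores per GB10 unit):

```python
# Hypothetical sanity check of the cluster's aggregate specs,
# derived from the per-unit GB10 figures cited in the article.
NODES = 8
MEM_PER_NODE_GB = 128   # LPDDR5X per GB10 unit
CORES_PER_NODE = 20     # implied by 160 total cores / 8 nodes

total_mem_tb = NODES * MEM_PER_NODE_GB / 1024  # 1024 GB = 1 TB
total_cores = NODES * CORES_PER_NODE

print(f"{total_mem_tb:.0f}TB memory, {total_cores} ARM cores")
# → 1TB memory, 160 ARM cores
```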
While NVIDIA officially supported only two GB10 units in early 2026 (raised to four by GTC 2026), this project shows the engineering community already pushing past those limits by stacking eight nodes. Key to this scalability is the 200Gbps networking, which enables RDMA and NCCL clustering and makes multi-node AI inference and training more accessible than on previous generations. The article also shares insights into the platform's ARM CPU performance advantages and the practical networking choices that simplified deployment compared with enterprise data-center switches, suggesting a path toward democratizing large-scale AI infrastructure.
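For readers attempting a similar build, NCCL can be steered onto a RoCE (RDMA over Converged Ethernet) fabric like the ConnectX-7 links described here through environment variables. A hedged sketch follows; the device and interface names (`mlx5_0`, `enp1s0f0`) and the GID index are placeholders that will differ per system, and `train.py` is a hypothetical job script:

```shell
# Point NCCL at the RDMA-capable NIC (names are assumptions; verify locally).
export NCCL_IB_HCA=mlx5_0           # ConnectX RDMA device (list with `ibv_devices`)
export NCCL_IB_GID_INDEX=3          # RoCEv2 GID index (inspect with `show_gids`)
export NCCL_SOCKET_IFNAME=enp1s0f0  # interface for NCCL's TCP bootstrap
export NCCL_DEBUG=INFO              # log transport selection to confirm RDMA is used

# Example multi-node launch with torchrun (run once per node):
# torchrun --nnodes=8 --nproc-per-node=1 \
#   --rdzv-backend=c10d --rdzv-endpoint=<head-node>:29500 train.py
```

With `NCCL_DEBUG=INFO`, the startup logs indicate which transport NCCL selected, which is a quick way to confirm traffic is actually using RDMA rather than falling back to plain TCP sockets.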
NVIDIA's GB10 platform also delivers strong CPU performance alongside its GPU capabilities, making it viable for both AI inference and edge data processing.
Editorial Opinion
This DIY cluster build exemplifies how quickly the maker and enthusiast community is surpassing vendor support matrices. While NVIDIA's official backing stops at four nodes, this build shows that with smart infrastructure choices, particularly the MikroTik switch, scaling to eight nodes is already practical and affordable. This pattern suggests that next-generation consumer and prosumer AI hardware is evolving faster than typical enterprise support cycles, potentially accelerating adoption of on-premises AI workloads and reducing cloud dependency for power users.