Building the First 8-Node NVIDIA GB10 Cluster: Scaling Beyond Official Specs
Key Takeaways
- Eight-node NVIDIA GB10 cluster successfully deployed with 1TB of memory, 160 ARM cores, and 400GbE networking, exceeding official support limits
- MikroTik's affordable CRS804 DDQ 400GbE switch enabled RDMA clustering and NCCL scaling, a cost-effective alternative to enterprise networking
- Large language models such as Kimi K2.5 and K2.6 can run entirely locally on the cluster, with no cloud dependency
Summary
A technical deep-dive into constructing an 8-node NVIDIA GB10 cluster with 1TB of memory and 160 ARM cores, well beyond NVIDIA's officially supported configurations at launch. The article details how the author worked around hardware limits using the MikroTik CRS804 DDQ switch and 400GbE networking to enable RDMA clustering, then deployed large language models including Kimi K2.5 and K2.6 entirely locally. Each GB10 unit features NVIDIA's Grace Blackwell SoC, 128GB of LPDDR5X memory, and ConnectX-7 200GbE networking.
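The aggregate numbers follow directly from the per-unit specs. A minimal sanity check, assuming the per-node core count implied by the article (160 cores across 8 nodes, i.e. 20 Grace CPU cores per GB10 unit):

```python
# Hypothetical sanity check of the cluster's aggregate specs,
# derived from the per-unit GB10 figures cited in the article.
NODES = 8
MEM_PER_NODE_GB = 128   # LPDDR5X per GB10 unit
CORES_PER_NODE = 20     # implied by 160 total cores / 8 nodes

total_mem_tb = NODES * MEM_PER_NODE_GB / 1024  # 1024 GB = 1 TB
total_cores = NODES * CORES_PER_NODE

print(f"{total_mem_tb:.0f}TB memory, {total_cores} ARM cores")
# → 1TB memory, 160 ARM cores
```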
While NVIDIA officially supported only two GB10 units in early 2026 (raised to four by GTC 2026), this project shows the engineering community already pushing past those limits by stacking eight nodes. Key to this scalability is the 200Gbps networking, which enables RDMA and NCCL clustering and makes multi-node AI inference and training more accessible than on previous generations. The article also shares insights into the platform's ARM CPU performance advantages and the practical networking choices that simplified deployment compared with enterprise data-center switches, suggesting a path toward democratizing large-scale AI infrastructure.
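For readers attempting a similar build, NCCL can be steered onto a RoCE (RDMA over Converged Ethernet) fabric like the ConnectX-7 links described here through environment variables. A hedged sketch follows; the device and interface names (`mlx5_0`, `enp1s0f0`) and the GID index are placeholders that will differ per system, and `train.py` is a hypothetical job script:

```shell
# Point NCCL at the RDMA-capable NIC (names are assumptions; verify locally).
export NCCL_IB_HCA=mlx5_0           # ConnectX RDMA device (list with `ibv_devices`)
export NCCL_IB_GID_INDEX=3          # RoCEv2 GID index (inspect with `show_gids`)
export NCCL_SOCKET_IFNAME=enp1s0f0  # interface for NCCL's TCP bootstrap
export NCCL_DEBUG=INFO              # log transport selection to confirm RDMA is used

# Example multi-node launch with torchrun (run once per node):
# torchrun --nnodes=8 --nproc-per-node=1 \
#   --rdzv-backend=c10d --rdzv-endpoint=<head-node>:29500 train.py
```

With `NCCL_DEBUG=INFO`, the startup logs indicate which transport NCCL selected, which is a quick way to confirm traffic is actually using RDMA rather than falling back to plain TCP sockets.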
NVIDIA's GB10 platform also delivers strong CPU performance alongside its GPU capabilities, making it viable for both AI inference and edge data processing.
Editorial Opinion
This DIY cluster build exemplifies how quickly the maker and enthusiast community is surpassing vendor support matrices. While NVIDIA's official backing stops at four nodes, this build shows that with smart infrastructure choices, particularly the MikroTik switch, scaling to eight nodes is already practical and affordable. This pattern suggests that next-generation consumer and prosumer AI hardware is evolving faster than typical enterprise support cycles, potentially accelerating adoption of on-premises AI workloads and reducing cloud dependency for power users.