ServeTheHome Successfully Clusters 8 NVIDIA GB10 Units to Run Kimi K2.5 and K2.6 Models
Key Takeaways
- ServeTheHome successfully built an 8-node NVIDIA GB10 cluster exceeding NVIDIA's supported configurations, proving that aggressive scaling is viable
- A single MikroTik CRS804 DDQ 400GbE switch enables multi-node scaling via NCCL over RoCE (RDMA over Converged Ethernet), eliminating the need for power-hungry enterprise switches
- The cluster runs massive models, including Kimi K2.5 and K2.6, locally, demonstrating a practical on-premises alternative to cloud-based inference
Summary
ServeTheHome has demonstrated a fully functional 8-node NVIDIA GB10 cluster featuring 1TB of combined memory, 160 Arm cores, and 400GbE RDMA networking—exceeding NVIDIA's supported configurations at the time of assembly. The feat required months of engineering and careful networking design, ultimately enabling the team to run Kimi K2.5 and K2.6, both massive language models, locally across the distributed cluster.
The project showcases the practical viability of scaling GB10 systems beyond NVIDIA's initially supported two-node topology. When ServeTheHome began the build in February 2026, only two-unit connections were supported; by NVIDIA GTC 2026 in late March, official support had expanded to four nodes, but the team had already reached eight. The cluster uses a MikroTik CRS804 DDQ 400GbE switch for NCCL-based multi-node scaling over RoCE RDMA, complemented by a secondary 10GbE management network.
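A setup like this (NCCL collectives over RoCE on a dedicated 400GbE fabric, with a separate 10GbE management network) is typically steered through NCCL environment variables. A minimal sketch, with hypothetical device and interface names — the article does not publish ServeTheHome's exact settings, so verify names and GID indices on your own nodes:

```shell
# Pin NCCL's bootstrap/out-of-band traffic to the management NIC.
# "eno1" is a hypothetical 10GbE interface name; check `ip link` per node.
export NCCL_SOCKET_IFNAME=eno1

# Use the RDMA-capable 400GbE device for collective traffic.
# "mlx5_0" is a hypothetical RDMA device name; check `ibdev2netdev`.
export NCCL_IB_HCA=mlx5_0

# RoCE v2 usually lives at a nonzero GID index; 3 is common, but verify
# with `show_gids` before relying on it.
export NCCL_IB_GID_INDEX=3

# Print chosen transports at startup to confirm RDMA (not TCP) is in use.
export NCCL_DEBUG=INFO
```

If `NCCL_DEBUG=INFO` shows the collectives falling back to the socket transport, the RDMA path is misconfigured and throughput will collapse to TCP speeds.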
The engineering breakthrough demonstrates that large-scale AI model deployment on GB10 hardware is now significantly more accessible than it was just months ago. ServeTheHome's documentation of the cluster design, networking topology, and performance results provides a practical roadmap for researchers and organizations looking to deploy cutting-edge language models without relying on cloud infrastructure. NVIDIA's expanding official support (from two nodes to four over the project timeline) signals growing recognition of cluster demand and will likely spur further ecosystem tooling.
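To make the deployment pattern concrete: multi-node jobs on a cluster like this are commonly launched with PyTorch's `torchrun`, forming one process group across all eight nodes. A sketch under stated assumptions — the node-0 address, port, and script name are hypothetical, and the article does not disclose ServeTheHome's actual launch commands:

```shell
# Run this on every node; only NODE_RANK differs per node (0..7).
# 10.0.0.1:29500 is a hypothetical rendezvous endpoint on node 0,
# reachable over the 10GbE management network.
torchrun \
  --nnodes=8 \
  --nproc-per-node=1 \
  --node-rank="${NODE_RANK}" \
  --rdzv-backend=c10d \
  --rdzv-endpoint=10.0.0.1:29500 \
  inference.py   # hypothetical distributed inference script
```

Rendezvous traffic stays on the management network, while NCCL moves tensor traffic over the 400GbE RDMA fabric once the process group is up.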
Editorial Opinion
This demonstration is significant for AI infrastructure democratization. By proving that off-the-shelf networking hardware and thoughtful system design can enable stable eight-node GB10 clusters, ServeTheHome has lowered the technical and financial barriers to running state-of-the-art large language models locally. The rapid expansion of NVIDIA's official support during the project timeline suggests market demand is accelerating, likely positioning GB10 clusters as a viable middle ground between single-system deployments and expensive cloud alternatives.