OpenAI and Partners Introduce Multipath Reliable Connection Protocol for Scalable AI Cluster Networking
Key Takeaways
- ▸MRC shifts design philosophy from maximizing individual port bandwidth to distributing aggregate bandwidth across more network connections, improving both performance and reliability
- ▸The protocol implements redundant path routing and intelligent failover mechanisms that allow networks to heal from link failures without halting compute jobs
- ▸As a superset extension of existing RoCE, MRC offers a more pragmatic evolution path than alternative protocols like Ultra Ethernet, lowering adoption barriers
Summary
OpenAI, Microsoft, Broadcom, AMD, and Nvidia have jointly developed Multipath Reliable Connection (MRC), a new network protocol designed to optimize Ethernet infrastructure for large-scale AI training clusters. Rather than pursuing ever-increasing port bandwidth, MRC leverages existing Ethernet switch ASICs to create flatter, more interconnected networks with higher radix (more links between devices), reducing latency, cost, and power consumption. Built as a pragmatic extension of the existing RDMA over Converged Ethernet (RoCE) standard, MRC enables self-healing networks that can gracefully recover from link failures without interrupting training jobs through intelligent traffic rerouting across redundant paths. The protocol was unveiled alongside a published research paper and Open Compute Project specification, positioning it for rapid adoption across hyperscalers and AI infrastructure providers.
- The multi-vendor collaboration signals industry consensus that AI infrastructure networking has become a critical bottleneck requiring coordinated solutions
Editorial Opinion
MRC exemplifies the kind of under-the-radar infrastructure innovation that directly multiplies AI training efficiency without requiring new hardware purchases. While hyperscalers won't headline this like a new model release, the cumulative benefits—lower latency, fault resilience, and dramatically reduced TCO—may prove more valuable to the viability of frontier AI than faster GPUs. This is elegant engineering solving real problems.



