Hyper-DERP: Engineering Team Achieves Tailscale Derper Throughput Using Half the CPU Cores
Key Takeaways
- ▸Hyper-DERP matches the throughput of Tailscale's production DERP relay (derper) while using roughly half the CPU cores, a result of systems-level optimizations
- ▸Kernel TLS (kTLS) and io_uring reduce the cost of userspace encryption and of epoll-based I/O multiplexing
- ▸The share-nothing, shard-per-core architecture removes contention and context switching from the critical forwarding path
- ▸Rigorous benchmarking methodology with 4,903 runs validates the performance claims against a production-hardened target
Summary
A systems engineer has developed Hyper-DERP, a high-performance relay implementation that matches the throughput of Tailscale's production-grade DERP (Designated Encrypted Relay for Packets) server while using approximately half the CPU cores. DERP is the relay Tailscale falls back to when a direct peer-to-peer WireGuard connection cannot be established. Hyper-DERP replaces the Go-based userspace approach with a C implementation that offloads TLS encryption to the kernel (kTLS, set up through OpenSSL) and uses io_uring for I/O. Batching syscalls through io_uring avoids the millions of transitions into the kernel per second that the epoll-based approach incurs, and a share-nothing, shard-per-core architecture inspired by the Seastar framework keeps cores from contending over shared state. The results come from 4,903 benchmark runs on GCP c4-highcpu VMs, reported with 95% confidence intervals.
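The share-nothing, shard-per-core idea described above can be illustrated with a small, single-threaded Python model. This is only a sketch under stated assumptions, not Hyper-DERP's code: the names `Shard`, `shard_for`, `make_shards`, and `NUM_SHARDS`, the CRC32-based key mapping, and the per-shard inbox are all hypothetical and chosen for illustration. The point it shows is that each peer is owned by exactly one shard, so a shard only mutates its own state and hands packets for other peers to the owning shard instead of taking a lock.

```python
import zlib
from collections import deque

NUM_SHARDS = 4  # assumption: one shard per CPU core


def shard_for(peer_key: bytes) -> int:
    """Map a peer key to the single shard that owns it (hypothetical scheme)."""
    return zlib.crc32(peer_key) % NUM_SHARDS


class Shard:
    """State owned by exactly one core; other shards never touch `peers`."""

    def __init__(self, shard_id: int) -> None:
        self.shard_id = shard_id
        self.peers = {}       # peer_key -> list of payloads delivered to it
        self.inbox = deque()  # (dst_key, payload) handed over by other shards

    def register(self, peer_key: bytes) -> None:
        self.peers[peer_key] = []

    def forward(self, shards, dst_key: bytes, payload: bytes) -> None:
        """Deliver locally if this shard owns dst_key, else hand off."""
        owner = shard_for(dst_key)
        if owner == self.shard_id:
            self._deliver(dst_key, payload)
        else:
            # Message passing instead of shared, locked state.
            shards[owner].inbox.append((dst_key, payload))

    def drain_inbox(self) -> int:
        """Process packets handed over by other shards; return the count."""
        handled = 0
        while self.inbox:
            dst_key, payload = self.inbox.popleft()
            self._deliver(dst_key, payload)
            handled += 1
        return handled

    def _deliver(self, dst_key: bytes, payload: bytes) -> None:
        # Packets for unknown peers are dropped in this sketch.
        if dst_key in self.peers:
            self.peers[dst_key].append(payload)


def make_shards():
    return [Shard(i) for i in range(NUM_SHARDS)]


if __name__ == "__main__":
    shards = make_shards()
    for key in (b"alice-key", b"bob-key"):
        shards[shard_for(key)].register(key)
    shards[shard_for(b"alice-key")].forward(shards, b"bob-key", b"hello")
    for shard in shards:
        shard.drain_inbox()
    print(shards[shard_for(b"bob-key")].peers[b"bob-key"])
```

In a real shard-per-core server each shard would run on its own pinned thread with its own event loop, and the inbox would be a cross-core queue; the model above keeps only the ownership rule that makes locking unnecessary.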
Editorial Opinion
This work exemplifies how careful systems engineering can dramatically improve performance in critical infrastructure. While the Tailscale DERP relay is a solid, production-proven implementation, this demonstrates that substantial optimization opportunities remain when you're willing to drop down to C, kernel APIs, and share-nothing concurrency patterns. The achievement has implications beyond VPN relays. It highlights how kernel interfaces such as io_uring and kTLS are enabling a new generation of highly efficient network infrastructure software.



