Hyper-DERP: Engineer Achieves Tailscale DERP Relay Throughput with Half the CPU Cores

Key Takeaways

▸Hyper-DERP matches the throughput of Tailscale's production DERP relay using roughly half the CPU cores
▸kTLS (kernel-space TLS encryption) and io_uring avoid the overhead of userspace encryption and epoll-based I/O multiplexing
▸The share-nothing, shard-per-core architecture removes contention and context switching from the critical forwarding path, showing what low-level systems engineering can deliver for infrastructure workloads

Source:

Hacker News: https://hyper-derp.dev/blog/hyper-derp-announcement/

Summary

A systems engineer has developed Hyper-DERP, a high-performance relay that matches the throughput of Tailscale's production DERP (Designated Encrypted Relay for Packets) server while using approximately half the CPU cores. Tailscale falls back to DERP relays when peer-to-peer WireGuard connections cannot be established. Hyper-DERP replaces the Go-based userspace approach with a C implementation that uses kernel-space TLS encryption (kTLS, set up through OpenSSL) and io_uring for I/O. Batched syscalls avoid the millions of kernel context switches per second that the epoll-based approach incurs, and a share-nothing, shard-per-core architecture inspired by the Seastar framework keeps cores from contending for shared state. The results were benchmarked on GCP c4-highcpu VMs across 4,903 test runs and reported with 95% confidence intervals.
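The share-nothing, shard-per-core idea can be illustrated with a small model. The sketch below is not Hyper-DERP's code (which the summary describes as C); it is a simplified Python illustration, and the names `Shard`, `Relay`, `shard_for`, and the hash-based peer placement are assumptions made for the example. Each shard owns its peers outright, and a frame destined for a peer on another shard is handed over through that shard's inbox rather than by touching its state directly.

```python
import hashlib
from collections import deque


class Shard:
    """State owned by exactly one core; no other shard mutates `peers`."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.peers = {}       # peer key -> deque of frames awaiting delivery
        self.inbox = deque()  # (dst_key, frame) pairs handed over by other shards


def shard_for(peer_key: bytes, num_shards: int) -> int:
    """Map a peer key to a shard with a stable hash (hypothetical placement rule)."""
    digest = hashlib.blake2b(peer_key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_shards


class Relay:
    def __init__(self, num_shards: int):
        self.shards = [Shard(i) for i in range(num_shards)]

    def register(self, peer_key: bytes) -> int:
        """Attach a peer to the shard that owns its key; returns the shard id."""
        shard = self.shards[shard_for(peer_key, len(self.shards))]
        shard.peers[peer_key] = deque()
        return shard.shard_id

    @staticmethod
    def _deliver(shard: Shard, dst_key: bytes, frame: bytes) -> bool:
        queue = shard.peers.get(dst_key)
        if queue is None:
            return False  # unknown peer: drop the frame
        queue.append(frame)
        return True

    def forward(self, src_key: bytes, dst_key: bytes, frame: bytes) -> bool:
        """Deliver locally when both peers share a shard, else hand off."""
        n = len(self.shards)
        src_id = shard_for(src_key, n)
        dst_id = shard_for(dst_key, n)
        if src_id == dst_id:
            return self._deliver(self.shards[dst_id], dst_key, frame)
        # Cross-shard: enqueue for the owning shard instead of touching its peers.
        self.shards[dst_id].inbox.append((dst_key, frame))
        return True

    def drain(self, shard_id: int) -> int:
        """Run by the owning shard: process frames handed over by other shards."""
        shard = self.shards[shard_id]
        delivered = 0
        while shard.inbox:
            dst_key, frame = shard.inbox.popleft()
            if self._deliver(shard, dst_key, frame):
                delivered += 1
        return delivered
```

In a real relay each shard would run on its own pinned thread with its own io_uring instance, and the inbox would be a lock-free queue; the model above only shows the ownership rule that keeps locks off the forwarding path.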

The benchmarking methodology, with 4,903 runs, validates the performance claims against a production-hardened target.
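The source does not say how the 95% confidence intervals were computed. As a rough illustration of what such an interval means for repeated benchmark runs, the sketch below uses a normal approximation for the mean; the function name `mean_ci` is hypothetical and the sample values in the usage are invented for the example. They are not Hyper-DERP results.

```python
import math
import statistics


def mean_ci(samples, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    Assumes the runs are independent; with only a handful of runs a
    Student's t interval would be the more appropriate choice.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * sem


if __name__ == "__main__":
    # Invented throughput samples in Gbit/s, for illustration only.
    runs = [9.8, 10.1, 10.0, 9.9, 10.2]
    mean, half = mean_ci(runs)
    print(f"{mean:.2f} +/- {half:.2f} Gbit/s")
```

Two configurations can then be compared by checking whether their intervals overlap, which is a conservative way to judge whether a measured difference is more than run-to-run noise.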

Editorial Opinion

This work shows how careful systems engineering can substantially improve performance in critical infrastructure. The Tailscale DERP relay is a solid, production-proven implementation, yet large optimization opportunities remain for those willing to drop down to C, kernel APIs, and share-nothing concurrency patterns. The implications extend beyond VPN relays: kernel interfaces such as io_uring and kTLS are enabling a new generation of highly efficient network infrastructure software.
