BotBeat
RESEARCH | Tailscale | 2026-04-21

Hyper-DERP: Engineer Matches Tailscale derper Throughput Using Half the CPU Cores

Key Takeaways

  • Hyper-DERP matches the throughput of Tailscale's production DERP relay using about half the CPU cores through systems-level optimizations
  • Kernel-space TLS encryption (kTLS) and io_uring remove the performance penalties of userspace packet processing and epoll-based I/O multiplexing
  • The share-nothing, shard-per-core architecture removes contention and context switching from the critical forwarding path, demonstrating the importance of low-level systems engineering for infrastructure workloads
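
To make the share-nothing, shard-per-core idea concrete, here is a minimal conceptual sketch in Python. It is not Hyper-DERP's code: the `Shard` and `Relay` classes, the SHA-256 based placement, and the inbox queue are all assumptions chosen for illustration. The point it shows is that each peer is owned by exactly one shard, and frames reach that shard through message passing rather than through a shared, locked table.

```python
import hashlib
from collections import deque


class Shard:
    """Models one core: it owns its peers and its inbox, and nothing else."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.peers = {}       # peer key -> frames delivered to that peer
        self.inbox = deque()  # (destination key, frame) pairs handed to this shard

    def register(self, key):
        self.peers[key] = []

    def drain(self):
        """Deliver every queued frame whose destination lives on this shard."""
        delivered = 0
        while self.inbox:
            dst, frame = self.inbox.popleft()
            if dst in self.peers:
                self.peers[dst].append(frame)
                delivered += 1
        return delivered


class Relay:
    """Hypothetical relay that pins each peer key to exactly one shard."""

    def __init__(self, num_shards):
        self.shards = [Shard(i) for i in range(num_shards)]

    def shard_for(self, key):
        # Deterministic placement: the same key always maps to the same shard.
        digest = hashlib.sha256(key).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def connect(self, key):
        self.shard_for(key).register(key)

    def forward(self, dst, frame):
        # Hand the frame to the owning shard instead of touching its peer table.
        self.shard_for(dst).inbox.append((dst, frame))


if __name__ == "__main__":
    relay = Relay(num_shards=4)
    relay.connect(b"alice")
    relay.connect(b"bob")
    relay.forward(b"bob", b"hello")
    owner = relay.shard_for(b"bob")
    print(owner.shard_id, owner.drain(), owner.peers[b"bob"])
```

In a real shard-per-core server each shard would run on its own pinned thread with its own event loop, and the inbox would be a lock-free queue; the single-threaded model above only illustrates ownership and routing.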
Source: Hacker News, https://hyper-derp.dev/blog/hyper-derp-announcement/

Summary

A systems engineer has developed Hyper-DERP, a high-performance relay implementation that matches the throughput of Tailscale's production DERP (Designated Encrypted Relay for Packets) server while using approximately half the CPU cores. It replaces the Go-based userspace approach with a C implementation that uses kernel-space TLS encryption (kTLS) via OpenSSL and io_uring for I/O multiplexing. This avoids the millions of kernel context switches per second that an epoll-based design incurs, and overhead is reduced further through batched syscalls and a share-nothing, shard-per-core architecture inspired by the Seastar framework. The results were validated by benchmarking on GCP c4-highcpu VMs across 4,903 test runs with 95% confidence intervals. Tailscale uses DERP relays when peer-to-peer WireGuard connections cannot be established.
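
The effect of batching on syscall counts can be sketched with simple arithmetic. The model below is an illustration only: the packet rate, the two operations per relayed packet (one receive, one send), and the batch size of 64 are assumptions, not figures from the Hyper-DERP announcement. It compares one syscall per operation against one submission call per batch of operations, which is the general pattern io_uring allows.

```python
import math


def syscalls_per_second_unbatched(packets_per_second, syscalls_per_packet=2):
    """One syscall per I/O operation, e.g. a read and a write for each packet."""
    return packets_per_second * syscalls_per_packet


def syscalls_per_second_batched(packets_per_second, ops_per_packet=2, batch_size=64):
    """One submission call covers a whole batch of queued I/O operations."""
    total_ops = packets_per_second * ops_per_packet
    return math.ceil(total_ops / batch_size)


if __name__ == "__main__":
    rate = 1_000_000  # assumed packets per second, for illustration only
    print(syscalls_per_second_unbatched(rate))
    print(syscalls_per_second_batched(rate))
```

Under these assumed numbers the batched path makes a small fraction of the kernel transitions of the unbatched path; the real ratio depends on how full each batch is under load.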

  • Rigorous benchmarking methodology with 4,903 runs validates the performance claims against a production-hardened target
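
As an illustration of what a 95% confidence interval over repeated benchmark runs means, here is a minimal sketch using Python's standard library. It uses the normal approximation, which is reasonable for large run counts; the function name `confidence_interval` and the sample throughput values are made up for the example and are not data from the Hyper-DERP benchmarks, whose exact statistical procedure is not described here.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Return (low, high) bounds for the mean using the normal approximation."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(samples)
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical throughput measurements in Gbit/s, not real results.
    runs = [10.0, 12.0, 11.0, 13.0, 9.0]
    low, high = confidence_interval(runs)
    print(f"mean throughput lies in [{low:.2f}, {high:.2f}] at 95% confidence")
```

With only a handful of runs, a Student's t critical value would give a more honest (wider) interval; the normal approximation becomes adequate as the number of runs grows.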

Editorial Opinion

This work exemplifies how careful systems engineering can dramatically improve performance in critical infrastructure. While the Tailscale DERP relay is a solid, production-proven implementation, this demonstrates that substantial optimization opportunities remain when you're willing to drop down to C, kernel APIs, and share-nothing concurrency patterns. The achievement has implications beyond VPN relays: it highlights how kernel interfaces like io_uring and kTLS are enabling a new generation of highly efficient network infrastructure software.

