LiveKit Releases eot-bench: Open Benchmark for Voice Agent End-of-Turn Detection
Key Takeaways
- ▸LiveKit released eot-bench, an open-source benchmark and dataset specifically designed for evaluating end-of-turn detection in voice AI systems
- ▸The benchmark includes real human-to-agent conversation data in 14 languages with annotated silence patterns, providing the first shared standard for measuring EoT detection performance
- ▸Models are evaluated under realistic conditions including latency budgets and interruption rates, measuring actual conversational quality rather than performance on isolated clips
Summary
LiveKit has released eot-bench, an open-source benchmark suite and dataset for end-of-turn (EoT) detection in voice AI systems. EoT detection, which determines when a user has finished speaking, is critical for natural-sounding voice agents: poor detection causes an agent either to interrupt the user or to leave awkward silences in the conversation. Until now, this capability has been hard to measure and compare across models because no shared public benchmark or standardized evaluation methodology existed.
The release includes livekit/eot-bench-data, the first open dataset of real human-to-agent conversations for EoT detection, spanning 14 languages: Arabic, Chinese, Dutch, English, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese, Spanish, and Turkish. Each conversation turn is annotated with its silent pauses, so the benchmark can evaluate models under realistic latency and interruption budgets rather than on isolated audio clips. LiveKit's own Turn Detector v1 achieves the strongest results on the benchmark, both in English and across all languages.
The release addresses a fundamental gap in voice AI development by giving the field a reproducible evaluation standard. An interactive leaderboard lets developers set specific latency and false-cutoff budgets and compare how different models perform under those real-world constraints, balancing the two competing goals of avoiding interruptions and keeping the conversation responsive.
- ▸LiveKit's Turn Detector v1 posts the strongest overall results, and an interactive leaderboard allows developers to compare performance under custom operating constraints
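To make the idea of comparing models under a false-cutoff budget concrete, here is a minimal Python sketch. It is not eot-bench's actual scoring code or data format: the `Pause` schema, the two delay constants, the threshold policy, and the toy data are all assumptions made for illustration. It shows how, given annotated pauses and a detector score for each one, one might sweep a decision threshold and pick the lowest-latency operating point that stays within a chosen false-cutoff budget.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pause:
    """One annotated silence (hypothetical schema, not the eot-bench format)."""
    is_turn_end: bool   # True if the user had actually finished the turn
    duration_s: float   # mid-turn pauses: seconds until the user resumed; ignored for turn ends
    eot_score: float    # detector's end-of-turn score in [0, 1]


# Assumed agent policy: respond quickly when the detector says "done",
# otherwise fall back to a longer silence timeout.
FAST_DELAY_S = 0.2
FALLBACK_DELAY_S = 1.5


def response_delay(pause: Pause, threshold: float) -> float:
    """Seconds of silence the agent waits before responding at this threshold."""
    return FAST_DELAY_S if pause.eot_score >= threshold else FALLBACK_DELAY_S


def evaluate(pauses: List[Pause], threshold: float) -> Tuple[float, float]:
    """Return (false_cutoff_rate, mean_latency_s) for one threshold."""
    mid_turn = [p for p in pauses if not p.is_turn_end]
    turn_ends = [p for p in pauses if p.is_turn_end]
    # A false cutoff: the agent responds before a mid-turn pause is over.
    cutoffs = sum(1 for p in mid_turn if response_delay(p, threshold) < p.duration_s)
    rate = cutoffs / len(mid_turn) if mid_turn else 0.0
    # Latency is only measured where the user really finished speaking.
    latency = (
        sum(response_delay(p, threshold) for p in turn_ends) / len(turn_ends)
        if turn_ends else 0.0
    )
    return rate, latency


def best_under_budget(
    pauses: List[Pause],
    max_false_cutoff_rate: float,
    thresholds: List[float],
) -> Optional[Tuple[float, float, float]]:
    """Lowest-latency (threshold, rate, latency) within the budget, or None."""
    best = None
    for t in thresholds:
        rate, latency = evaluate(pauses, t)
        if rate <= max_false_cutoff_rate and (best is None or latency < best[2]):
            best = (t, rate, latency)
    return best


# Toy data, invented for illustration only.
TOY_PAUSES = [
    Pause(False, 0.6, 0.7),
    Pause(False, 0.4, 0.2),
    Pause(False, 0.8, 0.3),
    Pause(False, 0.5, 0.1),
    Pause(True, 0.0, 0.9),
    Pause(True, 0.0, 0.8),
    Pause(True, 0.0, 0.6),
    Pause(True, 0.0, 0.4),
]

if __name__ == "__main__":
    print(best_under_budget(TOY_PAUSES, 0.25, [0.25, 0.5, 0.75]))
```

Under this toy policy, tightening the false-cutoff budget pushes the chosen threshold up and the mean response latency with it, which is the trade-off a budget-based comparison is meant to expose.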
Editorial Opinion
End-of-turn detection has long been one of the most stubborn problems in voice AI, yet it has largely been evaluated in isolation on private datasets. By open-sourcing both eot-bench and the underlying multilingual dataset, LiveKit is giving the entire field a much-needed common reference point for measuring progress. This kind of shared infrastructure, especially with realistic evaluation that mirrors actual deployment conditions, is essential for standardizing quality and accelerating innovation in voice agents.



