Microsoft · OPEN SOURCE · 2026-04-28

VibeVoice: Microsoft's Open-Source Voice AI Suite Reaches Hugging Face Transformers

Key Takeaways

  • VibeVoice-ASR is now available through Hugging Face Transformers, so developers building speech-to-text applications can use it from that library
  • Both the ASR and TTS models handle long-form audio (60+ minutes for ASR, 90 minutes for TTS), with multilingual coverage across 50+ languages
  • Continuous speech tokenizers running at a 7.5 Hz frame rate, combined with LLM and diffusion frameworks, deliver high-fidelity audio at low computational cost

Source: Hacker News, https://github.com/microsoft/VibeVoice

Summary

Microsoft has released VibeVoice, an open-source voice AI framework that includes both automatic speech recognition (ASR) and text-to-speech (TTS) models. The latest milestone came on March 6, 2026, when VibeVoice-ASR was integrated into the Hugging Face Transformers library, making the model available to developers through a library many of them already use.

The VibeVoice suite focuses on long-form audio processing. VibeVoice-ASR can handle 60-minute audio files in a single pass and supports over 50 languages, with speaker diarization, timestamping, and customized hotword recognition. VibeVoice-Realtime-0.5B provides real-time text-to-speech generation in multiple languages and speaking styles. Both models rely on continuous speech tokenizers operating at 7.5 Hz, combined with LLM and diffusion-based architectures, to balance audio quality against computational cost.
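To give a sense of why a 7.5 Hz frame rate matters for long-form audio, the arithmetic can be sketched as below. This is a back-of-the-envelope illustration, not VibeVoice code: it assumes one tokenizer frame per time step, and the 50 Hz comparison rate is a hypothetical figure chosen only for contrast. The helper name `frames_for` is likewise made up for this example.

```python
# Rough sequence-length arithmetic for a fixed-rate speech tokenizer.
# Assumption: one frame per time step; real models may add text or
# control tokens on top of these audio frames.

FRAME_RATE_HZ = 7.5  # tokenizer frame rate quoted in the article


def frames_for(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of tokenizer frames covering `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


asr_frames = frames_for(60)  # 60-minute ASR input
tts_frames = frames_for(90)  # 90-minute TTS output

# Hypothetical 50 Hz tokenizer, used here only as a point of comparison.
baseline_frames = frames_for(60, frame_rate_hz=50.0)

print(f"60 min at 7.5 Hz: {asr_frames} frames")
print(f"90 min at 7.5 Hz: {tts_frames} frames")
print(f"60 min at 50 Hz (hypothetical): {baseline_frames} frames")
```

Under these assumptions, an hour of audio comes to 27,000 frames at 7.5 Hz, against 180,000 at the hypothetical 50 Hz rate, which suggests how a low frame rate keeps hour-long inputs within the reach of an LLM context window.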

Since it began open-sourcing the framework in August 2025, Microsoft has added fine-tuning code, vLLM inference support for faster processing, expanded multilingual capabilities, and technical reports to the VibeVoice ecosystem. This open-source development, together with Microsoft's proactive approach to misuse prevention, positions VibeVoice as a foundation for voice AI research and deployment.

The complete open-source suite includes fine-tuning code, vLLM optimization, and published technical reports; models are available on Hugging Face and in interactive playgrounds.

Generative AI · Speech & Audio · Deep Learning · Open Source