BotBeat
OPEN SOURCE · Microsoft · 2026-04-28

VibeVoice: Microsoft's Open-Source Voice AI Suite Reaches Hugging Face Transformers

Key Takeaways

  • VibeVoice-ASR is now available through Hugging Face Transformers, so developers building speech-to-text applications can use it directly from that library
  • The ASR and TTS models handle long-form audio (60+ minutes for ASR, 90 minutes for TTS), with multilingual support across 50+ languages
  • Continuous speech tokenizers running at a 7.5 Hz frame rate, combined with LLM and diffusion frameworks, deliver high-fidelity audio at low computational cost
Source: Hacker News, https://github.com/microsoft/VibeVoice

Summary

Microsoft has released VibeVoice, an open-source framework for voice AI that includes both automatic speech recognition (ASR) and text-to-speech (TTS) models. The latest milestone came on March 6, 2026, when VibeVoice-ASR was integrated into the Hugging Face Transformers library, making the model usable from a library many developers already depend on and widening access to its speech processing capabilities.

The VibeVoice suite targets long-form audio processing. VibeVoice-ASR can handle 60-minute audio files in a single pass and supports over 50 languages, with speaker diarization, timestamping, and customized hotword recognition. VibeVoice-Realtime-0.5B, meanwhile, provides real-time text-to-speech generation across multiple languages and speaking styles. Both models use continuous speech tokenizers operating at 7.5 Hz, combined with LLM and diffusion-based architectures, to balance audio quality against computational cost.
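To make the efficiency claim concrete, the following is a rough back-of-the-envelope sketch of how many tokenizer frames a long recording produces at the 7.5 Hz rate quoted above. The 50 Hz comparison rate and the helper name `frames_for` are assumptions chosen purely for illustration; they are not taken from the VibeVoice repository or its reports.

```python
def frames_for(minutes: float, frame_rate_hz: float) -> int:
    """Return the number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


VIBEVOICE_HZ = 7.5    # frame rate quoted in the article
COMPARISON_HZ = 50.0  # hypothetical higher-rate tokenizer, for comparison only

for label, minutes in [("ASR, 60 min", 60), ("TTS, 90 min", 90)]:
    low = frames_for(minutes, VIBEVOICE_HZ)
    high = frames_for(minutes, COMPARISON_HZ)
    print(f"{label}: {low} frames at 7.5 Hz vs {high} at 50 Hz")
```

At 7.5 Hz, a 60-minute recording comes to 27,000 frames and a 90-minute one to 40,500, which suggests why a low frame rate helps keep hour-long audio within the sequence lengths an LLM can process in one pass.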

Since it began open-sourcing the framework in August 2025, Microsoft has added fine-tuning code, vLLM inference support for faster processing, expanded multilingual capabilities, and technical reports. This open-source development, together with Microsoft's responsible AI principles and its proactive approach to misuse prevention, positions VibeVoice as a foundation for voice AI research and deployment across industries.

  • Complete open-source suite includes fine-tuning code, vLLM optimization, and published technical reports; models are available on Hugging Face and in interactive playgrounds
Generative AI · Speech & Audio · Deep Learning · Open Source
