BotBeat
PRODUCT LAUNCH | Microsoft | 2026-04-02

Microsoft AI Launches MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 Models in Foundry

Key Takeaways

  • MAI-Transcribe-1 delivers state-of-the-art multilingual speech-to-text, with batch processing 2.5x faster than existing Azure offerings, at competitive pricing
  • MAI-Voice-1 creates custom voices from a few seconds of audio and generates 60 seconds of speech in one second, supporting voice agent development
  • MAI-Image-2 doubles generation speed with improved quality for creative professionals, with early enterprise adoption by WPP and rollout across Copilot, Bing, and PowerPoint
Sources:
Hacker News: https://microsoft.ai/news/today-were-announcing-3-new-world-class-mai-models-available-in-foundry/
Hacker News: https://microsoft.ai/news/state-of-the-art-speech-recognition-with-mai-transcribe-1/

Summary

Microsoft AI has announced three new multimodal AI models now available in Microsoft Foundry: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. MAI-Transcribe-1 delivers state-of-the-art speech-to-text transcription across 25 languages with 2.5x faster batch processing than existing Azure offerings, starting at $0.36 per hour. MAI-Voice-1, the company's top-tier voice generation model, can now create custom voices from just a few seconds of audio and generate 60 seconds of speech in a single second, priced at $22 per 1M characters.
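
To put the quoted prices in concrete terms, the following Python sketch works through the arithmetic. It is only an illustration: it assumes the $0.36 rate applies per hour of audio and that both prices scale linearly with no minimums, tiers, or discounts, none of which the announcement summary confirms. The function and constant names are hypothetical and are not part of any Microsoft API.

```python
# Illustrative cost arithmetic based on the prices quoted in this article.
# Assumptions (not confirmed by the source): the transcription rate is per
# hour of audio, and both prices scale linearly with no minimums or tiers.

TRANSCRIBE_USD_PER_HOUR = 0.36        # MAI-Transcribe-1, as quoted
VOICE_USD_PER_MILLION_CHARS = 22.0    # MAI-Voice-1, as quoted
VOICE_AUDIO_SECONDS_PER_SECOND = 60   # 60 s of speech generated per second


def transcription_cost(audio_hours: float) -> float:
    """Estimated cost in USD to transcribe the given hours of audio."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated cost in USD to synthesize the given number of characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def voice_generation_seconds(audio_seconds: float) -> float:
    """Estimated wall-clock seconds to generate the given audio duration."""
    return audio_seconds / VOICE_AUDIO_SECONDS_PER_SECOND


if __name__ == "__main__":
    print(f"100 hours of transcription: ${transcription_cost(100):.2f}")
    print(f"250,000 characters of speech: ${voice_cost(250_000):.2f}")
    print(f"10 minutes of speech: {voice_generation_seconds(600):.0f} s to generate")
```

Under these assumptions, 100 hours of transcription would come to about $36, 250,000 characters of synthesized speech to about $5.50, and 10 minutes of generated speech would take roughly 10 seconds to produce.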

MAI-Image-2 generates images at least 2x faster on Foundry and Copilot while maintaining quality, and is gaining traction with enterprise partners including WPP, one of the world's largest marketing groups. All three models emphasize competitive pricing and efficiency, and Microsoft positions them as superior to competing models in speed, quality, and cost. The models are designed around human-centric principles and optimized for natural communication and real-world use by creative professionals, developers, and enterprises.


Editorial Opinion

Microsoft's simultaneous launch of three models shows a comprehensive strategy to compete across speech, voice, and image AI. The emphasis on speed, quality, and affordability, and in particular the claim of better performance than competitors at lower cost, makes these models attractive for enterprise adoption. However, the comparison claims come from Microsoft itself and warrant scrutiny. Real-world performance will ultimately determine whether the MAI models are genuinely superior on all three dimensions at once.

