BotBeat

PRODUCT LAUNCH · Microsoft · 2026-04-03

Microsoft AI Announces Three New Multimodal Models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2

Key Takeaways

  • Three new multimodal AI models (transcription, voice, and image generation) are now available in Microsoft Foundry, each with claimed speed gains over existing offerings
  • MAI-Transcribe-1 covers 25 languages and runs batch transcription 2.5x faster than existing Azure offerings; MAI-Voice-1 generates realistic speech and supports custom voice creation; MAI-Image-2 provides 2x faster image generation
  • Microsoft positions the models as competitively priced against other cloud providers, with efficiency gains that do not sacrifice quality
Source: Hacker News, https://microsoft.ai/news/today-were-announcing-3-new-world-class-mai-models-available-in-foundry/

Summary

Microsoft AI has announced three new advanced models available through Microsoft Foundry: MAI-Transcribe-1 for speech-to-text transcription, MAI-Voice-1 for voice generation, and MAI-Image-2 for image generation. MAI-Transcribe-1 delivers state-of-the-art performance across 25 languages with 2.5x faster batch transcription speeds than existing Azure offerings. MAI-Voice-1 enables custom voice creation from just seconds of audio and can generate 60 seconds of speech in a single second, while MAI-Image-2 provides 2x faster image generation with improved quality for creative professionals.

Microsoft positions all three models as outperforming competitors at competitive prices. MAI-Transcribe-1 starts at $0.36 per hour of audio, MAI-Voice-1 at $22 per million characters, and MAI-Image-2 at $5 per million tokens for text input and $33 per million tokens for image output. Early enterprise adoption includes WPP, a major marketing and communications group, which is already using MAI-Image-2 for campaign-ready creative work at scale, an early sign of commercial viability for creative and marketing applications.
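
To make the quoted prices concrete, here is a minimal sketch of a cost estimator that uses only the starting prices listed above. The function and constant names are hypothetical and are not part of any Microsoft SDK. The article does not say how many tokens a generated image consumes, so the image token counts are left as inputs the caller must supply. Actual billing may differ by tier or region.

```python
# Illustrative cost estimator based on the starting prices quoted in the article.
# All names here are hypothetical; this is not a Microsoft Foundry API.

TRANSCRIBE_USD_PER_HOUR = 0.36           # MAI-Transcribe-1, per hour of audio
VOICE_USD_PER_MILLION_CHARS = 22.0       # MAI-Voice-1, per million characters
IMAGE_TEXT_IN_USD_PER_MILLION = 5.0      # MAI-Image-2, text input tokens
IMAGE_OUT_USD_PER_MILLION = 33.0         # MAI-Image-2, image output tokens


def transcribe_cost(audio_hours: float) -> float:
    """Estimated cost in USD to transcribe the given hours of audio."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated cost in USD to synthesize speech for the given character count."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated cost in USD for image generation, given token counts.

    Token counts per image are an assumption supplied by the caller;
    the article does not state them.
    """
    return (
        text_input_tokens / 1_000_000 * IMAGE_TEXT_IN_USD_PER_MILLION
        + image_output_tokens / 1_000_000 * IMAGE_OUT_USD_PER_MILLION
    )


if __name__ == "__main__":
    print(f"100 hours of audio: ${transcribe_cost(100):.2f}")           # $36.00
    print(f"2M characters of speech: ${voice_cost(2_000_000):.2f}")     # $44.00
    print(f"10k text + 1M image tokens: ${image_cost(10_000, 1_000_000):.2f}")  # $33.05
```

At these rates, a 100-hour transcription backlog would cost about $36, which gives a rough sense of scale for comparing against other providers' published prices.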

Editorial Opinion

Microsoft's announcement of these three new models represents a significant push to democratize access to high-quality multimodal AI capabilities through Foundry. The emphasis on balancing speed, quality, and affordability directly challenges competitors by removing the traditional trade-offs developers face. If the performance claims hold up in production environments, this could accelerate enterprise adoption of AI-powered features across voice, transcription, and image generation use cases.

Computer Vision · Generative AI · Multimodal AI · Speech & Audio · Product Launch
