BotBeat

PRODUCT LAUNCH | Microsoft | 2026-04-03

Microsoft AI Announces Three New Multimodal Models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2

Key Takeaways

  • Three new AI models covering transcription, voice generation, and image generation are now available in Microsoft Foundry, with claimed speed improvements over existing offerings
  • MAI-Transcribe-1 supports 25 languages and runs batch transcription 2.5x faster than existing Azure offerings; MAI-Voice-1 generates realistic speech and supports custom voice creation; MAI-Image-2 generates images 2x faster
  • Microsoft positions the models as competitively priced, with efficiency gains that do not sacrifice quality
Source: Hacker News, https://microsoft.ai/news/today-were-announcing-3-new-world-class-mai-models-available-in-foundry/

Summary

Microsoft AI has announced three new advanced models available through Microsoft Foundry: MAI-Transcribe-1 for speech-to-text transcription, MAI-Voice-1 for voice generation, and MAI-Image-2 for image generation. MAI-Transcribe-1 delivers state-of-the-art performance across 25 languages with 2.5x faster batch transcription speeds than existing Azure offerings. MAI-Voice-1 enables custom voice creation from just seconds of audio and can generate 60 seconds of speech in a single second, while MAI-Image-2 provides 2x faster image generation with improved quality for creative professionals.

Microsoft positions all three models as outperforming competitors at competitive prices. MAI-Transcribe-1 starts at $0.36 per hour, MAI-Voice-1 at $22 per million characters, and MAI-Image-2 at $5 per million tokens for text input and $33 per million tokens for image output. Early enterprise adopters include WPP, a major marketing and communications group, which is already using MAI-Image-2 for campaign-ready creative work at scale.
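To put the quoted starting prices in concrete terms, here is a rough cost sketch in Python. The rates are the ones listed above; everything else is an assumption made for illustration: the function names are hypothetical, "per hour" is read as per hour of audio, and real billing may use different units, tiers, or rounding.

```python
from decimal import Decimal

# Starting prices quoted in the article, in USD. How usage is actually
# metered and rounded is an assumption here, not confirmed billing terms.
TRANSCRIBE_PER_HOUR = Decimal("0.36")        # MAI-Transcribe-1 (assumed: per hour of audio)
VOICE_PER_MILLION_CHARS = Decimal("22")      # MAI-Voice-1
IMAGE_TEXT_IN_PER_MILLION = Decimal("5")     # MAI-Image-2, text input tokens
IMAGE_OUT_PER_MILLION = Decimal("33")        # MAI-Image-2, image output tokens

MILLION = Decimal(1_000_000)
CENT = Decimal("0.01")


def transcription_cost(audio_hours):
    """Estimated cost of transcribing the given number of audio hours."""
    return (TRANSCRIBE_PER_HOUR * Decimal(str(audio_hours))).quantize(CENT)


def voice_cost(characters):
    """Estimated cost of synthesizing the given number of characters."""
    return (VOICE_PER_MILLION_CHARS * Decimal(characters) / MILLION).quantize(CENT)


def image_cost(text_input_tokens, image_output_tokens):
    """Estimated cost of image generation from input and output token counts."""
    text_part = IMAGE_TEXT_IN_PER_MILLION * Decimal(text_input_tokens) / MILLION
    image_part = IMAGE_OUT_PER_MILLION * Decimal(image_output_tokens) / MILLION
    return (text_part + image_part).quantize(CENT)


if __name__ == "__main__":
    print(transcription_cost(100))            # 100 hours of audio -> 36.00
    print(voice_cost(2_500_000))              # 2.5M characters -> 55.00
    print(image_cost(1_000_000, 2_000_000))   # 1M text in, 2M image out -> 71.00
```

`Decimal` is used instead of floats so that the cent-level totals are exact. The workload sizes in the usage lines are made up for the example and do not come from the announcement.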


Editorial Opinion

Microsoft's announcement of these three models is a significant push to broaden access to high-quality multimodal AI through Foundry. By emphasizing speed, quality, and affordability together, Microsoft is challenging competitors on the trade-offs developers have traditionally had to accept. If the performance claims hold up in production, this could accelerate enterprise adoption of AI-powered voice, transcription, and image generation features.

Tags: Computer Vision, Generative AI, Multimodal AI, Speech & Audio, Product Launch
