BotBeat

PRODUCT LAUNCH | Microsoft | 2026-04-03

Microsoft AI Announces Three New Multimodal Models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2

Key Takeaways

  • Three new multimodal AI models (transcription, voice, and image generation) are now available in Microsoft Foundry, with claimed performance improvements over existing offerings
  • MAI-Transcribe-1 delivers 2.5x faster batch transcription than existing Azure offerings across 25 languages; MAI-Voice-1 generates realistic speech and supports custom voice creation; MAI-Image-2 provides 2x faster image generation
  • Microsoft positions the models as competitively priced against rival providers, with efficiency gains that do not sacrifice quality
Source: Hacker News, https://microsoft.ai/news/today-were-announcing-3-new-world-class-mai-models-available-in-foundry/

Summary

Microsoft AI has announced three new advanced models available through Microsoft Foundry: MAI-Transcribe-1 for speech-to-text transcription, MAI-Voice-1 for voice generation, and MAI-Image-2 for image generation. MAI-Transcribe-1 delivers state-of-the-art performance across 25 languages with 2.5x faster batch transcription speeds than existing Azure offerings. MAI-Voice-1 enables custom voice creation from just seconds of audio and can generate 60 seconds of speech in a single second, while MAI-Image-2 provides 2x faster image generation with improved quality for creative professionals.

Microsoft positions all three models as outperforming competitors while remaining competitively priced. MAI-Transcribe-1 starts at $0.36 per hour, MAI-Voice-1 at $22 per million characters, and MAI-Image-2 at $5 per million tokens for text input and $33 per million tokens for image output. Early enterprise adopters include WPP, a major marketing and communications group, which is already using MAI-Image-2 for campaign-ready creative work at scale.

WPP's early adoption suggests commercial demand for these models in creative and marketing applications.
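
To put the quoted prices in concrete terms, here is a minimal cost-estimate sketch. The rates are the "starts at" figures reported in this article; the function names and the usage volumes are hypothetical, the per-hour rate is assumed to mean per hour of audio, and actual billing may use different tiers or units.

```python
# Starting rates as quoted in the article (USD). Real billing may differ.
TRANSCRIBE_USD_PER_HOUR = 0.36            # assumed: per hour of audio
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_TEXT_IN_USD_PER_MILLION_TOKENS = 5.0
IMAGE_OUT_USD_PER_MILLION_TOKENS = 33.0


def transcription_cost(audio_hours: float) -> float:
    """Estimated MAI-Transcribe-1 cost for a number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a number of synthesized characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated MAI-Image-2 cost from text-input and image-output tokens."""
    input_cost = text_input_tokens / 1_000_000 * IMAGE_TEXT_IN_USD_PER_MILLION_TOKENS
    output_cost = image_output_tokens / 1_000_000 * IMAGE_OUT_USD_PER_MILLION_TOKENS
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical monthly usage, chosen only for illustration.
    total = (
        transcription_cost(audio_hours=100)
        + voice_cost(characters=2_000_000)
        + image_cost(text_input_tokens=1_000_000, image_output_tokens=1_000_000)
    )
    print(f"Estimated monthly cost: ${total:.2f}")
```

At these starting rates, image output tokens cost 6.6 times as much as text input tokens ($33 versus $5 per million), so output volume is likely to dominate any MAI-Image-2 bill.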

Editorial Opinion

Microsoft's announcement of these three models represents a significant push to broaden access to high-quality multimodal AI capabilities through Foundry. By emphasizing speed, quality, and affordability together, Microsoft challenges competitors on the trade-offs developers have traditionally faced. If the performance claims hold up in production, this could accelerate enterprise adoption of AI-powered voice, transcription, and image generation features.

Computer Vision, Generative AI, Multimodal AI, Speech & Audio, Product Launch
