Microsoft AI Announces Three New Multimodal Models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2

Key Takeaways

▸ Microsoft AI has released three models in Microsoft Foundry: MAI-Transcribe-1 (speech-to-text), MAI-Voice-1 (voice generation), and MAI-Image-2 (image generation)
▸ MAI-Transcribe-1 covers 25 languages and runs batch transcription 2.5x faster than existing Azure offerings; MAI-Voice-1 creates custom voices from seconds of audio and generates 60 seconds of speech in one second; MAI-Image-2 generates images 2x faster
▸ Pricing starts at $0.36 per hour for MAI-Transcribe-1, $22 per million characters for MAI-Voice-1, and $5 per million text input tokens plus $33 per million image output tokens for MAI-Image-2; Microsoft positions these rates as competitive with other cloud providers, with efficiency gains that do not sacrifice quality

Source:

Hacker News: https://microsoft.ai/news/today-were-announcing-3-new-world-class-mai-models-available-in-foundry/

Summary

Microsoft AI has announced three new models available through Microsoft Foundry: MAI-Transcribe-1 for speech-to-text transcription, MAI-Voice-1 for voice generation, and MAI-Image-2 for image generation. According to Microsoft, MAI-Transcribe-1 delivers state-of-the-art accuracy across 25 languages and runs batch transcription 2.5x faster than existing Azure offerings. MAI-Voice-1 creates custom voices from just seconds of audio and can generate 60 seconds of speech in a single second. MAI-Image-2 generates images 2x faster, with improved quality aimed at creative professionals.
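To put the quoted speed figures in concrete terms, the short Python sketch below converts them into wall-clock estimates. The two constants come from the announcement as summarized above. The function names are hypothetical, and the assumption that throughput scales linearly with job size is ours, not Microsoft's; real jobs also carry network and queueing overhead.

```python
# Figures quoted in the announcement summary.
VOICE_AUDIO_SECONDS_PER_COMPUTE_SECOND = 60.0  # MAI-Voice-1: 60 s of speech per second
TRANSCRIBE_BATCH_SPEEDUP = 2.5                 # MAI-Transcribe-1 vs. existing Azure offerings


def synthesis_seconds(audio_seconds: float) -> float:
    """Estimated time to generate `audio_seconds` of speech.

    Assumes (hypothetically) that the quoted rate holds for any length.
    """
    return audio_seconds / VOICE_AUDIO_SECONDS_PER_COMPUTE_SECOND


def sped_up_batch_seconds(baseline_seconds: float) -> float:
    """Estimated duration of a batch transcription job that previously
    took `baseline_seconds`, assuming the 2.5x speedup applies uniformly."""
    return baseline_seconds / TRANSCRIBE_BATCH_SPEEDUP


if __name__ == "__main__":
    one_hour = 3600.0
    print(f"1 hour of speech: about {synthesis_seconds(one_hour):.0f} s to generate")
    print(f"10-minute batch job: about {sped_up_batch_seconds(600.0):.0f} s after speedup")
```

Under these assumptions, an hour of narration would take roughly a minute to synthesize, which is the practical meaning of the 60-to-1 figure.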

Microsoft positions all three models as outperforming competing offerings at competitive prices. MAI-Transcribe-1 starts at $0.36 per hour, MAI-Voice-1 at $22 per million characters, and MAI-Image-2 at $5 per million tokens for text input and $33 per million tokens for image output. Early enterprise adopters include WPP, a major marketing and communications group, which is already using MAI-Image-2 for campaign-ready creative work at scale.
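The per-unit prices above can be turned into a rough budget with a few lines of Python. This is a minimal sketch: the rates are the starting prices quoted in the announcement, while the function names and the sample workload are hypothetical. It assumes flat pricing with no tiers, discounts, or minimums, and it does not know how many output tokens a single image consumes, so that number has to be supplied by the caller.

```python
# Starting prices quoted in the announcement (USD).
TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_TEXT_INPUT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_IMAGE_OUTPUT_TOKENS = 33.0

MILLION = 1_000_000


def transcription_cost(audio_hours: float) -> float:
    """Cost of transcribing `audio_hours` of audio with MAI-Transcribe-1."""
    return audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Cost of synthesizing `characters` of text with MAI-Voice-1."""
    return characters / MILLION * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Cost of MAI-Image-2 usage, billed separately for input and output tokens."""
    input_cost = text_input_tokens / MILLION * IMAGE_USD_PER_MILLION_TEXT_INPUT_TOKENS
    output_cost = image_output_tokens / MILLION * IMAGE_USD_PER_MILLION_IMAGE_OUTPUT_TOKENS
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical monthly workload, for illustration only.
    total = (
        transcription_cost(audio_hours=500)
        + voice_cost(characters=3_000_000)
        + image_cost(text_input_tokens=2_000_000, image_output_tokens=10_000_000)
    )
    print(f"Estimated monthly spend: ${total:.2f}")
```

Because the announcement describes these as starting prices, treat the result as a lower-bound estimate until it is checked against the published Foundry rate card.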

Early enterprise adoption by WPP points to commercial viability for creative and marketing applications.

Editorial Opinion

Microsoft's announcement of these three models is a significant push to broaden access to high-quality voice, transcription, and image generation through Foundry. By pitching speed, quality, and affordability together, Microsoft challenges competitors on the trade-offs developers usually have to make among the three. If the performance claims hold up in production, the models could accelerate enterprise adoption of AI-powered voice, transcription, and image generation features.

PRODUCT LAUNCH | Microsoft | 2026-04-03

