BotBeat
...
← Back

> ▌

MicrosoftMicrosoft
PRODUCT LAUNCHMicrosoft2026-04-02

Microsoft AI Announces Three Foundational Multimodal Models with Competitive Pricing

Key Takeaways

  • ▸Microsoft launches three multimodal foundational models (transcription, voice, and video generation) through its AI research division
  • ▸Models emphasize cost-competitiveness against OpenAI and Google offerings as a primary market differentiator
  • ▸Release reflects Microsoft's dual strategy: maintaining OpenAI partnership while independently developing its own superintelligence research capabilities
Source:
Hacker Newshttps://techcrunch.com/2026/04/02/microsoft-takes-on-ai-rivals-with-three-new-foundational-models/↗

Summary

Microsoft AI (MAI) announced the release of three foundational AI models designed to generate text, voice, and images, marking the company's continued expansion into multimodal AI development. The new models include MAI-Transcribe-1, a speech-to-text model supporting 25 languages and running 2.5 times faster than Microsoft's Azure Fast offering; MAI-Voice-1, an audio generation model capable of producing 60 seconds of audio in one second with custom voice options; and MAI-Image-2, a video-generating model. All three models are now available on Microsoft Foundry, with the transcription and voice models also accessible through MAI Playground.

The release underscores Microsoft's strategy to build a competitive AI stack independent of its OpenAI partnership, while maintaining that strategic relationship. Developed by Microsoft's MAI Superintelligence team under CEO Mustafa Suleyman, these models emphasize a "Humanist AI" approach centered on practical human communication. Significantly, Microsoft is positioning cost-effectiveness as a key competitive advantage, with pricing starting at $0.36 per hour for transcription, $22 per million characters for voice generation, and $5-$33 per million tokens for image generation.

  • Models available on Microsoft Foundry platform with phased rollout to integrated Microsoft products and experiences

Editorial Opinion

Microsoft's multi-model release demonstrates a sophisticated competitive strategy—building proprietary AI infrastructure while leveraging its OpenAI partnership. The emphasis on cost-effectiveness is compelling and could resonate with enterprise customers, though the real test will be performance parity with established competitors. The transparent pricing structure and availability across multiple platforms suggest Microsoft is serious about democratizing access to these capabilities.

Natural Language Processing (NLP)Generative AIMultimodal AISpeech & AudioProduct Launch

More from Microsoft

MicrosoftMicrosoft
PRODUCT LAUNCH

Microsoft Launches Comprehensive Agent Framework for Building and Orchestrating AI Agents

2026-04-04
MicrosoftMicrosoft
POLICY & REGULATION

Microsoft's Own Terms Reveal Copilot Is 'For Entertainment Purposes Only' and Cannot Be Trusted for Important Decisions

2026-04-03
MicrosoftMicrosoft
PRODUCT LAUNCH

Microsoft AI Announces Three New Multimodal Models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2

2026-04-03

Comments

Suggested

AnthropicAnthropic
RESEARCH

Inside Claude Code's Dynamic System Prompt Architecture: Anthropic's Complex Context Engineering Revealed

2026-04-05
GitHubGitHub
PRODUCT LAUNCH

GitHub Launches Squad: Open Source Multi-Agent AI Framework to Simplify Complex Workflows

2026-04-05
PerplexityPerplexity
POLICY & REGULATION

Perplexity's 'Incognito Mode' Called a 'Sham' in Class Action Lawsuit Over Data Sharing with Google and Meta

2026-04-05
← Back to news
© 2026 BotBeat
AboutPrivacy PolicyTerms of ServiceContact Us