Microsoft Copilot Researcher Introduces Multi-Model Intelligence with Critique and Council Features
Key Takeaways
- Researcher now features Critique, a multi-model deep research system that separates generation from evaluation using models from Anthropic and OpenAI, delivering superior accuracy compared to single-model approaches
- Critique achieves a +7.0 point improvement on the DRACO benchmark and outperforms Perplexity Deep Research by 13.88%, demonstrating best-in-class deep research quality
- Council allows users to compare multiple model responses side-by-side with detailed insights on agreement points and divergences across different AI models
Summary
Microsoft has announced significant enhancements to Researcher, its deep research agent within Microsoft 365 Copilot, introducing two new multi-model capabilities: Critique and Council. Critique employs a dual-model architecture that separates generation from evaluation, combining models from frontier AI labs including Anthropic and OpenAI. One model handles planning, retrieval, and initial draft creation, while a second model acts as an expert reviewer, validating and refining the draft before producing the final report. This approach has delivered substantial gains on the DRACO benchmark: a +7.0 point improvement, outperforming Perplexity Deep Research by 13.88%.
Council is a complementary feature that displays multiple model responses side-by-side within the Researcher experience, along with a cover letter that highlights areas of agreement, divergence, and unique insights from each model. The Critique system incorporates rubric-based evaluation similar to academic and professional research workflows, focusing on source reliability assessment, report completeness, and strict evidence grounding enforcement. By emphasizing evaluation as much as generation, the architecture creates a feedback loop designed to enhance factual accuracy, analytical breadth, and overall presentation quality.
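The generate-then-critique flow described above can be pictured as a simple pipeline: one model drafts, a second scores the draft against a rubric, and the loop revises until the reviewer approves. The sketch below is purely illustrative; the function names, rubric criteria, and stub models are assumptions for demonstration, not Microsoft's actual implementation or API.

```python
# Hypothetical generator/critic pipeline; all names and logic here are
# illustrative assumptions, not the real Researcher internals.

# Rubric criteria mirroring those described for Critique.
RUBRIC = [
    "source_reliability",   # are claims backed by trustworthy sources?
    "completeness",         # does the report cover the question fully?
    "evidence_grounding",   # is every claim tied to cited evidence?
]

def generator_model(question: str) -> dict:
    """Stand-in for the model that plans, retrieves, and drafts."""
    return {
        "question": question,
        "draft": f"Initial findings on: {question}",
        "citations": ["source-1"],
    }

def critic_model(report: dict) -> dict:
    """Stand-in for the second model that reviews against the rubric."""
    scores = {criterion: 1.0 if report["citations"] else 0.0
              for criterion in RUBRIC}
    return {"scores": scores,
            "approved": all(s >= 0.5 for s in scores.values())}

def deep_research(question: str, max_rounds: int = 2) -> dict:
    """Generate a draft, then loop critique/revision until approved."""
    report = generator_model(question)
    review = critic_model(report)
    for _ in range(max_rounds):
        if review["approved"]:
            break
        # A real system would feed the critique text back into generation.
        report["draft"] += " [revised after critique]"
        review = critic_model(report)
    report["review"] = review
    return report
```

The key design point the sketch captures is the separation of roles: the critic never writes content, it only scores it, so its judgments act as a gate on what the generator may emit as the final report.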
Editorial Opinion
Microsoft's move to implement multi-model intelligence in Researcher represents a thoughtful advancement in AI-assisted research, moving beyond the single-model paradigm that has dominated the space. By splitting generative and evaluative roles across separate models, Microsoft has created a system that mirrors established academic review practices while drawing on the strengths of multiple frontier models. The substantial performance gains on the DRACO benchmark suggest that this architecture addresses real quality gaps in research synthesis, though the long-term value will depend on how well users integrate these insights into their actual research workflows.



