NVIDIA AI-Q Achieves Top Performance on DeepResearch Benchmarks I and II

Key Takeaways

▸NVIDIA AI-Q ranked #1 on both DeepResearch Bench I (score 55.95) and DeepResearch Bench II (score 54.50)
▸The system uses a multi-agent architecture with planner, researcher, and orchestrator components built on NVIDIA NeMo Agent Toolkit and Nemotron 3 LLMs
▸AI-Q is fully open and modular, enabling enterprises to own, inspect, and customize the architecture for their specific use cases

Source:

Hacker News: https://huggingface.co/blog/nvidia/how-nvidia-won-deepresearch-bench

Summary

NVIDIA's AI-Q deep research agent has achieved the #1 ranking on both DeepResearch Bench (55.95) and DeepResearch Bench II (54.50), the leading benchmarks for evaluating deep research agents. This accomplishment demonstrates that an open, portable, and developer-accessible architecture can deliver state-of-the-art agentic research capabilities.

AI-Q is an open blueprint for constructing AI agents that reason over enterprise and web data to generate well-cited research responses. The system features a fully modular architecture that enterprises can own, inspect, and customize for specific use cases. The architecture leverages a multi-agent design consisting of a planner, researcher, and orchestrator, all built on NVIDIA's NeMo Agent Toolkit and powered by fine-tuned Nemotron 3 Super models, with optional ensemble and report refinement capabilities.
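The division of labor among planner, researcher, and orchestrator can be illustrated with a minimal sketch in plain Python. This is not AI-Q's code and does not use the NeMo Agent Toolkit API. Every name here (`plan`, `research`, `orchestrate`, `Finding`, `fake_search`) is hypothetical, and the "models" are stubs that return canned text. It only shows the control flow the paragraph above describes: the planner splits a topic into sub-questions, the researcher gathers cited snippets for each one, and the orchestrator assembles a report with numbered sources.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A search tool maps a query to (snippet, source_url) pairs. Hypothetical interface.
SearchTool = Callable[[str], List[Tuple[str, str]]]


@dataclass
class Finding:
    question: str
    answer: str
    source: str


def plan(topic: str) -> List[str]:
    """Hypothetical planner: break a topic into sub-questions (stubbed, no LLM call)."""
    return [f"What is {topic}?", f"What are the open problems in {topic}?"]


def research(question: str, search: SearchTool) -> List[Finding]:
    """Hypothetical researcher: run the search tool and keep each snippet with its source."""
    return [Finding(question, text, url) for text, url in search(question)]


def orchestrate(topic: str, search: SearchTool) -> str:
    """Hypothetical orchestrator: run plan -> research, then write a cited report."""
    findings: List[Finding] = []
    for question in plan(topic):
        findings.extend(research(question, search))

    sources: List[str] = []  # de-duplicated, in order of first citation
    lines = [f"Report: {topic}"]
    for f in findings:
        if f.source not in sources:
            sources.append(f.source)
        lines.append(f"- {f.answer} [{sources.index(f.source) + 1}]")
    lines.append("Sources:")
    lines.extend(f"[{i}] {s}" for i, s in enumerate(sources, 1))
    return "\n".join(lines)


def fake_search(query: str) -> List[Tuple[str, str]]:
    """Stand-in for a real web or enterprise search tool (assumption for this sketch)."""
    return [(f"Stub answer to: {query}", "https://example.com/doc")]


if __name__ == "__main__":
    print(orchestrate("deep research agents", fake_search))
```

Keeping each role behind a plain function boundary is what makes a design like this inspectable and swappable: a team could replace `fake_search` with its own retrieval tool without touching the planner or the report writer.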

Winning both benchmarks simultaneously is significant because they evaluate research agents in different, complementary ways. DeepResearch Bench I measures report quality along four dimensions: comprehensiveness, depth of insight, instruction-following, and readability. DeepResearch Bench II uses 70+ fine-grained binary rubrics to assess information retrieval, analysis synthesis, and presentation clarity. The dual victory indicates that AI-Q both produces polished, well-structured reports and retrieves and reasons over information with granular factual correctness.
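To make the idea of binary-rubric scoring concrete, here is a small Python sketch. The article does not say how DeepResearch Bench II aggregates its rubrics, so the aggregation below (a simple pass rate per dimension and overall, scaled to 0-100) is an assumption, and the function name `score_rubrics` and the dimension labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def score_rubrics(results: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Aggregate (dimension, passed) rubric checks into 0-100 pass rates.

    Assumption: every rubric carries equal weight. The real benchmark may
    weight or aggregate rubrics differently.
    """
    if not results:
        raise ValueError("at least one rubric result is required")

    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for dimension, ok in results:
        total[dimension] += 1
        passed[dimension] += int(ok)

    scores = {d: 100.0 * passed[d] / total[d] for d in total}
    scores["overall"] = 100.0 * sum(passed.values()) / sum(total.values())
    return scores


if __name__ == "__main__":
    checks = [
        ("retrieval", True),
        ("retrieval", False),
        ("analysis", True),
        ("presentation", True),
    ]
    print(score_rubrics(checks))
```

Because each rubric is a yes/no check, a missed fact lowers the score by a fixed, traceable amount, which is why this style of evaluation complements the holistic report-quality dimensions of Bench I.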

Dual benchmark success indicates the system excels at both report quality and factual correctness with granular analytical rigor

Editorial Opinion

NVIDIA's sweep of both DeepResearch benchmarks validates an important design philosophy: that open, modular architectures using accessible models and tooling can achieve state-of-the-art agentic AI performance. The combination of transparent reasoning (through multi-step planning), specialized agent roles, and fine-tuned models represents a compelling alternative to closed, monolithic systems. This result could accelerate enterprise adoption of AI research agents by demonstrating that transparency and customization don't require sacrificing performance.

NVIDIA

RESEARCH | NVIDIA | 2026-03-12

