NVIDIA Releases Nemotron 3 Super: Open-Weight 120B Model for Autonomous Multi-Agent Systems

Key Takeaways

▸Nemotron 3 Super is optimized specifically for autonomous multi-agent systems, addressing the efficiency and context-growth challenges endemic to agentic AI
▸Hybrid Mamba-Transformer MoE architecture delivers 5x the throughput of the previous Nemotron Super while maintaining a 1M-token context window for agent memory
▸Open release with weights, datasets, and recipes enables developer customization and on-premises deployment

Source: Hacker News, https://developer.nvidia.com/blog/introducing-nemotron-3-super-an-open-hybrid-mamba-transformer-moe-for-agentic-reasoning/

Summary

NVIDIA has announced Nemotron 3 Super, an open language model with 120B total parameters, of which 12B are active per token, designed specifically for autonomous multi-agent AI systems. The release targets two pain points of agentic workloads: the 'thinking tax' of invoking an expensive reasoning model for every sub-task, and 'context explosion', where agents lose alignment as tokens accumulate over a long task. With a native 1M-token context window and a 5x throughput improvement over the previous Nemotron Super, the model balances reasoning depth with practical efficiency for complex applications such as software development and cybersecurity automation.
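
To put the headline numbers in perspective, the following is a back-of-the-envelope sketch rather than anything from NVIDIA's materials. It uses only the 120B total and 12B active parameter counts quoted above. The function names and the bit widths in the loop are illustrative assumptions, and the memory figures cover weight storage only: they ignore quantization scale factors, the KV cache, and activations. In a typical MoE deployment all experts stay resident in memory, so the total count drives memory while the active count drives per-token compute.

```python
# Rough arithmetic on the parameter counts quoted in the article.
TOTAL_PARAMS = 120e9   # total parameters (from the article)
ACTIVE_PARAMS = 12e9   # parameters active per token (from the article)


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: counts raw weights only, with no scale factors,
    KV cache, or activation memory.
    """
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction per token: {frac:.0%}")
    # The 4-bit row is a stand-in for a 4-bit format such as NVFP4.
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{label} weights: ~{gb:.0f} GB")
```

Under these assumptions, about 10% of the parameters are active per token, and the full set of weights occupies roughly 240 GB, 120 GB, and 60 GB at 16, 8, and 4 bits per weight respectively.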

The model combines three architectural ideas aimed at efficiency. A latent mixture-of-experts (MoE) design activates 4x more specialist experts at constant inference cost through token compression. Multi-token prediction (MTP) speeds up long-sequence generation and provides built-in speculative decoding. A hybrid Mamba-Transformer backbone pairs state space models, which process sequences in linear time, with Transformer layers for precise reasoning. NVIDIA's NVFP4 pretraining optimization for Blackwell GPUs cuts memory requirements and delivers a 4x inference speedup on B200 hardware compared with FP8 on H100. Post-training used reinforcement learning across 21 environment configurations, with over 1.2 million rollouts run through NVIDIA's NeMo Gym and NeMo RL frameworks.
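
For readers unfamiliar with speculative decoding, here is a minimal, generic sketch of the greedy draft-and-verify idea. It is not NVIDIA's implementation, and the article does not describe how MTP is wired into decoding. The assumption here is simply that a cheap drafter proposes several tokens and the full model checks them. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and the toy "models" are deterministic functions over integer tokens.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Run one draft-and-verify step and return the tokens it commits."""
    # 1. The cheap drafter proposes k tokens autoregressively.
    proposed: List[Token] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposal. A real system would do this in
    #    one batched forward pass; the loop here is for clarity only.
    committed: List[Token] = []
    ctx = list(context)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            committed.append(expected)
            return committed
        committed.append(tok)
        ctx.append(tok)

    # 3. All k proposals accepted: the target also yields one extra token.
    committed.append(target(ctx))
    return committed


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Generate max_new tokens after a non-empty prompt."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + max_new]


def toy_target(ctx: List[Token]) -> Token:
    """Toy 'large model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Toy drafter: agrees with the target except right after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [0], max_new=12, k=4))
```

With greedy acceptance, every committed token is one the target itself would have chosen, so the output matches plain greedy decoding with the target. The potential gain is that several tokens can be committed per expensive target pass whenever the drafter guesses correctly.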

Nemotron 3 Super is fully open, with weights, datasets, and recipes all released. It scores 85.6% on PinchBench, a new benchmark for evaluating LLMs as the "brains" of autonomous agents, making it the highest-scoring open model in its class. Developers can customize and deploy the model on their own infrastructure without vendor lock-in.

▸NVIDIA-specific optimizations (NVFP4, Blackwell support) deliver a 4x speedup on B200 vs. FP8 on H100
▸Reinforcement learning post-training across 21 environments positions the model for robust autonomous reasoning

Editorial Opinion

Nemotron 3 Super reveals NVIDIA's strategic pivot from general-purpose LLM competition toward productizing the agentic AI stack. By open-sourcing a model explicitly built for multi-agent operational patterns—long context windows, efficient token handling, hardware-optimized inference—NVIDIA is cultivating a broader ecosystem while anchoring it to Blackwell GPUs. This is astute positioning: autonomous agents are computationally intensive, and developers will naturally gravitate toward the latest NVIDIA hardware for practical deployments. The move also signals confidence that the agentic AI era will drive sustained demand for NVIDIA infrastructure.


