NVIDIA releases open Nemotron 3 Super with 1M context and up to 5x higher throughput for agentic AI
Original: New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI
On March 11, 2026, NVIDIA introduced Nemotron 3 Super, an open 120-billion-parameter model with 12 billion active parameters designed specifically for agentic AI systems. NVIDIA positions the model as an answer to two problems that make autonomous agents expensive and slow at scale: context explosion and what it calls the "thinking tax."
In multi-agent workflows, context can balloon because systems repeatedly resend histories, tool outputs and intermediate reasoning. NVIDIA says that can drive token usage to as much as 15x that of standard chat and increase the risk of goal drift over long tasks. Nemotron 3 Super addresses this with a 1-million-token context window and an architecture aimed at keeping reasoning quality high without forcing developers to use very large models for every subtask.
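To see why resending history is so expensive, a back-of-the-envelope sketch helps (the turn counts and token sizes below are illustrative assumptions, not NVIDIA's figures): when every call carries the full transcript so far, cumulative token usage grows quadratically with the number of turns.

```python
# Illustrative sketch (not NVIDIA's methodology): cumulative tokens processed
# when an agent resends its full history every turn versus sending only the
# newest turn. History resending makes the total grow quadratically in turns.

def cumulative_tokens(turns: int, tokens_per_turn: int, resend_history: bool) -> int:
    """Total tokens processed by the model across all turns."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn
        # With history resending, each call pays for everything seen so far.
        total += history if resend_history else tokens_per_turn
    return total

with_resend = cumulative_tokens(turns=30, tokens_per_turn=2_000, resend_history=True)
baseline = cumulative_tokens(turns=30, tokens_per_turn=2_000, resend_history=False)
print(with_resend // baseline)  # → 15
```

With these toy numbers, a 30-turn workflow already processes roughly 15x the tokens of a chat that sends each turn once, which is in the ballpark of the multiplier NVIDIA cites.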
Key technical claims
- 120 billion total parameters with 12 billion active parameters at inference
- Hybrid mixture-of-experts design that combines Mamba and transformer layers
- Latent MoE to activate four specialists for the cost of one
- Multi-token prediction for faster inference
- Up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model
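The gap between 120 billion total and 12 billion active parameters comes from the mixture-of-experts design: a router activates only a few experts per token, so most weights sit idle on any given forward pass. The toy sketch below illustrates top-k expert routing in pure Python; the expert count, dimensions, and routing details are assumptions for illustration, not NVIDIA's published architecture.

```python
# Toy sketch of top-k mixture-of-experts routing (illustrative only; not
# NVIDIA's architecture). A router scores experts per token, the top-k are
# run, and their outputs are mixed by softmax weight. Only TOP_K/NUM_EXPERTS
# of the expert parameters are touched per token.
import math
import random

random.seed(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 4

# Each "expert" is a small DIM x DIM weight matrix; the router is DIM x NUM_EXPERTS.
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
router = [[random.gauss(0, 1) for _ in range(NUM_EXPERTS)] for _ in range(DIM)]

def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def moe_forward(x):
    # Router logits per expert, then keep only the top-k scoring experts.
    logits = [sum(x[d] * router[d][e] for d in range(DIM)) for e in range(NUM_EXPERTS)]
    top = sorted(range(NUM_EXPERTS), key=lambda e: logits[e], reverse=True)[:TOP_K]
    # Softmax over the selected experts' logits gives mixing weights.
    mx = max(logits[e] for e in top)
    ws = [math.exp(logits[e] - mx) for e in top]
    z = sum(ws)
    out = [0.0] * DIM
    for w, e in zip(ws, top):
        y = matvec(experts[e], x)  # only selected experts' weights are used
        out = [o + (w / z) * yi for o, yi in zip(out, y)]
    return out, top

y, used = moe_forward([1.0, -0.5, 0.3, 0.7])
print(f"experts used: {len(used)}/{NUM_EXPERTS}")  # prints "experts used: 2/8"
```

In this toy setup only 2 of 8 experts run per token, which is the same mechanism that lets a 120B-parameter model pay inference cost closer to a 12B one.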
NVIDIA says the model runs in NVFP4 precision on Blackwell, cutting memory requirements and pushing inference up to 4x faster than FP8 on Hopper without accuracy loss. The company also says Nemotron 3 Super topped the Artificial Analysis rankings for efficiency and openness among models of similar size, and that it powers the NVIDIA AI-Q research agent, which ranks No. 1 on DeepResearch Bench and DeepResearch Bench II.
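The memory savings of 4-bit floating point come from storing each weight as one of a handful of representable values plus a shared per-block scale. The sketch below mimics that idea with the FP4 e2m1 magnitude grid; the block size and scale encoding here are simplifications for illustration, not NVIDIA's exact NVFP4 format.

```python
# Toy sketch of block-scaled 4-bit float quantization in the spirit of NVFP4
# (simplified; NVIDIA's actual format details differ). Each block of values
# shares one scale, and each value snaps to the nearest FP4 e2m1 grid point.
import math

E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # FP4 e2m1 magnitudes

def quantize_block(values):
    """Quantize one block: shared scale + nearest 4-bit grid magnitude per value."""
    scale = max(abs(v) for v in values) / max(E2M1) or 1.0  # avoid zero scale
    dequantized = []
    for v in values:
        mag = min(E2M1, key=lambda g: abs(abs(v) / scale - g))
        dequantized.append(math.copysign(mag * scale, v))
    return dequantized, scale

deq, scale = quantize_block([0.12, -0.9, 0.33, 0.7])
print(deq)  # each value rounded to the nearest representable 4-bit level
```

Each stored value needs only 4 bits plus its share of the block scale, versus 8 bits per value for FP8, which is where the memory reduction comes from; the accuracy question is whether the rounding error above stays negligible for the model's weights, which NVIDIA claims it does.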
The release is also notable for how much NVIDIA says it is opening up. The company is shipping open weights under a permissive license and publishing its training methodology, more than 10 trillion tokens of pre- and post-training datasets, 15 reinforcement learning training environments, and evaluation recipes. The aim is to make the model deployable and customizable everywhere from workstations to data centers and cloud platforms.
Strategically, Nemotron 3 Super shows NVIDIA trying to move up the stack from accelerators into the model layer for enterprise agent systems. If the throughput and context claims hold in production, the model could appeal to teams building coding agents, research agents and workflow automation systems that need long memory, tool calling and lower inference cost at scale.
Related Articles
NVIDIA AI Developer introduced Nemotron 3 Super on March 11, 2026 as an open 120B-parameter hybrid MoE model with 12B active parameters and a native 1M-token context window. NVIDIA says the model targets agentic workloads with up to 5x higher throughput than the previous Nemotron Super model.
A March 15, 2026 LocalLLaMA post pointed to Hugging Face model-card commits and NVIDIA license pages showing Nemotron Super 3 models moving from the older NVIDIA Open Model License text to the newer NVIDIA Nemotron Open Model License.
A high-signal LocalLLaMA thread on March 15, 2026 focused on a license swap for NVIDIA’s Nemotron model family. Comparing the current NVIDIA Nemotron Model License with the older Open Model License shows why the community reacted: the old guardrail-termination clause and Trustworthy AI cross-reference are no longer present, while the newer text leans on a simpler NOTICE-style attribution structure.