NVIDIA releases open Nemotron 3 Super with 1M context and up to 5x higher throughput for agentic AI
Original: New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI
On March 11, 2026, NVIDIA introduced Nemotron 3 Super, an open 120-billion-parameter model with 12 billion active parameters designed specifically for agentic AI systems. NVIDIA positions the model as an answer to two problems that make autonomous agents expensive and slow at scale: context explosion and what it calls the thinking tax.
In multi-agent workflows, context can balloon because systems repeatedly resend conversation histories, tool outputs and intermediate reasoning. NVIDIA says this can push token usage to as much as 15x that of standard chat and raise the risk of goal drift over long tasks. Nemotron 3 Super is meant to address this with a 1-million-token context window and an architecture aimed at keeping reasoning quality high without forcing developers to use very large models for every subtask.
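A rough back-of-the-envelope sketch shows why resending history inflates token usage. The numbers here (10 agent steps of 500 new tokens each) are illustrative assumptions, not NVIDIA's figures, and the function names are hypothetical.

```python
def tokens_with_full_history(tokens_per_step: int, steps: int) -> int:
    """Total input tokens when every step resends all prior steps.

    Step k sends k * tokens_per_step tokens, so the total is the
    arithmetic series tokens_per_step * steps * (steps + 1) / 2.
    """
    return sum(k * tokens_per_step for k in range(1, steps + 1))


def tokens_without_resend(tokens_per_step: int, steps: int) -> int:
    """Total input tokens if each step sent only its own new content."""
    return tokens_per_step * steps


if __name__ == "__main__":
    # Illustrative assumption: 10 agent steps of 500 new tokens each.
    full = tokens_with_full_history(500, 10)
    flat = tokens_without_resend(500, 10)
    print(full, flat, full / flat)
```

Under this simplified model the overhead ratio is (steps + 1) / 2, so it grows linearly with the length of the task. Real agent systems trim, summarize or cache context, so actual multipliers will differ.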
Key technical claims
- 120 billion total parameters with 12 billion active parameters at inference
- Hybrid mixture-of-experts design that combines Mamba and transformer layers
- Latent MoE that activates four specialists for the cost of one
- Multi-token prediction for faster inference
- Up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model
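To make the gap between total and active parameters concrete, here is a minimal sketch of generic top-k expert routing, the basic idea behind sparse mixture-of-experts models. It is not NVIDIA's Latent MoE, whose internals the announcement does not describe. The router scores, expert count and function names are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs. All other experts
    are skipped for this token, which keeps most parameters idle.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def active_fraction(active_params, total_params):
    """Share of parameters used per token."""
    return active_params / total_params


if __name__ == "__main__":
    # Hypothetical router scores for 8 experts (an assumed toy size).
    scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
    print(route_top_k(scores, k=2))
    # Figures from the article: 12B active out of 120B total.
    print(active_fraction(12e9, 120e9))
```

With the article's figures, about 10% of the model's parameters take part in any single token's forward pass.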
NVIDIA says the model runs in NVFP4 precision on Blackwell, which cuts memory requirements and makes inference up to 4x faster than FP8 on Hopper without accuracy loss. The company also says Nemotron 3 Super reached the top spot on Artificial Analysis for efficiency and openness among models of similar size, and that it powers the NVIDIA AI-Q research agent to the No. 1 position on DeepResearch Bench and DeepResearch Bench II.
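The memory side of the precision claim can be approximated with simple arithmetic. The sketch below estimates weight storage only, assuming a flat number of bits per weight. It ignores the scaling-factor metadata that block-scaled formats carry, as well as activations and caches, so treat the results as rough lower bounds. The function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes).

    Ignores scaling-factor metadata, activations, and KV/state caches.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    total_params = 120e9  # total parameter count from the article
    formats = [("16-bit", 16), ("8-bit (e.g. FP8)", 8), ("4-bit (e.g. NVFP4)", 4)]
    for name, bits in formats:
        print(f"{name}: ~{weight_memory_gb(total_params, bits):.0f} GB")
```

The estimate uses the total parameter count rather than the 12 billion active parameters because a mixture-of-experts model typically keeps every expert resident in memory, even though only a few run per token.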
The release is also notable for how open NVIDIA says it will be. The company is shipping open weights under a permissive license and publishing training methodology, more than 10 trillion tokens of pre- and post-training datasets, 15 reinforcement learning training environments and evaluation recipes. That is meant to make the model deployable and customizable from workstations to data centers and cloud platforms.
Strategically, Nemotron 3 Super shows NVIDIA trying to move up the stack from accelerators into the model layer for enterprise agent systems. If the throughput and context claims hold in production, the model could appeal to teams building coding agents, research agents and workflow automation systems that need long memory, tool calling and lower inference cost at scale.