NVIDIA launches Nemotron 3 Super with 5x higher throughput for agentic AI
On March 11, 2026, NVIDIA introduced Nemotron 3 Super, a new open model aimed squarely at agentic AI systems. The company describes it as a 120-billion-parameter hybrid mixture-of-experts model with 12 billion active parameters, optimized for NVIDIA Blackwell. Rather than pitch it as a general-purpose chatbot upgrade, NVIDIA focused on a specific pain point for production agents: the cost and latency that show up when workflows involve long reasoning chains, heavy tool use, and repeated context replay.
NVIDIA argues that multi-agent systems suffer from “context explosion,” where each interaction forces the stack to resend histories, tool outputs, and intermediate reasoning. In that setting, throughput and context handling matter just as much as raw model quality. Nemotron 3 Super is NVIDIA’s answer to that problem, with a 1-million-token context window and a design that the company says preserves full workflow state in memory for longer-running tasks.
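To make the cost of "context explosion" concrete, here is a minimal back-of-envelope sketch. If an agent replays its full history on every turn, total input tokens grow quadratically with turn count even though new content per turn is constant. All token figures below are illustrative assumptions, not NVIDIA numbers:

```python
# Back-of-envelope sketch of "context explosion": replaying the full
# history each turn makes total input tokens grow quadratically with the
# number of turns. The 2,000-tokens-per-turn figure is an assumption.

def tokens_with_replay(turns: int, tokens_per_turn: int = 2_000) -> int:
    """Total input tokens when each turn resends all prior turns."""
    # Turn k resends turns 1..k, i.e. k * tokens_per_turn input tokens.
    return sum(k * tokens_per_turn for k in range(1, turns + 1))

def tokens_resident(turns: int, tokens_per_turn: int = 2_000) -> int:
    """Total input tokens if workflow state stays resident in memory."""
    return turns * tokens_per_turn

if __name__ == "__main__":
    for turns in (10, 50, 100):
        replay = tokens_with_replay(turns)
        resident = tokens_resident(turns)
        print(f"{turns:>3} turns: replay={replay:>11,}  "
              f"resident={resident:>8,}  ratio={replay / resident:.1f}x")
```

At 100 turns the replaying stack processes roughly 50x the tokens of one that keeps state resident, which is the gap NVIDIA's long-context, in-memory design is pitched at.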
What NVIDIA is claiming
According to NVIDIA, Nemotron 3 Super delivers up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model. The company also highlights high-accuracy tool calling, which is especially important in agent stacks that need to navigate large function libraries without introducing execution errors. NVIDIA says the model is being released with open weights under a permissive license, making it available for customization on workstations, in data centers, or in the cloud.
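The announcement does not publish API details, but hosts such as OpenRouter and build.nvidia.com typically expose models behind OpenAI-compatible chat-completions endpoints, where tool calling is driven by a `tools` schema in the request. The sketch below builds such a request payload; the model id `nvidia/nemotron-3-super` and the `search_code` function are hypothetical placeholders, so check the hosting provider's documentation for the real identifiers:

```python
# Sketch of an OpenAI-style chat-completions payload with tool calling.
# Model id and function schema are hypothetical placeholders.
import json

def build_tool_call_request(user_message: str) -> dict:
    tools = [{
        "type": "function",
        "function": {
            "name": "search_code",  # hypothetical tool for a code-review agent
            "description": "Search the repository for a symbol or text.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Symbol or text to find.",
                    },
                },
                "required": ["query"],
            },
        },
    }]
    return {
        "model": "nvidia/nemotron-3-super",  # assumed id, not confirmed
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide when to call the tool
    }

if __name__ == "__main__":
    payload = build_tool_call_request("Where is the retry logic implemented?")
    print(json.dumps(payload, indent=2))
```

The "high-accuracy tool calling" claim matters precisely here: with a large function library, the model must pick the right entry from `tools` and emit arguments that validate against the JSON schema, or the agent executes the wrong action.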
The distribution story is also notable. NVIDIA says Nemotron 3 Super can be accessed through build.nvidia.com, Perplexity, OpenRouter, and Hugging Face. The company also pointed to early ecosystem support from software agent vendors such as CodeRabbit, Factory, and Greptile, as well as research and life-science groups using the model for literature search, data science, and molecular understanding.
Why it matters
The release matters because enterprise agent builders are now looking for open models that can balance throughput, long context, and tool reliability without forcing everything into a closed vendor stack. A 1-million-token context window will not fix every agent failure, but it can reduce the number of times a system has to compress, summarize, or discard state during long workflows. In parallel, better tool calling lowers the risk that an agent chooses the wrong action in high-stakes environments.
For developers, Nemotron 3 Super is less about headline parameter count and more about system economics. If NVIDIA’s throughput and accuracy claims hold up in downstream testing, the model could become attractive for teams building autonomous research, code-review, security, and enterprise workflow agents that need open weights and predictable deployment paths.
Related Articles
NVIDIA AI Developer introduced Nemotron 3 Super on March 11, 2026 as an open 120B-parameter hybrid MoE model with 12B active parameters and a native 1M-token context window. NVIDIA says the model targets agentic workloads with up to 5x higher throughput than the previous Nemotron Super model.
NVIDIA introduced Nemotron 3 Super on March 11, 2026 as an open 120B-parameter model built for agentic AI systems. The company says the model tackles long-context cost and reasoning overhead with a 1M-token window, hybrid MoE design and up to 5x higher throughput.
A March 15, 2026 LocalLLaMA post pointed to Hugging Face model-card commits and NVIDIA license pages showing Nemotron 3 Super models moving from the older NVIDIA Open Model License text to the newer NVIDIA Nemotron Open Model License.