NVIDIA releases open Nemotron 3 Super with 1M context and up to 5x higher throughput for agentic AI
Original: New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI
On March 11, 2026, NVIDIA introduced Nemotron 3 Super, an open 120-billion-parameter model with 12 billion active parameters designed specifically for agentic AI systems. NVIDIA positions the model as an answer to two problems that make autonomous agents expensive and slow at scale: context explosion and what it calls the "thinking tax."
In multi-agent workflows, context can balloon because systems repeatedly resend histories, tool outputs and intermediate reasoning. NVIDIA says that can drive token usage to as much as 15x that of standard chat and increase the risk of goal drift over long tasks. Nemotron 3 Super addresses this with a 1-million-token context window and an architecture aimed at keeping reasoning quality high without forcing developers to use very large models for every subtask.
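To see why resending history is so expensive, a back-of-the-envelope sketch helps (the turn counts and token sizes below are illustrative assumptions, not NVIDIA's figures): when every call carries the full transcript so far, cumulative token usage grows quadratically with the number of turns.

```python
# Illustrative sketch (not NVIDIA's methodology): cumulative tokens processed
# when an agent resends its full history every turn versus sending only the
# newest turn. History resending makes the total grow quadratically in turns.

def cumulative_tokens(turns: int, tokens_per_turn: int, resend_history: bool) -> int:
    """Total tokens processed by the model across all turns."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn
        # With history resending, each call pays for everything seen so far.
        total += history if resend_history else tokens_per_turn
    return total

with_resend = cumulative_tokens(turns=30, tokens_per_turn=2_000, resend_history=True)
baseline = cumulative_tokens(turns=30, tokens_per_turn=2_000, resend_history=False)
print(with_resend // baseline)  # → 15
```

With these toy numbers, a 30-turn workflow already processes roughly 15x the tokens of a chat that sends each turn once, which is in the ballpark of the multiplier NVIDIA cites.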
Key technical claims
- 120 billion total parameters with 12 billion active parameters at inference
- Hybrid mixture-of-experts design that combines Mamba and transformer layers
- Latent MoE to activate four specialists for the cost of one
- Multi-token prediction for faster inference
- Up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model
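The gap between 120 billion total and 12 billion active parameters comes from the mixture-of-experts design: a router activates only a few experts per token, so most weights sit idle on any given forward pass. The toy sketch below illustrates top-k expert routing in pure Python; the expert count, dimensions, and routing details are assumptions for illustration, not NVIDIA's published architecture.

```python
# Toy sketch of top-k mixture-of-experts routing (illustrative only; not
# NVIDIA's architecture). A router scores experts per token, the top-k are
# run, and their outputs are mixed by softmax weight. Only TOP_K/NUM_EXPERTS
# of the expert parameters are touched per token.
import math
import random

random.seed(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 4

# Each "expert" is a small DIM x DIM weight matrix; the router is DIM x NUM_EXPERTS.
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
router = [[random.gauss(0, 1) for _ in range(NUM_EXPERTS)] for _ in range(DIM)]

def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def moe_forward(x):
    # Router logits per expert, then keep only the top-k scoring experts.
    logits = [sum(x[d] * router[d][e] for d in range(DIM)) for e in range(NUM_EXPERTS)]
    top = sorted(range(NUM_EXPERTS), key=lambda e: logits[e], reverse=True)[:TOP_K]
    # Softmax over the selected experts' logits gives mixing weights.
    mx = max(logits[e] for e in top)
    ws = [math.exp(logits[e] - mx) for e in top]
    z = sum(ws)
    out = [0.0] * DIM
    for w, e in zip(ws, top):
        y = matvec(experts[e], x)  # only selected experts' weights are used
        out = [o + (w / z) * yi for o, yi in zip(out, y)]
    return out, top

y, used = moe_forward([1.0, -0.5, 0.3, 0.7])
print(f"experts used: {len(used)}/{NUM_EXPERTS}")  # prints "experts used: 2/8"
```

In this toy setup only 2 of 8 experts run per token, which is the same mechanism that lets a 120B-parameter model pay inference cost closer to a 12B one.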
NVIDIA says the model runs in NVFP4 precision on Blackwell, cutting memory requirements and pushing inference up to 4x faster than FP8 on Hopper without accuracy loss. The company also says Nemotron 3 Super topped the Artificial Analysis rankings for efficiency and openness among models of similar size, and that it powers the NVIDIA AI-Q research agent, which ranks No. 1 on DeepResearch Bench and DeepResearch Bench II.
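The memory savings of 4-bit floating point come from storing each weight as one of a handful of representable values plus a shared per-block scale. The sketch below mimics that idea with the FP4 e2m1 magnitude grid; the block size and scale encoding here are simplifications for illustration, not NVIDIA's exact NVFP4 format.

```python
# Toy sketch of block-scaled 4-bit float quantization in the spirit of NVFP4
# (simplified; NVIDIA's actual format details differ). Each block of values
# shares one scale, and each value snaps to the nearest FP4 e2m1 grid point.
import math

E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # FP4 e2m1 magnitudes

def quantize_block(values):
    """Quantize one block: shared scale + nearest 4-bit grid magnitude per value."""
    scale = max(abs(v) for v in values) / max(E2M1) or 1.0  # avoid zero scale
    dequantized = []
    for v in values:
        mag = min(E2M1, key=lambda g: abs(abs(v) / scale - g))
        dequantized.append(math.copysign(mag * scale, v))
    return dequantized, scale

deq, scale = quantize_block([0.12, -0.9, 0.33, 0.7])
print(deq)  # each value rounded to the nearest representable 4-bit level
```

Each stored value needs only 4 bits plus its share of the block scale, versus 8 bits per value for FP8, which is where the memory reduction comes from; the accuracy question is whether the rounding error above stays negligible for the model's weights, which NVIDIA claims it does.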
The release is also notable for how much NVIDIA says it is opening up. The company is shipping open weights under a permissive license and publishing its training methodology, more than 10 trillion tokens of pre- and post-training datasets, 15 reinforcement learning training environments, and evaluation recipes. The aim is to make the model deployable and customizable everywhere from workstations to data centers and cloud platforms.
Strategically, Nemotron 3 Super shows NVIDIA trying to move up the stack from accelerators into the model layer for enterprise agent systems. If the throughput and context claims hold in production, the model could appeal to teams building coding agents, research agents and workflow automation systems that need long memory, tool calling and lower inference cost at scale.
Related Articles
NVIDIA AI Developer introduced Nemotron 3 Super on March 11, 2026 as an open 120B-parameter hybrid MoE model with 12B active parameters and a native 1M-token context window. NVIDIA says the model targets agentic workloads with up to 5x higher throughput than the previous Nemotron Super model.
A March 15, 2026 LocalLLaMA post pointed to Hugging Face model-card commits and NVIDIA license pages showing Nemotron Super 3 models moving from the older NVIDIA Open Model License text to the newer NVIDIA Nemotron Open Model License.
A high-signal LocalLLaMA thread on March 15, 2026 focused on a license swap for NVIDIA’s Nemotron model family. Comparing the current NVIDIA Nemotron Model License with the older Open Model License shows why the community reacted: the old guardrail-termination clause and Trustworthy AI cross-reference are no longer present, while the newer text leans on a simpler NOTICE-style attribution structure.