NVIDIA launches Nemotron 3 Super for multi-agent AI workloads
Original: "Introducing NVIDIA Nemotron 3 Super 🎉 Open 120B-parameter (12B active) hybrid Mamba-Transformer MoE model. Native 1M-token context. Built for compute-efficient, high-accuracy multi-agent applications. Plus, fully open weights, datasets and recipes for easy customization and deployment."
What NVIDIA announced on X
On March 11, 2026, NVIDIA AI Developer introduced Nemotron 3 Super as an open 120B-parameter hybrid Mamba-Transformer MoE model with 12B active parameters at inference. The X post framed the release around a few practical claims that matter for agent builders: a native 1M-token context window, an architecture tuned for compute-efficient multi-agent work, and open weights, datasets, and recipes for customization and deployment.
That positioning makes this more than a benchmark-oriented model refresh. NVIDIA is explicitly targeting the cost structure of long-running agent systems, where context grows quickly, intermediate reasoning piles up, and throughput becomes a gating factor for real-world deployment.
What the official NVIDIA blog adds
NVIDIA’s launch post says Nemotron 3 Super delivers up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model. The company attributes that to a hybrid architecture that combines Mamba layers, transformer reasoning, sparse MoE activation, and multi-token prediction. NVIDIA also says the model is optimized for NVIDIA Blackwell, can run in NVFP4 precision, and is intended to reduce the “context explosion” and “thinking tax” that make multi-agent systems expensive and slow.
- NVIDIA says the model reaches the top spot on Artificial Analysis for efficiency and openness among similarly sized models.
- The post says Nemotron 3 Super powers NVIDIA AI-Q to No. 1 on DeepResearch Bench and DeepResearch Bench II.
- NVIDIA says it is releasing open weights under a permissive license along with methodology, more than 10 trillion tokens of pre- and post-training datasets, 15 reinforcement-learning training environments, and evaluation recipes.
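The NVFP4 claim above is easiest to appreciate with some rough arithmetic. The sketch below is a back-of-envelope estimate (not NVIDIA's published figures): it assumes 4 bits per parameter for NVFP4 versus 16 for BF16, and ignores activation memory, KV cache, and quantization metadata such as block scales.

```python
# Back-of-envelope weight-storage comparison for a 120B-parameter model,
# contrasting BF16 (16 bits/param) with NVFP4 (4 bits/param).
# Illustrative only: real deployments also carry KV cache, activations,
# and quantization scale metadata not counted here.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS = 120e9  # total parameters reported in the announcement

bf16_gb = weight_memory_gb(PARAMS, 16)   # 240 GB
nvfp4_gb = weight_memory_gb(PARAMS, 4)   # 60 GB

print(f"BF16 weights:  {bf16_gb:.0f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.0f} GB")
print(f"Reduction:     {bf16_gb / nvfp4_gb:.0f}x")
```

Even under these simplified assumptions, the 4x shrink in weight storage is what makes hosting a 120B-parameter model on a single high-memory accelerator node plausible.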
Why this matters for AI teams
The most important operational detail is the combination of long context and sparse activation. A 1M-token window is only useful if teams can actually afford to keep large workflow state in memory. By pairing that window with 12B active parameters, NVIDIA is arguing that developers do not have to choose as sharply between long-horizon context and practical inference cost.
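The sparse-activation argument can also be sketched numerically. The estimate below uses the common approximation of roughly 2 matmul FLOPs per active parameter per generated token; it is illustrative arithmetic, not a measured benchmark, and leaves out router overhead, attention cost growth at long context, and memory-bandwidth limits.

```python
# Rough per-token compute comparison: running all 120B parameters densely
# versus activating only the 12B-parameter subset a MoE router selects.
# Uses the common ~2 FLOPs per active parameter per token estimate for a
# forward pass. Real throughput also depends on routing overhead, attention
# cost at 1M-token context, and kernel/memory-bandwidth efficiency.

def forward_flops_per_token(active_params: float) -> float:
    """Approximate matmul FLOPs to generate one token."""
    return 2.0 * active_params

dense_flops = forward_flops_per_token(120e9)  # hypothetical dense 120B run
moe_flops = forward_flops_per_token(12e9)     # 12B active, as announced

print(f"Dense 120B:     {dense_flops:.1e} FLOPs/token")
print(f"MoE 12B active: {moe_flops:.1e} FLOPs/token")
print(f"Ratio:          {dense_flops / moe_flops:.0f}x fewer matmul FLOPs/token")
```

That order-of-magnitude gap in per-token compute is the core of NVIDIA's pitch: long-context agent state stays resident while each decoding step pays only for the active experts.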
The launch is also a supply-side signal for the open-model ecosystem. If the open weights, training recipes, and evaluation artifacts are genuinely usable, teams building coding agents, research agents, and retrieval-heavy enterprise systems get a model family that is easier to audit, fine-tune, and benchmark against proprietary alternatives. The real test will be whether production users see the claimed speed and reasoning gains on their own agent stacks, but the release is clearly aimed at that exact workload class.
Sources: NVIDIA AI Developer X post, NVIDIA launch blog
Related Articles
A March 15, 2026 LocalLLaMA post pointed to Hugging Face model-card commits and NVIDIA license pages showing Nemotron Super 3 models moving from the older NVIDIA Open Model License text to the newer NVIDIA Nemotron Open Model License.
On March 11, 2026, NVIDIA introduced Nemotron 3 Super, an open 120-billion-parameter hybrid MoE model with 12 billion active parameters. NVIDIA says the model combines a 1-million-token context window, high-accuracy tool calling, and up to 5x higher throughput for agentic AI workloads.
A high-signal LocalLLaMA thread on March 15, 2026 focused on a license swap for NVIDIA’s Nemotron model family. Comparing the current NVIDIA Nemotron Model License with the older Open Model License shows why the community reacted: the old guardrail-termination clause and Trustworthy AI cross-reference are no longer present, while the newer text leans on a simpler NOTICE-style attribution structure.