NVIDIA releases open Nemotron 3 Super with 1M context and up to 5x higher throughput for agentic AI

On March 11, 2026, NVIDIA introduced Nemotron 3 Super, an open 120-billion-parameter model with 12 billion active parameters designed specifically for agentic AI systems. NVIDIA positions the model as an answer to two problems that make autonomous agents expensive and slow at scale: context explosion and what it calls the thinking tax.

In multi-agent workflows, context can balloon because systems repeatedly resend histories, tool outputs and intermediate reasoning. NVIDIA says this can push token usage to as much as 15x that of standard chat and raise the risk of goal drift over long tasks. Nemotron 3 Super addresses the problem with a 1-million-token context window and an architecture aimed at keeping reasoning quality high without forcing developers to use very large models for every subtask.
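To make the context-explosion argument concrete, here is a minimal sketch of how resending the full history at every step inflates the tokens a system has to process. The function name, turn count and per-turn token figure are illustrative assumptions, not NVIDIA's numbers or methodology.

```python
def cumulative_tokens(turns, tokens_per_turn, resend_history):
    """Total input tokens processed over a run of `turns` steps.

    Hypothetical model: every step adds `tokens_per_turn` new tokens.
    If `resend_history` is True, each step re-processes everything so far.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn
        # With resending, each step pays for the whole accumulated history.
        total += history if resend_history else tokens_per_turn
    return total


# Illustrative numbers only (assumed, not from NVIDIA).
chat = cumulative_tokens(turns=10, tokens_per_turn=500, resend_history=False)
agent = cumulative_tokens(turns=10, tokens_per_turn=500, resend_history=True)
print(chat, agent, agent / chat)
```

In this toy model the overhead ratio is (n + 1) / 2 for n steps, so it keeps growing as a task gets longer. Real agent systems also add tool outputs and reasoning traces of varying size, so actual multipliers will differ.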

Key technical claims

120 billion total parameters with 12 billion active parameters at inference
Hybrid mixture-of-experts design that combines Mamba and transformer layers
Latent MoE, which NVIDIA says activates four expert specialists for the inference cost of one
Multi-token prediction for faster inference
Up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model
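The gap between total and active parameters comes from expert routing: only a few experts run for each token. The sketch below shows generic top-k routing as used in mixture-of-experts models in general. It is not NVIDIA's latent MoE implementation, and the router scores and the value of k are made-up values for illustration.

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


# Figures from the article: 120B total parameters, 12B active per token.
TOTAL_PARAMS = 120e9
ACTIVE_PARAMS = 12e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Hypothetical router scores for four experts; only two are selected.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(f"active fraction: {active_fraction:.0%}")
print(routes)
```

Based on the stated figures, roughly one tenth of the weights take part in any single forward pass, which is why per-token compute can track a much smaller dense model.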

NVIDIA says the model runs in NVFP4 precision on Blackwell, cutting memory requirements and pushing inference up to 4x faster than FP8 on Hopper without accuracy loss. The company also says Nemotron 3 Super reached the top spot on Artificial Analysis for efficiency and openness among models of similar size, and that it powers the NVIDIA AI-Q research agent to No. 1 on DeepResearch Bench and DeepResearch Bench II.
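The memory side of the precision claim can be checked with back-of-envelope arithmetic. This sketch estimates weight storage only, assuming a flat 4, 8 or 16 bits per parameter. It ignores scaling-factor overhead, the KV cache and activations, so treat the results as rough lower bounds, not measured footprints.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 120e9  # total parameter count stated in the article

# Assumed flat bit widths per parameter; format overheads are ignored.
for name, bits in [("16-bit", 16), ("8-bit (FP8)", 8), ("4-bit (NVFP4)", 4)]:
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

Under these assumptions, moving from 8-bit to 4-bit weights halves the storage needed for the same parameter count.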

The release is also notable for how open NVIDIA says it will be. The company is shipping open weights under a permissive license and publishing training methodology, more than 10 trillion tokens of pre- and post-training datasets, 15 reinforcement learning training environments and evaluation recipes. That is meant to make the model deployable and customizable from workstations to data centers and cloud platforms.

Strategically, Nemotron 3 Super shows NVIDIA trying to move up the stack from accelerators into the model layer for enterprise agent systems. If the throughput and context claims hold in production, the model could appeal to teams building coding agents, research agents and workflow automation systems that need long memory, tool calling and lower inference cost at scale.
