
NVIDIA releases open Nemotron 3 Super with 1M context and up to 5x higher throughput for agentic AI

Original: New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

LLM · Mar 13, 2026 · By Insights AI · 2 min read

On March 11, 2026, NVIDIA introduced Nemotron 3 Super, an open 120-billion-parameter model with 12 billion active parameters designed specifically for agentic AI systems. NVIDIA positions the model as an answer to two problems that make autonomous agents expensive and slow at scale: context explosion and what it calls the "thinking tax."

In multi-agent workflows, context can balloon because systems repeatedly resend histories, tool outputs and intermediate reasoning. NVIDIA says that can drive token usage up to 15x over standard chat and increase the risk of goal drift over long tasks. Nemotron 3 Super addresses that with a 1-million-token context window and an architecture aimed at keeping reasoning quality high without forcing developers to use very large models for every subtask.
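The amplification NVIDIA describes can be made concrete with simple arithmetic. The sketch below uses invented numbers (30 turns of 500 tokens each, not figures from NVIDIA) to show how resending the full history on every turn drives total tokens processed far above the sum of the turns themselves:

```python
# Hypothetical illustration (numbers are invented, not NVIDIA's): total
# tokens processed when an agent resends the full conversation history on
# every turn, versus an ideal where each token is sent only once.

def tokens_resent(turns: int, turn_tokens: int) -> int:
    """Total tokens processed when the full history is resent each turn."""
    # Turn k re-sends all k accumulated turns: sum_{k=1..n} k * turn_tokens.
    return sum(k * turn_tokens for k in range(1, turns + 1))

def tokens_incremental(turns: int, turn_tokens: int) -> int:
    """Total tokens if each turn's content were processed only once."""
    return turns * turn_tokens

resent = tokens_resent(30, 500)        # 232,500 tokens
ideal = tokens_incremental(30, 500)    # 15,000 tokens
print(resent / ideal)                  # 15.5x amplification after 30 turns
```

With these made-up parameters the amplification lands around 15x, the same order of magnitude as the figure NVIDIA cites, and it grows roughly quadratically with the number of turns.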

Key technical claims

  • 120 billion total parameters with 12 billion active parameters at inference
  • Hybrid mixture-of-experts design that combines Mamba and transformer layers
  • Latent MoE to activate four specialists for the cost of one
  • Multi-token prediction for faster inference
  • Up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model
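The "120 billion total, 12 billion active" split is the signature of sparse mixture-of-experts inference: a router picks a small subset of experts per token, so only a fraction of the weights run. The sketch below is a generic top-k MoE forward pass, not NVIDIA's actual latent-MoE architecture; the expert count, hidden size, and top-k value are illustrative:

```python
import numpy as np

# Generic sparse MoE routing sketch (NOT NVIDIA's latent MoE; all sizes
# here are illustrative). With k of E experts active per token, only
# k/E of the expert parameters are touched -- the mechanism behind a
# model with far fewer active parameters than total parameters.

rng = np.random.default_rng(0)
E, k, d = 10, 1, 16                    # experts, active experts, hidden dim

router = rng.standard_normal((d, E))   # routing weights: one score per expert
experts = rng.standard_normal((E, d, d))  # one weight matrix per expert

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router                # score each expert for this token
    top = np.argsort(logits)[-k:]      # keep only the top-k experts
    gates = np.exp(logits[top])
    gates /= gates.sum()               # softmax over the selected experts
    # Only the selected experts' weights are ever multiplied:
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.standard_normal(d)
y = moe_forward(x)                     # output vector of size d
print(k / E)                           # fraction of expert params active: 0.1
```

With k=1 of E=10 experts active, 10% of the expert parameters run per token, mirroring the 12B-of-120B ratio in the release; the real model's latent-MoE and Mamba/transformer hybrid details are more involved than this toy.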

NVIDIA says the model runs in NVFP4 precision on Blackwell GPUs, cutting memory requirements and delivering inference up to 4x faster than FP8 on Hopper without accuracy loss. The company also says Nemotron 3 Super reached the top spot on Artificial Analysis for efficiency and openness among models of similar size, and that it powers the NVIDIA AI-Q research agent to No. 1 on DeepResearch Bench and DeepResearch Bench II.
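The memory side of the NVFP4 claim follows from bit-widths alone. The back-of-envelope sketch below computes raw weight storage for a 120B-parameter model at standard precisions; it ignores activations, KV cache, and quantization scale overhead, and is not an NVIDIA benchmark:

```python
# Back-of-envelope weight-storage memory for a 120B-parameter model at
# standard bit-widths. Ignores activations, KV cache, and per-block
# quantization scales; illustrative arithmetic, not NVIDIA benchmark data.

PARAMS = 120e9

def weight_gb(bits_per_param: float) -> float:
    """GB needed to store all parameters at the given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16 = weight_gb(16)   # 240.0 GB
fp8 = weight_gb(8)     # 120.0 GB
fp4 = weight_gb(4)     #  60.0 GB -- half of FP8, consistent with the
                       #  memory-cut claim for 4-bit formats
print(fp16, fp8, fp4)
```

The halving of weight memory from FP8 to a 4-bit format is pure arithmetic; the claimed 4x throughput gain additionally depends on Blackwell's hardware support for NVFP4, which this sketch does not model.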

The release is also notable for how open NVIDIA says it will be. The company is shipping open weights under a permissive license and publishing training methodology, more than 10 trillion tokens of pre- and post-training datasets, 15 reinforcement learning training environments and evaluation recipes. That is meant to make the model deployable and customizable from workstations to data centers and cloud platforms.

Strategically, Nemotron 3 Super shows NVIDIA trying to move up the stack from accelerators into the model layer for enterprise agent systems. If the throughput and context claims hold in production, the model could appeal to teams building coding agents, research agents and workflow automation systems that need long memory, tool calling and lower inference cost at scale.

