r/LocalLLaMA Flags NVIDIA’s Nemotron-Cascade-2-30B-A3B as an Open 30B MoE Reasoning Model
r/LocalLLaMA spent March 20, 2026 discussing NVIDIA’s Nemotron-Cascade-2-30B-A3B, with the thread reaching 93 points and 37 comments. The appeal is straightforward: an open 30B mixture-of-experts model with only 3B activated parameters, plus benchmark claims strong enough to matter to people who actually run models locally or on constrained infrastructure.
The Hugging Face card describes Nemotron-Cascade-2-30B-A3B as a post-trained version of Nemotron-3-Nano-30B-A3B-Base. NVIDIA presents it as a dual-mode model that can operate in both thinking and instruct modes. The chat template uses ChatML-style formatting, with reasoning enclosed in <think> tags and a documented path for switching to non-reasoning mode by prepending an empty <think></think> block.
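To make the dual-mode behavior concrete, here is a minimal sketch of the two prompt shapes. The exact special tokens are assumptions based on standard ChatML (the card only says "ChatML-style"); the authoritative format is the chat template shipped with the Hugging Face tokenizer.

```python
# Illustrative sketch of the ChatML-style dual-mode prompting described on the
# model card. The <|im_start|>/<|im_end|> tokens are assumed from standard ChatML;
# consult the model's chat template for the authoritative format.

SYSTEM = "You are a helpful assistant."
QUESTION = "What is 17 * 23?"

# Thinking mode: the model is expected to open its reply with a <think> block.
thinking_prompt = (
    f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
    f"<|im_start|>user\n{QUESTION}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

# Instruct (non-reasoning) mode: prepend an empty <think></think> block so the
# model skips the reasoning trace, per the card's documented switch.
instruct_prompt = (
    f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
    f"<|im_start|>user\n{QUESTION}<|im_end|>\n"
    f"<|im_start|>assistant\n<think></think>"
)

print(thinking_prompt)
print(instruct_prompt)
```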
Why the model stands out
The headline numbers are ambitious. NVIDIA reports gold-medal-level results on the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI), plus strong scores on LiveCodeBench, ArenaHard v2, IFBench, and several math benchmarks. The card also includes explicit sampling guidance, tool-response formatting, and multi-turn prompting details, which makes the release more operational than a benchmark-only announcement.
- The model has 30B total parameters but only 3B activated parameters, a notable efficiency profile for open deployment.
- Reported results include 35 points on IMO 2025, 439.3 on IOI 2025, 87.2 on LiveCodeBench v6, and an 83.5 average on ArenaHard v2.
- The same card shows a more mixed picture on some long-context and agentic evaluations, which gives practitioners a more realistic view of the tradeoffs.
That balance is exactly why the LocalLLaMA thread matters. The community is less interested in a polished launch narrative than in whether a new open model offers a useful speed-to-capability ratio. Nemotron-Cascade-2-30B-A3B looks notable because it pairs open distribution with explicit reasoning controls and benchmark depth, giving developers another serious option in the open-model stack.
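For anyone who wants to kick the tires locally, a minimal transformers sketch might look like the following. The repo id and the sampling values are assumptions for illustration; the model card's own sampling guidance should take precedence.

```python
# Hypothetical quick-start sketch, assuming the repo id below and a standard
# transformers causal-LM setup. Sampling values are placeholders, not the
# card's recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/Nemotron-Cascade-2-30B-A3B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # 3B activated params keeps per-token compute modest
    device_map="auto",
)

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,  # placeholder; replace with the card's recommended values
    top_p=0.95,       # placeholder
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```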
Sources: r/LocalLLaMA thread, Hugging Face model card.
Related Articles
A high-signal LocalLLaMA thread on March 15, 2026 focused on a license swap for NVIDIA’s Nemotron model family. Comparing the current NVIDIA Nemotron Model License with the older Open Model License shows why the community reacted: the old guardrail-termination clause and Trustworthy AI cross-reference are no longer present, while the newer text leans on a simpler NOTICE-style attribution structure.
On March 11, 2026, NVIDIA introduced Nemotron 3 Super, an open 120-billion-parameter hybrid MoE model with 12 billion active parameters. NVIDIA says the model combines a 1-million-token context window, high-accuracy tool calling, and up to 5x higher throughput for agentic AI workloads.
A LocalLLaMA thread amplified Phoronix coverage of GreenBoost, an experimental GPLv2 Linux module that adds a multi-tier memory path for NVIDIA GPUs. The design pairs a kernel module with a CUDA shim so large allocations can spill from limited on-card VRAM into pinned system RAM and NVMe-backed storage without modifying CUDA applications.