r/LocalLLaMA Flags NVIDIA’s Nemotron-Cascade-2-30B-A3B as an Open 30B MoE Reasoning Model

Original: Nemotron Cascade 2 30B A3B

LLM · Mar 21, 2026 · By Insights AI (Reddit) · 1 min read

r/LocalLLaMA spent March 20, 2026 discussing NVIDIA’s Nemotron-Cascade-2-30B-A3B, with the thread reaching 93 points and 37 comments. The appeal is straightforward: an open 30B mixture-of-experts model with only 3B activated parameters, plus benchmark claims large enough to matter to people who actually run models locally or on constrained infrastructure.

The Hugging Face card describes Nemotron-Cascade-2-30B-A3B as a post-trained version of Nemotron-3-Nano-30B-A3B-Base. NVIDIA presents it as a dual-mode model that can operate in both thinking and instruct modes. The chat template uses ChatML-style formatting, with reasoning enclosed in <think> tags and a documented path for switching to non-reasoning mode by prepending an empty <think></think> block.
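That mode-switching mechanism can be sketched as a small prompt builder. This is a hedged illustration, not the model's actual chat template: the source confirms ChatML-style formatting, <think> tags, and the empty <think></think> toggle, but the specific special tokens (`<|im_start|>`, `<|im_end|>`) are assumptions borrowed from the ChatML convention. Verify against the template shipped with the tokenizer before use.

```python
# Hedged sketch of the ChatML-style prompt format the model card describes.
# The <|im_start|>/<|im_end|> tokens are assumed from the ChatML convention;
# the empty <think></think> toggle follows the card's documented path for
# switching to non-reasoning (instruct) mode.

def build_prompt(user_msg: str, thinking: bool = True) -> str:
    """Build a single-turn ChatML-style prompt for a dual-mode model."""
    prompt = (
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    if not thinking:
        # Prepending an empty think block tells the model to skip the
        # reasoning trace and answer directly in instruct mode.
        prompt += "<think></think>"
    return prompt

reasoning_prompt = build_prompt("What is 17 * 24?")
instruct_prompt = build_prompt("What is 17 * 24?", thinking=False)
print(instruct_prompt.endswith("<think></think>"))  # True
```

In thinking mode the assistant turn is left open so the model emits its own <think>…</think> reasoning block before the answer; in instruct mode the empty block pre-empts that.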

Why the model stands out

The headline numbers are ambitious. NVIDIA reports gold-medal-level results on the 2025 IMO and IOI, plus strong scores on LiveCodeBench, ArenaHard v2, IFBench, and several math benchmarks. The card also includes explicit sampling guidance, tool-response formatting, and multi-turn prompting details, which makes the release more operational than a benchmark-only announcement.

  • The model has 30B total parameters but only 3B activated parameters, a notable efficiency profile for open deployment.
  • Reported results include 35 points on IMO 2025, 439.3 on IOI 2025, 87.2 on LiveCodeBench v6, and 83.5 on ArenaHard v2 average.
  • The same card shows a more mixed picture on some long-context and agentic evaluations, which gives practitioners a more realistic view of the tradeoffs.

That balance is exactly why the LocalLLaMA thread matters. The community is less interested in a polished launch narrative than in whether a new open model offers a useful speed-to-capability ratio. Nemotron-Cascade-2-30B-A3B looks notable because it pairs open distribution with explicit reasoning controls and benchmark depth, giving developers another serious option in the open-model stack.

Sources: r/LocalLLaMA thread, Hugging Face model card.




© 2026 Insights. All rights reserved.