r/LocalLLaMA Flags NVIDIA’s Nemotron-Cascade-2-30B-A3B as an Open 30B MoE Reasoning Model
r/LocalLLaMA spent March 20, 2026 discussing NVIDIA’s Nemotron-Cascade-2-30B-A3B, with the thread reaching 93 points and 37 comments. The appeal is straightforward: an open 30B mixture-of-experts model with only 3B activated parameters, plus benchmark claims strong enough to matter to people who actually run models locally or on constrained infrastructure.
The Hugging Face card describes Nemotron-Cascade-2-30B-A3B as a post-trained version of Nemotron-3-Nano-30B-A3B-Base. NVIDIA presents it as a dual-mode model that can operate in both thinking and instruct modes. The chat template uses ChatML-style formatting, with reasoning enclosed in <think> tags and a documented path for switching to non-reasoning mode by prepending an empty <think></think> block.
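To make the dual-mode behavior concrete, here is a minimal sketch of the two prompt shapes. The exact special tokens are assumptions based on standard ChatML (the card only says "ChatML-style"); the authoritative format is the chat template shipped with the Hugging Face tokenizer.

```python
# Illustrative sketch of the ChatML-style dual-mode prompting described on the
# model card. The <|im_start|>/<|im_end|> tokens are assumed from standard ChatML;
# consult the model's chat template for the authoritative format.

SYSTEM = "You are a helpful assistant."
QUESTION = "What is 17 * 23?"

# Thinking mode: the model is expected to open its reply with a <think> block.
thinking_prompt = (
    f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
    f"<|im_start|>user\n{QUESTION}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

# Instruct (non-reasoning) mode: prepend an empty <think></think> block so the
# model skips the reasoning trace, per the card's documented switch.
instruct_prompt = (
    f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
    f"<|im_start|>user\n{QUESTION}<|im_end|>\n"
    f"<|im_start|>assistant\n<think></think>"
)

print(thinking_prompt)
print(instruct_prompt)
```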
Why the model stands out
The headline numbers are ambitious. NVIDIA reports gold-medal-level results on the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI), plus strong scores on LiveCodeBench, ArenaHard v2, IFBench, and several math benchmarks. The card also includes explicit sampling guidance, tool-response formatting, and multi-turn prompting details, which makes the release more operational than a benchmark-only announcement.
- The model has 30B total parameters but only 3B activated parameters, a notable efficiency profile for open deployment.
- Reported results include 35 points on IMO 2025, 439.3 on IOI 2025, 87.2 on LiveCodeBench v6, and an 83.5 average on ArenaHard v2.
- The same card shows a more mixed picture on some long-context and agentic evaluations, which gives practitioners a more realistic view of the tradeoffs.
That balance is exactly why the LocalLLaMA thread matters. The community is less interested in a polished launch narrative than in whether a new open model offers a useful speed-to-capability ratio. Nemotron-Cascade-2-30B-A3B looks notable because it pairs open distribution with explicit reasoning controls and benchmark depth, giving developers another serious option in the open-model stack.
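For anyone who wants to kick the tires locally, a minimal transformers sketch might look like the following. The repo id and the sampling values are assumptions for illustration; the model card's own sampling guidance should take precedence.

```python
# Hypothetical quick-start sketch, assuming the repo id below and a standard
# transformers causal-LM setup. Sampling values are placeholders, not the
# card's recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/Nemotron-Cascade-2-30B-A3B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # 3B activated params keeps per-token compute modest
    device_map="auto",
)

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,  # placeholder; replace with the card's recommended values
    top_p=0.95,       # placeholder
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```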
Sources: r/LocalLLaMA thread, Hugging Face model card.
Related Articles
A high-signal LocalLLaMA thread on March 15, 2026 focused on a license swap for NVIDIA’s Nemotron model family. Comparing the current NVIDIA Nemotron Model License with the older Open Model License shows why the community reacted: the old guardrail-termination clause and Trustworthy AI cross-reference are no longer present, while the newer text leans on a simpler NOTICE-style attribution structure.
On March 11, 2026, NVIDIA introduced Nemotron 3 Super, an open 120-billion-parameter hybrid MoE model with 12 billion active parameters. NVIDIA says the model combines a 1-million-token context window, high-accuracy tool calling, and up to 5x higher throughput for agentic AI workloads.
A LocalLLaMA thread amplified Phoronix coverage of GreenBoost, an experimental GPLv2 Linux module that adds a multi-tier memory path for NVIDIA GPUs. The design pairs a kernel module with a CUDA shim so large allocations can spill from limited on-card VRAM into pinned system RAM and NVMe-backed storage without modifying CUDA applications.