Ollama brings NVIDIA’s Nemotron-Cascade-2 into local and agent workflows

Original post: "Nemotron-Cascade-2 is now available to run with Ollama.

ollama run nemotron-cascade-2

To run it locally with OpenClaw:

ollama launch openclaw --model nemotron-cascade-2

This model from NVIDIA delivers strong reasoning and agentic capabilities on par with models with up to 20x more parameters."

LLM · Mar 21, 2026 · By Insights AI · 2 min read

What Ollama announced on X

On March 20, 2026, Ollama announced that Nemotron-Cascade-2 is available to run through its local model runtime. The post leads with the most direct use case: developers can pull and run the model with ollama run nemotron-cascade-2, and wire it into agent workflows with commands such as ollama launch openclaw --model nemotron-cascade-2.
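Beyond the CLI, a locally running Ollama server exposes the same model over its standard REST API (by default at http://localhost:11434). A minimal Python sketch, assuming the model has already been pulled; the model tag comes from the post, and the endpoint and payload shape follow Ollama's documented /api/chat interface:

```python
import json
import urllib.request

# Ollama's default local chat endpoint.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "nemotron-cascade-2") -> dict:
    """Build a non-streaming payload for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON response
    }


def ask(prompt: str) -> str:
    """Send the prompt to a locally running Ollama server and return the reply."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Live call (requires a running Ollama server with the model pulled):
# print(ask("Summarize the tradeoffs of MoE inference in two sentences."))
```

The same request shape works for any model Ollama serves, which is what makes swapping a new release into an existing local loop a one-line change.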

That matters because the announcement is not about a closed hosted endpoint. It is about making a large reasoning-oriented NVIDIA model easier to drop into local and semi-local development environments. Ollama’s own framing is aggressive: it says the model offers strong reasoning and agentic performance comparable to systems with far larger parameter counts.

What the official model page confirms

Ollama’s model page describes Nemotron-Cascade-2 as an open 30B MoE model from NVIDIA with 3B activated parameters. The page also says the model supports both thinking and instruct modes, which is important for teams that want one model for deeper reasoning passes as well as lower-latency task execution.
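The dual-mode behavior can be sketched through Ollama's API, which exposes a "think" flag for reasoning-capable models. Whether Nemotron-Cascade-2 maps its thinking/instruct modes onto that flag is an assumption here; the model page confirms only that both modes exist:

```python
def build_request(prompt: str, deep: bool) -> dict:
    """Payload for Ollama's /api/chat; 'think' toggles the reasoning mode.

    Assumes this model wires its thinking/instruct modes onto Ollama's
    'think' flag, as other reasoning models served by Ollama do.
    """
    return {
        "model": "nemotron-cascade-2",
        "messages": [{"role": "user", "content": prompt}],
        "think": deep,  # True: deeper reasoning pass; False: low-latency answer
        "stream": False,
    }


# One model, two operating points:
reasoning = build_request("Prove that 2**n > n for all n >= 1.", deep=True)
fast = build_request("Rename this variable to snake_case: userName", deep=False)
```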

  • The model page marks it as a tools-capable model and exposes launch paths into OpenClaw, Codex, and Claude via Ollama’s launcher integrations.
  • It identifies the main downloadable variant as 30b.
  • The page also says Nemotron-Cascade-2-30B-A3B achieved gold medal performance on the 2025 International Mathematical Olympiad and the International Olympiad in Informatics.
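Because the page marks the model as tools-capable, it should accept tool definitions in Ollama's standard function-calling schema. A sketch of that request shape follows; the get_weather tool itself is purely illustrative, not something from the announcement:

```python
# A hypothetical tool definition in Ollama's function-calling schema.
# The tool name and parameters are illustrative, not from the model page.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt: str) -> dict:
    """Chat payload advertising a tool to a tools-capable model.

    If the model decides to call the tool, the response's message carries
    a 'tool_calls' list instead of (or alongside) plain text content.
    """
    return {
        "model": "nemotron-cascade-2",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "stream": False,
    }
```

Agent shells such as the launcher integrations mentioned above are, in effect, loops around exactly this request/response cycle.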

In effect, Ollama is packaging a frontier-style reasoning model into a format that is easier to test in local developer loops, agent shells, and custom tooling stacks without depending on a separate proprietary inference surface.

Why this matters

The local model ecosystem is moving from small convenience models toward serious reasoning systems, and this release is a strong example of that shift. In a 30B MoE model with 3B activated parameters, only about a tenth of the weights participate in each token's forward pass, so per-token compute is closer to that of a 3B dense model while total capacity stays at 30B; note that all 30B weights must still fit in memory. That makes it far more practical for experimentation and for agent workflows, where many calls accumulate quickly.
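Back-of-the-envelope numbers make the activated-parameter argument concrete. This sketch uses the standard rough estimate of ~2 FLOPs per active parameter per generated token; exact figures depend on architecture details the announcement does not give:

```python
TOTAL_PARAMS = 30e9   # all experts must be resident in memory
ACTIVE_PARAMS = 3e9   # parameters actually used per token

# Rough decode-time compute: ~2 FLOPs per active parameter per token.
flops_moe = 2 * ACTIVE_PARAMS        # ~6 GFLOPs/token
flops_dense_30b = 2 * TOTAL_PARAMS   # ~60 GFLOPs/token for a dense 30B model

speed_ratio = flops_dense_30b / flops_moe  # ~10x less compute per token

# Memory is unchanged: weights for all 30B parameters still load,
# e.g. roughly 15 GB at 4-bit quantization (plus KV cache and overhead).
mem_q4_gb = TOTAL_PARAMS * 0.5 / 1e9
```

The ~10x compute saving is what shows up as lower latency and cost per call; the unchanged 30B memory footprint is what still constrains which machines can host it.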

It also reflects a second industry trend: model value increasingly depends on surrounding workflow support. Ollama is not only listing a model; it is showing how the model fits into tools developers already use for coding and agent orchestration. That shortens the distance between “interesting model release” and “something teams can actually evaluate in their own environment.”

Sources: Ollama X post · Ollama model page




© 2026 Insights. All rights reserved.