Ollama brings NVIDIA’s Nemotron-Cascade-2 into local and agent workflows

What Ollama announced on X

On March 20, 2026, Ollama said Nemotron-Cascade-2 is now available to run through its local model runtime. The post gives the most direct use case immediately: developers can pull the model with ollama run nemotron-cascade-2 and wire it into agent workflows with commands such as ollama launch openclaw --model nemotron-cascade-2.

That matters because the announcement is not about a closed hosted endpoint. It is about making a large reasoning-oriented NVIDIA model easier to drop into local and semi-local development environments. Ollama’s own framing is aggressive: it says the model offers strong reasoning and agentic performance comparable to systems with far larger parameter counts.

What the official model page confirms

Ollama’s model page describes Nemotron-Cascade-2 as an open 30B MoE model from NVIDIA with 3B activated parameters. The page also says the model supports both thinking and instruct modes, which is important for teams that want one model for deeper reasoning passes as well as lower-latency task execution.

The model page marks it as a tools-capable model and exposes launch paths into OpenClaw, Codex, and Claude via Ollama’s launcher integrations.
It identifies the main downloadable variant as 30b.
The page also says Nemotron-Cascade-2-30B-A3B achieved gold medal performance on the 2025 International Mathematical Olympiad and the International Olympiad in Informatics.

In effect, Ollama is packaging a frontier-style reasoning model into a format that is easier to test in local developer loops, agent shells, and custom tooling stacks without depending on a separate proprietary inference surface.

Why this matters

The local model ecosystem is moving from small convenience models toward serious reasoning systems, and this release is a strong example of that shift. A 30B MoE model with only 3B activated parameters suggests a design optimized for capability without requiring the full runtime cost of a dense model at the same nominal size. That makes it more practical for experimentation and for agent workflows where many calls accumulate quickly.

It also reflects a second industry trend: model value increasingly depends on surrounding workflow support. Ollama is not only listing a model; it is showing how the model fits into tools developers already use for coding and agent orchestration. That shortens the distance between “interesting model release” and “something teams can actually evaluate in their own environment.”

Sources: Ollama X post · Ollama model page

Ollama brings NVIDIA’s Nemotron-Cascade-2 into local and agent workflows

What Ollama announced on X

What the official model page confirms

Why this matters

Related Articles

HN Turns the Ollama Backlash Into a Trust Check for Local LLM Tools

NVIDIA ties LLM shape to GPU latency with 128 and 256 alignment rules

Ollama’s MLX Preview Pushes Local LLM Performance on Apple Silicon