NVIDIA’s Nemotron-TwoTower tests diffusion-style generation for LLMs

Original: NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, an unusual diffusion-based language model built from the Nemotron 3 Nano 30B-A3B backbone.

LLM | Jun 26, 2026 | By Insights AI (Reddit)

NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16 on Hugging Face, and the LocalLLaMA thread picked up on it because the decoding approach is unusual. Rather than generating strictly one token at a time, the model uses a block-wise autoregressive diffusion setup built on the Nemotron 3 Nano 30B-A3B backbone.

The architecture is split into two towers. The AR/context tower processes the prompt and the already-committed tokens, producing the attention KV cache and Mamba states. The diffusion/denoiser tower works on the current noisy block, using bidirectional attention inside the block and layer-aligned cross-attention into the context tower. At each step it predicts tokens for the masked positions, commits the high-confidence ones, and repeats until the block is fully resolved.
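The commit-and-repeat loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not NVIDIA's implementation: `decode_block`, `make_toy_predictor`, and the threshold value of 0.9 are hypothetical, the "denoiser" just returns random confidences, and the fallback that commits the single best position when nothing clears the threshold is an assumption added here so the loop always terminates.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a position that has not been committed yet
Prediction = Tuple[str, float]  # (token, confidence in [0, 1])


def decode_block(
    block_size: int,
    predict: Callable[[List[Optional[str]]], List[Prediction]],
    threshold: float = 0.9,  # hypothetical value, not the model's default
) -> Tuple[List[Optional[str]], int]:
    """Resolve one block by repeatedly committing confident predictions."""
    block: List[Optional[str]] = [MASK] * block_size
    steps = 0
    while any(tok is MASK for tok in block):
        preds = predict(block)  # one prediction per position in the block
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        # Commit every masked position whose confidence clears the threshold.
        chosen = [i for i in masked if preds[i][1] >= threshold]
        if not chosen:
            # Assumed fallback: commit the single most confident position
            # so that each step makes progress.
            chosen = [max(masked, key=lambda i: preds[i][1])]
        for i in chosen:
            block[i] = preds[i][0]
        steps += 1
    return block, steps


def make_toy_predictor(seed: int = 0):
    """Stand-in for the denoiser tower: random confidences, dummy tokens."""
    rng = random.Random(seed)

    def predict(block: List[Optional[str]]) -> List[Prediction]:
        return [(f"tok{i}", rng.random()) for i in range(len(block))]

    return predict


if __name__ == "__main__":
    tokens, steps = decode_block(8, make_toy_predictor(), threshold=0.9)
    print(f"resolved {len(tokens)} positions in {steps} denoising steps")
```

In this sketch a block of N positions resolves in at most N steps, and a lower threshold tends to resolve it in fewer steps, which mirrors the speed-versus-quality knob the article describes.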

NVIDIA’s headline numbers explain the interest. At the default operating point, NVIDIA reports that the model retains 98.7% of the autoregressive baseline’s aggregate benchmark quality while reaching 2.42 times the baseline’s wall-clock generation throughput. Lowering the confidence threshold commits more tokens per step and increases speed, at some cost in quality.
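To put those two figures in more familiar terms, here is a small back-of-the-envelope calculation. It assumes that throughput translates directly into time for a fixed number of generated tokens, which ignores prompt processing and serving overhead; the function names are illustrative only.

```python
def relative_generation_time(speedup: float) -> float:
    """Fraction of baseline wall-clock time needed for the same token count."""
    return 1.0 / speedup


def relative_quality_gap_pct(retained: float) -> float:
    """Relative quality given up versus the baseline, in percent."""
    return (1.0 - retained) * 100.0


if __name__ == "__main__":
    time_fraction = relative_generation_time(2.42)  # reported speedup
    gap = relative_quality_gap_pct(0.987)           # reported retention
    print(f"time vs. baseline: {time_fraction:.1%}, quality gap: {gap:.1f}%")
```

Under that assumption, the reported operating point trades roughly a 1.3% relative drop in aggregate benchmark score for generation that takes about 41% of the baseline's time.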

This is not just another open checkpoint for local users to try. It is a concrete test of whether diffusion-style text generation can become a practical inference path for LLMs. The remaining questions are serving complexity, hardware requirements, conversational quality, and how the model behaves outside benchmark-style prompts.

For the local LLM community, the release broadens the speed conversation. Speculative decoding is no longer the only obvious route to faster generation; the decoding architecture itself is becoming an experimental surface.
