A Gemma 4 26B User Pushes Local Context to 245K Tokens
Original post: "Gemma 4 26B A4B is still fully capable at 245283/262144 (94%) context!"
What the post claims
This r/LocalLLaMA post, which had a score of 161 and 71 comments as of April 12, 2026, documents an aggressive long-context stress test of Gemma 4 26B A4B. The author says they packed the context with Reddit posts, documentation, and raw llama.cpp files to push VRAM usage and retrieval behavior, then checked whether the model could still recover specific details accurately. Their headline result is that the model remained usable at 245,283 of 262,144 context tokens, roughly 94% of the configured window.
What makes the post more useful than a generic boast is that it also describes where the model broke. According to the author, once the session moved beyond 100K context, Gemma sometimes fell into self-questioning loops and kept extending its own reasoning instead of delivering a clean answer. Lowering temperature and raising repeat penalty to 1.17 or 1.18 reportedly improved stability, and the author says the model could then retrieve a specific user statement from the oversized context within about two to five seconds.
Practical settings shared in the thread
- The setup used a 262144 context size and 99 GPU layers.
- Sampling settings included `top_p` 0.95, `top_k` 40, `min_p` 0.05, and `repeat_penalty` 1.17.
- Batch and microbatch were both set to 512, with 2048 MB of cache RAM.
- The author says they were using the latest `llama.cpp` build and the newest Unsloth GGUF release available at the time.
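Assuming the run used llama.cpp's standard server flags, the reported settings map onto a launch command roughly like the sketch below. The model filename is a placeholder (the post does not name the exact GGUF file), and flag spellings may vary slightly between builds:

```sh
# Sketch only: flag names follow llama.cpp's llama-server CLI;
# the model filename is a placeholder, not taken from the post.
./llama-server -m gemma-4-26b-a4b.gguf \
  --ctx-size 262144 \
  --n-gpu-layers 99 \
  --batch-size 512 --ubatch-size 512 \
  --top-p 0.95 --top-k 40 --min-p 0.05 \
  --repeat-penalty 1.17
```

The post's 2048 MB cache RAM setting is omitted here because it does not correspond to a single unambiguous llama.cpp flag.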
Why the report matters
This is still an anecdotal community report, not a formal benchmark with reproducibility guarantees. Even so, it captures the kind of operational detail local-model users care about most: where long-context behavior starts to degrade, which tuning knobs reduced looping, and how much of the advertised context window remains practically useful. In a market full of headline context numbers, those implementation notes are often more valuable than the headline itself.
Original source: r/LocalLLaMA post.
Related Articles
A fresh LocalLLaMA thread argues that some early Gemma 4 failures are really inference-stack bugs rather than model quality problems. By linking active llama.cpp pull requests and user reports after updates, the post reframes launch benchmarks as a full-stack issue.
A high-scoring LocalLLaMA post argued that merging llama.cpp PR #21534 finally cleared the known Gemma 4 issues in current master. The community focus was not just the fix itself, but the operational details around tokenizer correctness, chat templates, memory flags, and the warning to avoid CUDA 13.2.
A LocalLLaMA post argues that recent llama.cpp fixes justify refreshed Gemma 4 GGUF downloads, especially for users relying on local inference pipelines.