A Qwen3.6 tuning post made --n-cpu-moe the LocalLLaMA knob of the day

Original: RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context; the --n-cpu-moe flag is the most important part.

LLM · Apr 19, 2026 · By Insights AI (Reddit)

A r/LocalLLaMA post put Qwen3.6-35B-A3B tuning into the form the community likes best: hardware, flags, and tokens per second. The author used an RTX 5070 Ti with 16GB VRAM, a Ryzen 9800X3D, 32GB DDR5, llama.cpp b8829, and unsloth/Qwen3.6-35B-A3B-GGUF at UD-Q4_K_M. The headline number was roughly 79 t/s with 128K context.

The finding centered on --cpu-moe versus --n-cpu-moe N. According to the post, the common --cpu-moe approach pushes all MoE experts to CPU and leaves much of the GPU underused. The baseline was 51.2 generation t/s, 87.9 prompt t/s, and 3.5GB VRAM use. With --n-cpu-moe 20, the result rose to 78.7 generation t/s, 100.6 prompt t/s, and 12.7GB VRAM use.
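As a sketch, the two placements described above differ in a single flag. The model filename is a placeholder based on the unsloth GGUF repo name, and -ngl 99 (offload all non-expert layers to GPU) is an assumption about the author's setup, not something the post states:

```shell
# Baseline: --cpu-moe keeps every MoE expert tensor on CPU.
# Per the post, this leaves the 16GB GPU largely idle (~3.5GB VRAM used).
llama-server -m Qwen3.6-35B-A3B-UD-Q4_K_M.gguf -ngl 99 --cpu-moe

# Tuned: --n-cpu-moe 20 keeps only the first 20 layers' experts on CPU;
# the remaining experts move to VRAM (~12.7GB used in the author's run).
llama-server -m Qwen3.6-35B-A3B-UD-Q4_K_M.gguf -ngl 99 --n-cpu-moe 20
```

The flag names follow llama.cpp's current CLI options; exact behavior can vary by build.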

Adding -np 1 and 128K context produced 79.3 generation t/s, 135.8 prompt t/s, and 13.2GB VRAM use in the author’s run. The post summarized the gain as about 54% over the naive --cpu-moe path. That is why the thread became less about Qwen hype and more about how sparse MoE layers are placed across CPU and GPU.
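The author's best run and the claimed gain can be sketched as follows; the invocation is commented out because it needs local model weights, and the model path and -ngl value are assumptions rather than quotes from the post:

```shell
# Author's best configuration as reported: 20 expert layers on CPU,
# a single parallel slot, and a 128K (131072-token) context window.
#   llama-server -m Qwen3.6-35B-A3B-UD-Q4_K_M.gguf -ngl 99 \
#       --n-cpu-moe 20 -np 1 -c 131072

# Check the "about 54%" claim against the reported generation speeds:
# 51.2 t/s with --cpu-moe vs 78.7 t/s with --n-cpu-moe 20.
awk 'BEGIN { printf "gain: %.1f%%\n", (78.7/51.2 - 1) * 100 }'
# prints "gain: 53.7%"
```

So the rounded "54%" figure in the post is consistent with its own numbers.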

The comments added useful caution. Some users pointed to --fit on, --fit-ctx 128000, and --fit-target 512 as a simpler route for their own setups. That matters: this is one hardware and software configuration, not a universal benchmark. GPU generation, VRAM, quant, llama.cpp build, context length, and batching can all change the result.
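The commenters' alternative can be sketched like this. The --fit flags are reproduced as quoted in the thread; whether they are available, and what defaults they carry, depends on the llama.cpp build, so treat this as an unverified starting point:

```shell
# Auto-fit route suggested in the comments: ask llama.cpp to place
# tensors automatically for a 128000-token context, rather than
# hand-tuning the expert-layer split with --n-cpu-moe.
# Flags as quoted by commenters; not verified against a specific build.
llama-server -m Qwen3.6-35B-A3B-UD-Q4_K_M.gguf \
    --fit on --fit-ctx 128000 --fit-target 512
```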

Still, the post earned attention because it showed a knob that local users can test immediately. For local LLMs, usability is often decided less by the model card than by runtime placement, memory pressure, and a few flags that turn idle VRAM into throughput.
