LLM Reddit 4d ago 2 min read
A thread on r/LocalLLaMA reported a large prompt-processing speedup for Qwen3.5-27B on an RX 9070 XT after lowering llama.cpp's `--ubatch-size` to 64. The interesting part is not that 64 is a universal magic number, but the reminder that prompt ingestion and token generation can respond very differently to tuning `n_ubatch`, the micro-batch size that flag sets.
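One way to check this on your own hardware is to sweep several micro-batch sizes with llama.cpp's `llama-bench` tool and compare prompt-processing and generation throughput separately. The sketch below is illustrative, not the method used in the thread: the helper names (`build_bench_cmd`, `best_ubatch`, `run_sweep`) are hypothetical, and it assumes `llama-bench` is on your `PATH` and that its JSON rows carry `n_ubatch`, `n_prompt`, `n_gen`, and `avg_ts` fields, which may differ between llama.cpp builds.

```python
import json
import subprocess


def build_bench_cmd(model_path, ubatch_sizes, n_prompt=2048, n_gen=128,
                    binary="llama-bench"):
    """Build a llama-bench command that sweeps several micro-batch sizes."""
    return [
        binary,
        "-m", model_path,
        "-p", str(n_prompt),  # length of the prompt-processing test
        "-n", str(n_gen),     # length of the token-generation test
        "-ub", ",".join(str(u) for u in ubatch_sizes),  # values to sweep
        "-o", "json",
    ]


def best_ubatch(rows):
    """Return (best n_ubatch for prompt processing, best for generation).

    Assumes each row is one test: prompt-only rows have n_gen == 0 and
    generation-only rows have n_prompt == 0 (field names are an assumption).
    """
    def pick(candidates):
        if not candidates:
            return None
        return max(candidates, key=lambda r: r["avg_ts"])["n_ubatch"]

    prompt_rows = [r for r in rows if r["n_prompt"] > 0 and r["n_gen"] == 0]
    gen_rows = [r for r in rows if r["n_gen"] > 0 and r["n_prompt"] == 0]
    return pick(prompt_rows), pick(gen_rows)


def run_sweep(model_path, ubatch_sizes):
    """Run the benchmark and report the fastest setting for each phase."""
    cmd = build_bench_cmd(model_path, ubatch_sizes)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return best_ubatch(json.loads(result.stdout))
```

Calling `run_sweep("model.gguf", [64, 128, 256, 512])` with a real model path returns one winner per phase. If the two values differ, that is the effect the thread describes: the setting that speeds up prompt ingestion is not necessarily the one that helps generation.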