#gke - Insights

LLM Reddit Mar 28, 2026 2 min read

LocalLLaMA Follows a 1.1M Tok/s Qwen 3.5 27B Run as vLLM Tuning Becomes the Real Story

A March 26, 2026 r/LocalLLaMA post about serving Qwen 3.5 27B on Google Cloud B200 clusters reached 205 points and 52 comments at crawl time. The linked write-up reports 1,103,941 total tokens per second on 12 nodes after switching from tensor to data parallelism, shrinking context length, enabling FP8 KV cache, and using MTP-1 speculative decoding.

#qwen #vllm #nvidia-b200

104