
GLM-5.1 inference gains came from network topology, not new GPUs

Original: Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

LLM · May 28, 2026 · By Insights AI (Reddit)

A LocalLLaMA post about Zai’s GLM-5.1 inference cluster drew attention because the gains came from the network layer, not from a new model or new GPUs. According to the post, Zai replaced a standard ROFT setup with ZCube, developed with Tsinghua University and HarnetsAI, on a thousand-GPU cluster serving GLM-5.1 coding inference. What made the result notable is that the GPUs, software stack, and model were all left unchanged.

The reported production numbers were concrete: switch and optical module costs down 33%, GPU inference throughput up 15%, and first-token P99 tail latency down 40.6%. That breaks the tradeoff operators usually expect, where better network performance means more hardware spend. Here, the claim is that a topology change lowered cost while also improving throughput and tail latency.
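To see how the three figures combine, here is a minimal back-of-envelope sketch. It assumes the 33% cut applies only to the network's share of total cluster cost and that the 15% throughput gain applies to the whole cluster. The `network_share` input is hypothetical; the post does not say what fraction of cluster cost goes to switches and optics.

```python
def cost_per_throughput_ratio(network_share, network_cost_cut=0.33, throughput_gain=0.15):
    """Relative cluster cost per unit of throughput after the change (1.0 = unchanged).

    network_share: hypothetical fraction of total cluster cost spent on switches and optics.
    The defaults are the 33% cost cut and 15% throughput gain reported in the post.
    """
    if not 0.0 <= network_share <= 1.0:
        raise ValueError("network_share must be in [0, 1]")
    # Only the network portion of the bill shrinks; GPU and other costs stay the same.
    new_cost = (1.0 - network_share) + network_share * (1.0 - network_cost_cut)
    # The same (smaller) bill now buys 15% more throughput.
    return new_cost / (1.0 + throughput_gain)


def scaled_latency(p99_ms, reduction=0.406):
    """Apply the reported 40.6% cut in first-token P99 latency to a baseline value."""
    return p99_ms * (1.0 - reduction)


ratio = cost_per_throughput_ratio(0.10)   # assume the network is 10% of cluster cost
print(round(ratio, 3))                    # 0.841
print(round(scaled_latency(1000.0), 1))   # 594.0
```

Even with `network_share=0.0` the ratio is about 0.87, because the throughput gain alone spreads a fixed bill over more tokens; the network cost cut only adds to that in proportion to how much the fabric costs.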

The underlying problem is Prefill-Decode (PD) disaggregated inference, where prefill and decode run on separate nodes and the KV Cache built during prefill has to be shipped to the decode nodes. Those KV Cache transfers create asymmetric traffic between nodes, and a topology that works well for training can map poorly onto inference traffic. In the post’s explanation, ROFT’s static rail mapping concentrated traffic on particular Leaf switches, creating hotspots and triggering PFC backpressure. ZCube removes the Spine layer and instead uses a flattened, complete bipartite interconnect between two switch groups, which the post says avoids that class of congestion by design.
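A toy graph model helps show what "removing the Spine layer" changes. This is a minimal sketch, not ZCube's actual design: it assumes a two-tier leaf-spine as the baseline (hosts attach only to leaves) and, for the flattened fabric, assumes hosts attach to both switch groups, with prefill nodes hypothetically placed on group A and decode nodes on group B. Switch names and group sizes are made up for illustration.

```python
from collections import deque
from itertools import product


def bipartite_fabric(group_a, group_b):
    """Link every switch in group_a to every switch in group_b (no links within a group)."""
    adj = {s: set() for s in [*group_a, *group_b]}
    for a, b in product(group_a, group_b):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def switch_hops(adj, src, dst):
    """Number of switch-to-switch links on the shortest path, found by BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError(f"{dst} unreachable from {src}")


# Baseline: two-tier leaf-spine. Hosts sit on leaves, so leaf-to-leaf traffic crosses a spine.
leaves = [f"leaf{i}" for i in range(4)]
spines = [f"spine{i}" for i in range(2)]
clos = bipartite_fabric(leaves, spines)

# Flattened fabric (assumption): hosts on both groups, prefill on A, decode on B.
group_a = [f"A{i}" for i in range(4)]
group_b = [f"B{i}" for i in range(4)]
flat = bipartite_fabric(group_a, group_b)

clos_hops = max(switch_hops(clos, x, y) for x in leaves for y in leaves if x != y)
flat_hops = max(switch_hops(flat, a, b) for a in group_a for b in group_b)
print(clos_hops, flat_hops)  # 2 1
```

Under these assumptions every prefill-to-decode transfer crosses exactly one switch-to-switch link, while traffic between two switches in the same group still takes two hops, so any benefit depends on how traffic is placed across the groups.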

The most useful community reaction was that the bottleneck keeps moving lower in the stack. LLM inference optimization is no longer just about weights, quantization, or scheduler tricks. At large scale, the fabric carrying KV Cache traffic can decide both cost and responsiveness. For operators, this is a reminder to profile how traffic actually maps onto the network topology before assuming the next performance step requires more GPUs.
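One simple starting point for that kind of profiling is to compare per-link traffic and look for links carrying far more than their share. The sketch below assumes you already have byte counts per link over a sampling window (for example, deltas of switch port counters); the link names, sample numbers, and the 1.5x threshold are hypothetical and not taken from the post.

```python
from statistics import mean


def find_hotspots(link_bytes, threshold=1.5):
    """Flag links whose traffic exceeds `threshold` times the mean across all links.

    link_bytes: mapping of link name -> bytes observed over one sampling window
    (hypothetical input). Returns (imbalance, hotspots), where imbalance is max/mean
    and hotspots are sorted from busiest to least busy.
    """
    if not link_bytes:
        raise ValueError("no links to analyze")
    avg = mean(link_bytes.values())
    if avg == 0:
        # No traffic at all: treat the fabric as perfectly balanced.
        return 1.0, []
    imbalance = max(link_bytes.values()) / avg
    hotspots = sorted(
        (name for name, b in link_bytes.items() if b > threshold * avg),
        key=lambda name: link_bytes[name],
        reverse=True,
    )
    return imbalance, hotspots


# Illustrative numbers only: one uplink carries most of the traffic.
sample = {"leaf0-up": 900, "leaf1-up": 100, "leaf2-up": 120, "leaf3-up": 80}
imbalance, hot = find_hotspots(sample)
print(round(imbalance, 2), hot)  # 3.0 ['leaf0-up']
```

A max-to-mean ratio well above 1 on the links that carry KV Cache transfers is the kind of signal that points at topology or traffic mapping rather than GPU capacity.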

Reddit discussion
