Two Strix Halo boards as a vLLM cluster: the hard part is RDMA
Original: AMD Strix Halo RDMA Cluster Setup Guide
The AMD Strix Halo RDMA Cluster Setup Guide captures a practical shift in local LLM work. The goal is not simply to run a model on one small machine, but to connect two Framework Desktop Mainboards with AMD Ryzen AI Max 300-series chips, 128GB of unified memory each, and Intel E810 100GbE NICs so vLLM can serve a model with tensor parallelism across both nodes.
The central detail is RDMA. The guide describes the serving stack as Ray for the control plane, RCCL for AMD collective communication, and RoCE v2 over Ethernet for the data plane. In tensor parallelism, the nodes exchange partial results after every layer, so each generated token pays the network round trip many times over and per-message latency matters as much as raw bandwidth. The guide contrasts roughly 70-100 microseconds over TCP/IP with about 5 microseconds over RDMA, which is why the network path directly shapes how fast tokens come out.
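To make the latency argument concrete, here is a rough back-of-the-envelope sketch in Python. The latency figures are the ones the guide quotes; the layer count (80) and the two synchronizations per layer are illustrative assumptions, not numbers from the guide, and the estimate ignores transfer time and any overlap between compute and communication.

```python
# Rough estimate of per-token communication latency under tensor parallelism.
# Assumptions (not from the guide): a hypothetical 80-layer model and two
# synchronization points per layer. Only fixed per-message latency is counted;
# bandwidth-dependent transfer time and compute overlap are ignored.

def comm_overhead_ms(num_layers: int, syncs_per_layer: int, latency_us: float) -> float:
    """Return the summed per-message latency for one token, in milliseconds."""
    return num_layers * syncs_per_layer * latency_us / 1000.0


NUM_LAYERS = 80        # hypothetical model depth
SYNCS_PER_LAYER = 2    # assumed synchronization points per layer

# Latency figures quoted in the guide, in microseconds.
TRANSPORTS = [
    ("TCP/IP (low)", 70.0),
    ("TCP/IP (high)", 100.0),
    ("RDMA", 5.0),
]

for name, latency_us in TRANSPORTS:
    overhead = comm_overhead_ms(NUM_LAYERS, SYNCS_PER_LAYER, latency_us)
    print(f"{name:<14} {overhead:5.1f} ms of network latency per token")
```

Under these assumptions the fixed latency alone costs roughly 11 to 16 ms per token over TCP/IP and under 1 ms over RDMA, which is the gap the guide is trying to close.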
The setup is specific rather than aspirational. It covers Fedora 43, BIOS and kernel parameters, static addresses, MTU 9000, firewall trust, passwordless SSH, RDMA device exposure inside the container, and a custom librccl.so patch. It also calls out a hardware wrinkle: the Framework board exposes a physical PCIe x4 slot, so 100GbE cards require a riser or adapter. A modified slot is mentioned as a test setup, but the guide explicitly steers users toward safer risers.
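Two of the items on that list, the jumbo-frame MTU and RDMA device visibility, are easy to sanity-check before launching anything. The following is a minimal preflight sketch, not the guide's own tooling: it reads the standard Linux sysfs locations, and the interface name eth0 and the function names are hypothetical placeholders.

```python
# Minimal preflight sketch: check MTU and RDMA device visibility via sysfs.
# Not taken from the guide; "eth0" and all function names are placeholders.
from pathlib import Path


def read_mtu(iface: str, sys_root: Path = Path("/sys")):
    """Return the interface MTU as an int, or None if it cannot be read."""
    mtu_file = sys_root / "class" / "net" / iface / "mtu"
    try:
        return int(mtu_file.read_text().strip())
    except (OSError, ValueError):
        return None


def list_rdma_devices(sys_root: Path = Path("/sys")):
    """Return the sorted names of RDMA devices the kernel has registered."""
    ib_dir = sys_root / "class" / "infiniband"
    if not ib_dir.is_dir():
        return []
    return sorted(entry.name for entry in ib_dir.iterdir())


def preflight(iface: str, expected_mtu: int = 9000, sys_root: Path = Path("/sys")):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    mtu = read_mtu(iface, sys_root)
    if mtu is None:
        problems.append(f"cannot read MTU for interface {iface!r}")
    elif mtu != expected_mtu:
        problems.append(f"{iface} MTU is {mtu}, expected {expected_mtu}")
    if not list_rdma_devices(sys_root):
        problems.append("no RDMA devices found under class/infiniband")
    return problems


if __name__ == "__main__":
    # Replace "eth0" with the actual name of the 100GbE interface.
    issues = preflight("eth0")
    for issue in issues:
        print("WARN:", issue)
    if not issues:
        print("MTU and RDMA device checks passed")
```

Running the same script inside the serving container as well as on the host is a quick way to see whether the RDMA device made it through, since a missing device there will show up as an empty list.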
HN discussion centered on how far a homelab setup can reasonably stretch. Commenters liked the possibility of bridging the gap between 24GB consumer GPUs and much larger memory pools by combining two unified-memory boxes. At the same time, they questioned cost, token speed, PCIe limits, NIC heat, and whether Apple machines could eventually expose similar RDMA benefits over Thunderbolt.
The guide is not a turnkey product announcement, and that is the point. Local LLM performance is now shaped by memory layout, interconnect latency, containers, and serving orchestration as much as by the model file. For builders trying to run larger models outside cloud GPU rentals, this is a concrete map of the work still required.