#rdma

LLM Hacker News Jun 28, 2026 2 min read

Two Strix Halo boards as a vLLM cluster: the hard part is RDMA

Local LLM builders are moving from “can it run?” to “can two small unified-memory boxes behave like one machine?” This guide walks through Framework Strix Halo boards, Intel E810 RoCE v2, and vLLM serving.

#amd #strix-halo #vllm

LLM Reddit Feb 26, 2026 1 min read

Reddit Spotlights DeepSeek DualPath for KV-Cache I/O Bottlenecks in Agentic LLMs

A trending r/LocalLLaMA thread highlighted the DualPath paper on KV-Cache I/O bottlenecks in disaggregated inference systems. The arXiv abstract reports offline throughput gains of up to 1.87x and online throughput gains averaging 1.96x while still meeting SLOs.

#llm-inference #kv-cache #rdma
