
Community Builds 16-Node NVIDIA DGX Spark Cluster for Unified-Memory LLM Inference

Original: 16x Spark Cluster (Build Update)

LLM | May 2, 2026 | By Insights AI (Reddit)

Build Complete

A LocalLLaMA community member has completed a 16-node NVIDIA DGX Spark cluster, connecting all nodes through an FS N8510 switch with QSFP56 cables. Each node uses two rails, and each rail measures 100–111 Gbps, which together reach the advertised 200 Gbps per node.
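
The per-node figure follows from simple arithmetic on the reported per-rail measurements. The sketch below only restates the numbers from the post; the function names and the conversion to gigabytes per second are illustrative additions, not something the builder published.

```python
# Reported figures from the build post: two rails per node,
# each measured at 100-111 Gbps.
RAILS_PER_NODE = 2
MEASURED_GBPS_PER_RAIL = (100, 111)


def per_node_gbps(per_rail_gbps, rails=RAILS_PER_NODE):
    """Sum the per-rail rate across all rails on one node."""
    return per_rail_gbps * rails


def gbps_to_gigabytes_per_s(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


low = per_node_gbps(MEASURED_GBPS_PER_RAIL[0])
high = per_node_gbps(MEASURED_GBPS_PER_RAIL[1])
print(f"Per node: {low}-{high} Gbps "
      f"({gbps_to_gigabytes_per_s(low):.2f}-{gbps_to_gigabytes_per_s(high):.2f} GB/s)")
```

These are raw link rates; the throughput an inference workload actually sees depends on the collective library and message sizes, which the post does not report.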

Why DGX Spark Over H100s or GB300?

The answer is unified memory: the builder's primary goal was to maximize unified memory capacity within the NVIDIA ecosystem. At 8 nodes, the setup served GLM-5.1-NVFP4 (434 GB) with tensor parallelism across all eight nodes (TP=8). With 16 nodes, the plan is to test DeepSeek and Kimi models, along with a prefill/decode split architecture.
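
To make the capacity argument concrete, here is a rough estimate of how the 434 GB of weights divide across nodes under tensor parallelism. It assumes 128 GB of unified memory per DGX Spark (a hardware spec not stated in the article) and an even split of weights, which ignores replicated layers, KV cache, activations, and OS overhead. The TP=16 row is hypothetical; the post only reports TP=8.

```python
MODEL_WEIGHTS_GB = 434          # GLM-5.1-NVFP4 size quoted in the post
ASSUMED_NODE_MEMORY_GB = 128    # assumption: unified memory per DGX Spark


def weights_per_node_gb(model_gb, tp_degree):
    """Weights held by each node if they split evenly across tp_degree nodes."""
    return model_gb / tp_degree


def headroom_gb(model_gb, tp_degree, node_memory_gb=ASSUMED_NODE_MEMORY_GB):
    """Memory left per node for KV cache, activations, and the OS."""
    return node_memory_gb - weights_per_node_gb(model_gb, tp_degree)


for tp in (8, 16):  # TP=8 is from the post; TP=16 is a hypothetical comparison
    share = weights_per_node_gb(MODEL_WEIGHTS_GB, tp)
    spare = headroom_gb(MODEL_WEIGHTS_GB, tp)
    print(f"TP={tp}: {share:.2f} GB weights per node, {spare:.2f} GB headroom")
```

Under these assumptions the model uses a bit under half of each node's memory at TP=8, which leaves room for larger models or a longer context once all 16 nodes are available.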

Setup Process

Each DGX Spark ships with NVIDIA's Ubuntu-based OS, with most software pre-installed. Setup involved racking the units, creating matching user accounts on every node, waiting about 20 minutes per node for updates, and then scripting passwordless SSH, jumbo frames, and IP configuration.
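
The scripted part of that process might look something like the sketch below, which generates (but does not run) the per-node commands for IP assignment, jumbo frames, and SSH key distribution. The builder's actual scripts were not published, so the subnets, interface names, hostnames, user name, and MTU value are all hypothetical placeholders; only the `ip` and `ssh-copy-id` commands themselves are standard Linux tools.

```python
import ipaddress

# All values below are hypothetical placeholders, not taken from the post.
NODE_COUNT = 16
MTU = 9000  # a common jumbo-frame size
USER = "cluster"
RAILS = {
    "rail0": ("192.168.100.0/24", "rail0_iface"),
    "rail1": ("192.168.101.0/24", "rail1_iface"),
}


def node_plan(index):
    """Return the shell commands that configure both rails on node `index` (0-based)."""
    cmds = []
    for subnet, iface in RAILS.values():
        net = ipaddress.ip_network(subnet)
        host = net.network_address + index + 1  # node 0 gets .1, node 15 gets .16
        cmds.append(f"sudo ip link set dev {iface} mtu {MTU}")
        cmds.append(f"sudo ip addr add {host}/{net.prefixlen} dev {iface}")
    return cmds


def ssh_key_commands(hostnames, user=USER):
    """Return ssh-copy-id commands that push the local public key to every node."""
    return [f"ssh-copy-id {user}@{name}" for name in hostnames]


hostnames = [f"spark{i:02d}" for i in range(1, NODE_COUNT + 1)]
for line in node_plan(0) + ssh_key_commands(hostnames[:2]):
    print(line)
```

Generating commands as text keeps the sketch safe to run anywhere. Settings applied with `ip` do not survive a reboot, so a real deployment would more likely write them into the system's persistent network configuration.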

What This Signals

This build shows that large multi-node GPU clusters are becoming accessible to individuals and small teams. Prioritizing unified memory over raw compute reflects an approach to LLM inference infrastructure that optimizes for model capacity rather than pure throughput.
