Covenant-72B puts permissionless distributed GPU training ahead of raw hype
Original: 1Covenant/Covenant-72B: Largest model so far to be trained on decentralized permissionless GPU nodes
Covenant-72B drew attention on r/LocalLLaMA because of how it was trained, not because the thread claimed a clean benchmark sweep. The post reached 92 points and 25 comments and framed the release as the largest model so far to be trained on decentralized permissionless GPU nodes. According to the Hugging Face model card, Covenant-72B is a 72B-parameter language model trained from scratch on 1.1 trillion English tokens. The same card describes it as the largest permissionless collaboratively trained language model released so far.
The engineering claim that matters most is the participation model. The model card says 20+ globally distributed participants coordinated through decentralized infrastructure on the Bittensor blockchain. The technical report abstract adds important context: earlier globally distributed training efforts were either smaller or relied on whitelisted participation. Covenant-72B instead targeted fully permissionless, dynamic participation over the public internet. In practical terms, that makes this project interesting as a systems milestone. It suggests that large-scale pre-training does not necessarily require a closed consortium with tightly controlled membership, provided the training stack is built to tolerate unstable connectivity and changing contributors.
The published architecture details are straightforward and worth separating from the broader narrative. Covenant-72B uses 80 layers, 64 attention heads with 8 KV heads, and a hidden size of 8192, and it is released under the Apache 2.0 license. The release is also explicitly a base model, with a separate instruction-tuned variant named Covenant-72B-Chat. That distinction mattered in the Reddit discussion. One commenter viewed the Apache 2.0 license and base-model positioning positively, which is consistent with how open-model users often evaluate reuse potential. Another commenter argued that the raw performance was not state of the art. Taken together, the thread reads less like consensus around a frontier model and more like a debate over what kind of progress should count most.
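The published shape numbers can be sanity-checked against the 72B headline with a back-of-the-envelope calculation. The sketch below uses only the figures from the model card (80 layers, hidden size 8192, 64 query heads, 8 KV heads); the FFN intermediate size and vocabulary size are not in the source notes, so the values used here are illustrative assumptions typical of 72B-class models.

```python
# Rough parameter count for the published Covenant-72B shape.
# HIDDEN, LAYERS, Q_HEADS, KV_HEADS come from the model card;
# FFN_INTERMEDIATE and VOCAB are assumptions for illustration.

HIDDEN = 8192
LAYERS = 80
Q_HEADS = 64
KV_HEADS = 8
HEAD_DIM = HIDDEN // Q_HEADS            # 128
FFN_INTERMEDIATE = 29_568               # assumed, typical for this size class
VOCAB = 152_064                         # assumed

attn = (
    HIDDEN * Q_HEADS * HEAD_DIM         # Q projection
    + 2 * HIDDEN * KV_HEADS * HEAD_DIM  # K and V projections (grouped-query)
    + Q_HEADS * HEAD_DIM * HIDDEN       # output projection
)
ffn = 3 * HIDDEN * FFN_INTERMEDIATE     # SwiGLU: gate, up, and down matrices
per_layer = attn + ffn
total = LAYERS * per_layer + 2 * VOCAB * HIDDEN  # plus embeddings and LM head

print(f"~{total / 1e9:.1f}B parameters")  # lands in the low 70s of billions
```

Under these assumptions the total comes out close to 72B, which is consistent with the card's headline figure; grouped-query attention (8 KV heads against 64 query heads) is what keeps the K/V projections small relative to Q.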
The training method is central to that debate. The Reddit post highlighted SparseLoCo, described as building on DiLoCo while cutting synchronization frequency. The write-up specifically called out local AdamW, top-k sparsification, and 2-bit quantization as tools for reducing communication cost. That matters because globally distributed training over the public internet is usually constrained more by communication than by arithmetic throughput. The SparseLoCo abstract says the method reaches 1-3% sparsity while outperforming full-precision DiLoCo in communication-constrained settings. That is a targeted claim about the training regime, not a blanket statement about overall model quality, and it helps explain why the project could support dynamic, non-whitelisted participation.
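The compression idea is easy to see in miniature. The sketch below is not the paper's exact algorithm, only an illustration of the principle the write-up names: between infrequent sync rounds each worker accumulates a pseudo-gradient locally, then ships only its top-k entries by magnitude, quantized to a few bits. All function names and the uniform 2-bit scheme here are illustrative choices.

```python
import numpy as np

def compress(delta: np.ndarray, sparsity: float = 0.02, bits: int = 2):
    """Keep the top `sparsity` fraction of entries by magnitude,
    then uniformly quantize the survivors to 2**bits levels.
    Returns only what would actually go over the wire."""
    k = max(1, int(sparsity * delta.size))
    flat = delta.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of top-k entries
    vals = flat[idx]
    lo, hi = vals.min(), vals.max()
    scale = (hi - lo) / (2**bits - 1) or 1.0       # avoid div-by-zero if flat
    codes = np.round((vals - lo) / scale).astype(np.uint8)
    return idx, codes, lo, scale

def decompress(idx, codes, lo, scale, shape):
    """Rebuild a dense (mostly zero) update from the sparse payload."""
    out = np.zeros(int(np.prod(shape)))
    out[idx] = lo + codes * scale
    return out.reshape(shape)

rng = np.random.default_rng(0)
delta = rng.normal(size=(1024, 1024))              # stand-in pseudo-gradient
idx, codes, lo, scale = compress(delta)
# Payload: ~21k indices plus ~21k 2-bit codes, versus ~1M float32 values.
```

At 2% density the wire payload is dominated by the indices rather than the values, which is why pairing sparsification with aggressive quantization pays off: once only 1-3% of entries survive, spending 32 bits per value would waste most of the savings.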
Benchmark discussion should stay narrow. The model card includes comparisons against INTELLECT-1, Psyche Consilience, LLM360 K2, and LLaMA-2-70B, but the source notes here do not justify declaring Covenant-72B a new performance leader. A more defensible takeaway is that the release packages several meaningful signals at once: a 72B base model trained from scratch, a permissionless collaborative setup involving 20+ participants, and a communication-efficient method intended for unstable wide-area coordination. For the open LLM community, that combination may matter as much as any single benchmark table because it points to a different way of organizing large-model development.
Related Articles
A high-signal LocalLLaMA thread on March 15, 2026 focused on a license swap for NVIDIA’s Nemotron model family. Comparing the current NVIDIA Nemotron Model License with the older Open Model License shows why the community reacted: the old guardrail-termination clause and Trustworthy AI cross-reference are no longer present, while the newer text leans on a simpler NOTICE-style attribution structure.
A high-engagement r/LocalLLaMA thread tracked the MiniMax-M2.5 release on Hugging Face. The model card emphasizes agentic coding/search benchmarks, runtime speedups, and aggressive cost positioning.
NVIDIA AI Developer introduced Nemotron 3 Super on March 11, 2026 as an open 120B-parameter hybrid MoE model with 12B active parameters and a native 1M-token context window. NVIDIA says the model targets agentic workloads with up to 5x higher throughput than the previous Nemotron Super model.