Covenant-72B: A 72B Base Model That Puts Permissionless Distributed GPU Training Front and Center

Covenant-72B is an openly released model that drew attention less for benchmark wins and losses than for how it was trained. The r/LocalLLaMA post recorded 92 points and 25 comments, and introduced it as the largest model trained on decentralized, permissionless GPU nodes. According to the Hugging Face model card, Covenant-72B is a 72B-parameter language model trained from scratch on 1.1 trillion English tokens. The same model card positions it as the largest permissionless, collaboratively trained language model.

What really matters in this release is the participation model. The model card explains that 20+ globally distributed participants coordinated through decentralized infrastructure on the Bittensor blockchain. The technical report's abstract draws a contrast: earlier globally distributed training efforts were either smaller in scale or relied on whitelisted participation. Covenant-72B assumes fully permissionless participation and dynamic participation over the internet, and the claim is that this combination can work even at unprecedented scale. In other words, the significance of the project is not that yet another 70B-class model has shipped, but that it tested whether non-whitelisted pre-training can hold up in something close to real-world operation.

The published architecture details are also clear. Covenant-72B uses 80 layers, 64 attention heads, 8 KV heads, and a hidden size of 8192, and is released under the Apache 2.0 license. It is a base model; an instruction-tuned variant, Covenant-72B-Chat, is provided separately. Reddit comments responded to exactly these points. One commenter praised the Apache 2.0 license and the base-model positioning, while another argued that raw performance is not state of the art. The community's reception is not uniform: opinions split less on performance itself than on which properties each reader values.
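To make the attention figures concrete, the following is a small arithmetic sketch of what 64 attention heads sharing 8 KV heads implies. It assumes the common convention that the per-head dimension equals hidden size divided by the number of attention heads, and it assumes 2 bytes per cached value (fp16/bf16). Neither assumption is stated in the model card as quoted here, so treat the numbers as illustrative rather than as published specifications.

```python
# Figures quoted from the model card, as cited in the article.
NUM_LAYERS = 80
NUM_ATTENTION_HEADS = 64
NUM_KV_HEADS = 8
HIDDEN_SIZE = 8192

# Assumption (not from the source): cached keys/values are stored in 2 bytes each.
BYTES_PER_VALUE = 2


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head dimension under the assumed convention hidden_size / num_heads."""
    return hidden_size // num_heads


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             dim: int, bytes_per_value: int) -> int:
    """Bytes of KV cache for one token: keys plus values, for every layer and KV head."""
    return 2 * num_layers * num_kv_heads * dim * bytes_per_value


dim = head_dim(HIDDEN_SIZE, NUM_ATTENTION_HEADS)
queries_per_kv_head = NUM_ATTENTION_HEADS // NUM_KV_HEADS

# Cache cost with the 8 shared KV heads described in the article.
gqa_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, dim, BYTES_PER_VALUE)
# Hypothetical comparison: one KV head per attention head.
mha_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_ATTENTION_HEADS, dim, BYTES_PER_VALUE)

print(f"head dim: {dim}")
print(f"query heads per KV head: {queries_per_kv_head}")
print(f"KV cache per token: {gqa_bytes / 1024:.0f} KiB (vs {mha_bytes / 1024:.0f} KiB without sharing)")
```

Under these assumptions, sharing KV heads reduces the per-token cache by the same factor as the ratio of attention heads to KV heads, which matters mainly for inference memory rather than for the training story above.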

What stands out in that context is SparseLoCo. The Reddit post described SparseLoCo as a method that builds on DiLoCo while reducing synchronization frequency. It combines local AdamW, top-k sparsification, and 2-bit quantization to cut communication cost. In globally distributed training over the public internet, communication often becomes the constraint before compute does. The SparseLoCo abstract states that it reaches 1-3% sparsity while outperforming full-precision DiLoCo in communication-constrained settings. The key point is that this is a limited claim about the training method, not a claim of superiority on every benchmark.
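The two compression steps named above can be illustrated with a toy sketch. This is not the SparseLoCo implementation: it leaves out local AdamW and the synchronization schedule entirely, and the function names, the uniform four-level quantizer, and the index-plus-code payload layout are all assumptions made for illustration. It only shows why keeping a few percent of entries at 2 bits each shrinks what a node has to send.

```python
import random


def top_k_sparsify(values, k):
    """Keep the k largest-magnitude entries; return (sorted indices, kept values)."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    indices = sorted(order[:k])
    return indices, [values[i] for i in indices]


def quantize_2bit(kept):
    """Map each value to one of 4 evenly spaced levels in [-scale, scale] (codes 0..3)."""
    scale = max(abs(v) for v in kept) or 1.0  # avoid division by zero for all-zero input
    codes = [round((v / scale + 1.0) / 2.0 * 3) for v in kept]
    return codes, scale


def dequantize_2bit(codes, scale):
    """Invert quantize_2bit up to rounding error."""
    return [(c / 3 * 2.0 - 1.0) * scale for c in codes]


def payload_bits(n, k):
    """Assumed layout: per kept entry, a 2-bit code plus a fixed-width index; one 32-bit scale."""
    index_bits = (n - 1).bit_length()
    return k * (2 + index_bits) + 32


# Toy "update" vector standing in for whatever a node would communicate.
rng = random.Random(0)
n = 1000
update = [rng.gauss(0.0, 1.0) for _ in range(n)]

k = 20  # 2% of entries, inside the 1-3% range quoted from the abstract
indices, kept = top_k_sparsify(update, k)
codes, scale = quantize_2bit(kept)
restored = dequantize_2bit(codes, scale)

dense_bits = n * 32  # assumed baseline: every entry sent as a 32-bit float
sparse_bits = payload_bits(n, k)
print(f"dense: {dense_bits} bits, sparse + 2-bit: {sparse_bits} bits")
```

In this toy layout the index bits dominate the payload once values are down to 2 bits, which is one reason real systems care about how indices are encoded; how SparseLoCo handles that is not described in the material summarized here.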

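The model card also includes benchmark comparisons against INTELLECT-1, Psyche Consilience, LLM360 K2, and LLaMA-2-70B, but it would not be appropriate to declare Covenant-72B a new performance leader on the basis of these source notes alone. A more reasonable reading is that the main news is the achievement itself: building a 72B base model from scratch, making permissionless collaborative training work with 20+ participants, and presenting a stack that prioritizes communication efficiency. For the open LLM community, who can participate, which license applies, and which training methods are reproducible matter just as much as a model's absolute scores, and Covenant-72B is a case that pushes those questions to the front.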
