Ternary Bonsai squeezes 8B models to 1.75GB at 1.58 bits
Original: Today we’re announcing Ternary Bonsai: Top intelligence at 1.58 bits
PrismML's April 16 X post matters because it gives open-model builders a concrete efficiency claim. The source tweet says Ternary Bonsai uses "ternary weights {-1, 0, +1}" and frames the family as 1.58-bit language models. It was posted at 2026-04-16 17:39:18 UTC, inside the requested 48-hour window.
The numbers are the story. PrismML says the models are 9x smaller than their 16-bit counterparts and are released under Apache 2.0 in three sizes: 8B at 1.75GB, 4B at 0.86GB, and 1.7B at 0.37GB. The public Hugging Face listing includes the Ternary Bonsai collection, MLX model entries, and a demo collection, all updated on April 16. Community replies also point to ONNX, MLX, and browser WebGPU demos, but the model cards and benchmark details are what need close reading next.
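The headline sizes can be sanity-checked with back-of-envelope arithmetic. A ternary weight carries log2(3) ≈ 1.58 bits of information; the small gap between the idealized figures and the published ones plausibly comes from tensors kept at higher precision (embeddings, norms, quantization scales), though only the model cards can confirm that:

```python
import math

BITS_PER_WEIGHT = math.log2(3)  # ternary {-1, 0, +1} -> ~1.585 bits


def ternary_weight_gb(params_billion: float) -> float:
    """Idealized weight storage: params * log2(3) bits, converted to GB."""
    return params_billion * 1e9 * BITS_PER_WEIGHT / 8 / 1e9


for n in (8, 4, 1.7):
    print(f"{n}B params -> {ternary_weight_gb(n):.2f} GB (weights only)")

# 16-bit baseline for the 8B model: 8e9 params * 2 bytes = 16 GB.
# 16 / 1.75 is roughly 9.1, consistent with the "9x smaller" claim.
print(16 / 1.75)
```

The idealized figures (about 1.58, 0.79, and 0.34 GB) sit just under the published 1.75, 0.86, and 0.37 GB, which is the right direction for a release that keeps some tensors above ternary precision.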
The technical hook is the ternary weight format. Instead of storing each weight as a higher-precision floating-point value, the model family restricts weights to three values and relies on training and kernels to keep quality usable. That is why the size numbers are so aggressive, and why deployment support matters as much as the headline benchmark image. The Hugging Face collection's MLX entries point to Apple Silicon as one intended local path, while browser and WebGPU demos would make the release more interesting for client-side agents. Independent perplexity, coding, and instruction-following tests will decide whether the compression is practical or mostly a research artifact.
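PrismML has not published its quantization recipe in this thread. As a hedged illustration only, the absmean scheme used by BitNet b1.58 (a prior 1.58-bit family) maps each weight to {-1, 0, +1} with a single per-tensor scale; a minimal pure-Python sketch of that scheme:

```python
def ternarize(weights, eps=1e-8):
    """Absmean ternarization (BitNet b1.58-style, for illustration):
    scale by the mean |w|, then round and clip each weight to {-1, 0, +1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Coarse reconstruction: each ternary code times the shared scale."""
    return [v * scale for v in q]


w = [0.42, -1.3, 0.05, 0.9, -0.2]
q, s = ternarize(w)
print(q)                 # only -1, 0, or +1 appears
print(dequantize(q, s))  # coarse approximation of w
```

Actual 1.58-bit storage then packs these codes densely (five ternary digits fit in one byte, since 3^5 = 243 < 256), which is where the roughly 9x reduction over 16-bit weights comes from.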
PrismML describes itself as focused on AI efficiency, so this post fits its usual lane: making local and low-memory inference more practical. The next watch item is replication. If the benchmark image and model cards hold up across independent tests, a 1.58-bit family that stays usable at 8B, 4B, and 1.7B sizes could matter for browser demos, phones, and private local agents. If not, the release will still be a useful stress test for how much reasoning quality survives extreme quantization.
Related Articles
Quantization only matters when the accuracy hit stays small enough to use in production. Red Hat AI says its quantized Gemma 4 31B keeps 99%+ accuracy while delivering nearly 2x tokens/sec at half the memory footprint, with weights released openly via LLM Compressor.
r/LocalLLaMA liked this comparison because it replaces reputation and anecdote with a more explicit distribution-based yardstick. The post ranks community Qwen3.5-9B GGUF quants by mean KLD versus a BF16 baseline, with Q8_0 variants leading on fidelity and several IQ4/Q5 options standing out on size-to-drift trade-offs.
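KLD here is the Kullback-Leibler divergence between a quant's next-token distributions and the BF16 baseline's, averaged over positions; lower means the quant behaves more like the reference. The post's exact tooling is not reproduced here, but the metric itself is simple, sketched with toy distributions:

```python
import math


def mean_kld(baseline, quant):
    """Mean per-position KL(baseline || quant) over next-token probability
    distributions; 0.0 means the two models agree exactly."""
    total = 0.0
    for p, q in zip(baseline, quant):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(baseline)


# Toy 3-way next-token distributions at two positions.
base = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
drift = [[0.6, 0.25, 0.15], [0.5, 0.3, 0.2]]
print(mean_kld(base, base))    # 0.0
print(mean_kld(base, drift))   # small positive drift
```

A distribution-level metric like this is stricter than top-1 accuracy: a quant can pick the same argmax token while still reshaping the tail of the distribution, and KLD catches that.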
A Vulmon X post on April 7, 2026 surfaced CVE-2026-1839, an arbitrary code execution issue in Hugging Face Transformers Trainer checkpoint loading. CVE.org says affected versions before v5.0.0rc3 can execute malicious code from crafted rng_state.pth files under PyTorch below 2.6, and the fix adds weights_only=True.
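The vulnerability class is worth spelling out: a .pth checkpoint such as rng_state.pth is a Python pickle, and unpickling can invoke arbitrary callables via `__reduce__`. A self-contained demonstration of the mechanism, using `eval` on a harmless expression as a stand-in for malicious code (the real exploit details live in the CVE, not here):

```python
import pickle


class NotARngState:
    """A hostile object: __reduce__ tells pickle to call eval at load
    time, so merely loading the file executes attacker-chosen code."""

    def __reduce__(self):
        return (eval, ("40 + 2",))


payload = pickle.dumps(NotARngState())
result = pickle.loads(payload)   # the expression runs during loading
print(result)                    # 42
```

This is why the fix matters: `torch.load(..., weights_only=True)` restricts unpickling to tensors and plain containers instead of arbitrary callables, closing this path for untrusted checkpoint files.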