Ternary Bonsai squeezes 8B models to 1.75GB at 1.58 bits
Original: "Today we're announcing Ternary Bonsai: Top intelligence at 1.58 bits"
PrismML's April 16 X post matters because it gives open-model builders a concrete efficiency claim: the tweet says Ternary Bonsai uses "ternary weights {-1, 0, +1}" and frames the family as 1.58-bit language models. The post was created at 2026-04-16 17:39:18 UTC, inside the requested 48-hour window.
The numbers are the story. PrismML says the models are 9x smaller than their 16-bit counterparts and are released under Apache 2.0 in three sizes: 8B at 1.75GB, 4B at 0.86GB, and 1.7B at 0.37GB. The public Hugging Face page lists the Ternary Bonsai collection, MLX model entries, and a demo collection, all updated on April 16. Community replies also point to ONNX, MLX, and browser WebGPU demos, but the model cards and benchmark details are what need close reading next.
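Those size claims can be sanity-checked with back-of-envelope arithmetic. The sketch below computes the theoretical storage floor for pure ternary weights, where each of three values needs log2(3) ≈ 1.585 bits; the published file sizes should sit slightly above this floor once embeddings, norms, and packing overhead are included. The figures here are from the post, not independently verified.

```python
import math

BITS_PER_WEIGHT = math.log2(3)  # ~1.585 bits to encode {-1, 0, +1}

def ternary_floor_gb(params_billions: float) -> float:
    """Minimum storage in GB if every parameter were stored ternary."""
    total_bits = params_billions * 1e9 * BITS_PER_WEIGHT
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

# Compare the floor against PrismML's claimed file sizes.
for params, claimed in [(8, 1.75), (4, 0.86), (1.7, 0.37)]:
    floor = ternary_floor_gb(params)
    print(f"{params}B: floor {floor:.2f} GB vs claimed {claimed} GB")
```

Each claimed size lands a little above its ternary floor (e.g. 1.58 GB for 8B parameters), which is what you'd expect if the weights really are packed at ~1.58 bits each.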
The technical hook is the ternary weight format. Instead of storing each weight as a higher-precision floating-point value, the model family restricts weights to three values, {-1, 0, +1}, which takes only log2(3) ≈ 1.58 bits per weight, and relies on training and kernels to keep quality usable. That is why the size numbers are so aggressive, and why deployment support matters as much as the headline benchmark image. The Hugging Face collection's MLX entries point to Apple Silicon as one intended local path, while browser and WebGPU demos would make the release more interesting for client-side agents. Independent perplexity, coding, and instruction-following tests will decide whether the compression is practical or mostly a research artifact.
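To make the format concrete, here is a minimal sketch of one common ternary quantization scheme: round each weight to {-1, 0, +1} after dividing by a per-tensor absolute-mean scale. PrismML has not published its training or quantization recipe, so this is illustrative of the weight format only, not their method.

```python
def ternarize(weights, eps=1e-8):
    """Map a flat list of float weights to {-1, 0, +1} plus one scale."""
    # Per-tensor absmean scale; eps avoids division by zero on all-zero tensors.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to nearest integer, then clamp into the ternary set.
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from ternary codes."""
    return [v * scale for v in q]

q, s = ternarize([0.03, -0.01, 0.0, 0.05])
print(q)  # -> [1, 0, 0, 1]
```

Small weights collapse to 0 and large ones saturate at ±1, which is why training-aware methods (rather than post-hoc rounding) are usually needed to keep quality usable at this precision.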
PrismML describes itself as focused on AI efficiency, so this post fits its usual lane: making local and low-memory inference more practical. The next watch item is replication. If the benchmark image and model cards hold up across independent tests, a 1.58-bit family that stays usable at 8B, 4B, and 1.7B sizes could matter for browser demos, phones, and private local agents. If not, the release will still be a useful stress test for how much reasoning quality survives extreme quantization.
Related Articles
A merged Hugging Face Transformers PR that surfaced on r/LocalLLaMA shows Mistral 4 as a hybrid instruct/reasoning model with 128 experts, 4 active experts, 6.5B activated parameters per token, 256k context, and Apache 2.0 licensing.
Google's AI Edge team said on April 2, 2026 that Gemma 4 is bringing multi-step agentic workflows to phones, desktops, and edge hardware under an Apache 2.0 license. The launch combines open models, Agent Skills, and LiteRT-LM deployment tooling.
Quantization only matters when the accuracy hit stays small enough to use in production. Red Hat AI says its quantized Gemma 4 31B keeps 99%+ accuracy while delivering nearly 2x tokens/sec at half the memory footprint, with weights released openly via LLM Compressor.