Cohere gives LocalLLaMA first hands-on access to an unreleased coding model

Cohere’s Nick Frosst posted an early-access invitation in r/LocalLLaMA for an unreleased coding model. The post describes the model as 30B parameters with 3B active parameters, available for now through CohereLabs/BLS-Mini-Code-1.0 on Hugging Face. More platform support is expected around the formal release.

The notable part is the release path. Instead of leading with a polished benchmark page and then waiting for community testing, Cohere put the weights in front of LocalLLaMA before the model was fully launched. Frosst said the team wanted users to test it against what they are actually trying to do, and that feedback from this release could shape how Cohere continues developing the line.

The positioning is local-first. A 30B total / 3B active setup suggests a model meant to feel larger than a small dense model while keeping runtime costs manageable on some local machines. The post claims internal token-output tests are in line with similar models in its size class, but it also treats the model as unfinished. That makes community feedback more useful than a single leaderboard result.
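To put the 30B total / 3B active figures in local-hardware terms, here is a rough back-of-the-envelope sketch of weight memory at a few precisions. It assumes a sparse layout in which all 30B parameters stay resident while only about 3B are used per token; the post does not spell out the architecture. The bit widths are illustrative and not formats Cohere has announced. The estimate ignores quantization overhead, KV cache, and activations, so real usage will be higher.

```python
# Rough weight-memory estimate from the parameter counts quoted in the post.
# Assumption: every parameter must be held in memory, even though only a
# fraction is active for any given token.

GIB = 1024 ** 3  # bytes per GiB

TOTAL_PARAMS = 30e9   # total parameters, per the post
ACTIVE_PARAMS = 3e9   # active parameters per token, per the post

# Illustrative precisions only; real quantized formats add metadata overhead.
PRECISIONS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}


def weight_memory_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return total_params * bits_per_weight / 8 / GIB


def estimate_table() -> dict:
    """Map each illustrative precision to its rounded weight footprint in GiB."""
    return {
        name: round(weight_memory_gib(TOTAL_PARAMS, bits), 1)
        for name, bits in PRECISIONS.items()
    }


if __name__ == "__main__":
    for name, gib in estimate_table().items():
        print(f"{name}: ~{gib} GiB of weights")
    print(f"active share per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
```

Under these assumptions the memory footprint is set by the total parameter count, while per-token compute tracks the much smaller active count. That gap is why this class of model is attractive for local use.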

LocalLLaMA is a hard audience for this sort of experiment. Users will quickly test quantization, VRAM behavior, llama.cpp support, coding tasks, and real throughput, often with less patience for launch messaging than a general developer audience. For Cohere, that is also the point. If the model works well in that environment, the feedback will be unusually concrete; if it does not, the failure modes will show up early. Either way, this looks like a model rollout with the community inside the loop rather than waiting at the end of it.

Related Articles

r/LocalLLaMA Benchmarks: Krasis reports 3,324 tok/s prefill for 80B MoE on one RTX 5080

OpenPangu-2.0-Flash draws LocalLLaMA interest with 92B total, 6B active MoE

LongCat-2.0 makes the infrastructure story as important as the MoE scale
