Cohere gives LocalLLaMA first hands-on access to an unreleased coding model

Original: Cohere's unreleased coding model (early access for localllama)

LLM | Jun 6, 2026 | By Insights AI (Reddit)

Cohere’s Nick Frosst posted an early-access invitation in r/LocalLLaMA for an unreleased coding model. The post describes the model as having 30B total parameters with 3B active, and it is available for now only through CohereLabs/BLS-Mini-Code-1.0 on Hugging Face. More platform support is expected around the formal release.

The notable part is the release path. Instead of leading with a polished benchmark page and then waiting for community testing, Cohere put the weights in front of LocalLLaMA before the model was fully launched. Frosst said the team wanted users to test it against the tasks they are actually working on, and that feedback from this release could shape how Cohere continues developing the line.

The positioning is local-first. A 30B total / 3B active setup suggests a model meant to feel larger than a small dense model while keeping runtime costs manageable on some local machines. The post claims internal token-output tests are in line with similar models in its size class, but it also treats the model as unfinished. That makes community feedback more useful than a single leaderboard result.
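
To make the "manageable on some local machines" point concrete, here is a rough back-of-the-envelope sketch of weight storage for a 30B-parameter model at a few precisions. Only the 30B total and 3B active figures come from the post; the bit widths are illustrative assumptions, the helper name is hypothetical, and the estimate ignores KV cache, activations, and runtime overhead. Real quantization formats also tend to use somewhat more bits per weight than their nominal width, so treat these as lower bounds rather than measured numbers for this model.

```python
# Rough weight-memory estimate for a 30B-total / 3B-active model.
# Assumptions (not from the post): the bit widths below, and that
# weights dominate memory. KV cache and runtime overhead are ignored.

def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given parameter count."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

TOTAL_B = 30.0   # total parameters in billions, as stated in the post
ACTIVE_B = 3.0   # active parameters in billions, as stated in the post

# Share of the parameters that are active, per the post's figures.
active_fraction = ACTIVE_B / TOTAL_B

# All 30B weights still have to be stored, even though only 3B are active.
footprints = {bits: weight_memory_gib(TOTAL_B, bits) for bits in (16, 8, 4)}

if __name__ == "__main__":
    print(f"active fraction: {active_fraction:.0%}")
    for bits, gib in footprints.items():
        print(f"{bits:>2}-bit weights: ~{gib:.1f} GiB")
```

Under these assumptions the full weights come to roughly 56 GiB at 16-bit and about 14 GiB at 4-bit. The storage cost follows the 30B total, while the 3B active figure is what suggests lower compute per step, which is why quantization and VRAM behavior are the first things local users are likely to check.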

LocalLLaMA is a hard audience for this sort of experiment. Users will quickly test quantization, VRAM behavior, llama.cpp support, coding tasks, and real throughput, often with less patience for launch messaging than a general developer audience. For Cohere, that is also the point. If the model works well in that environment, the feedback will be unusually concrete; if it does not, the failure modes will show up early. Either way, this looks like a model rollout with the community inside the loop rather than waiting at the end of it.
