Cohere pushes Transcribe as an open 2B ASR model with a WebGPU browser demo
Original: Cohere Transcribe is setting a new standard for automatic speech recognition model accuracy in real world conditions – even with a noisy blender running. Try it out for yourself 👇
On March 28, 2026, Cohere said on X that Cohere Transcribe was setting a new standard for automatic speech recognition in real-world conditions, including a deliberately noisy blender scenario, and invited developers to try it. The tweet was short, but it landed after a broader public release cycle around the same model on Hugging Face, which makes the post more than a casual demo clip. It reads as a distribution signal: Cohere wants Transcribe to be understood as both a research-quality ASR model and a practical developer asset.
The Hugging Face model page fills in the technical shape behind the announcement. Cohere Transcribe is described there as a dedicated 2B-parameter audio-in, text-out ASR model trained from scratch, released under Apache 2.0, and supporting 14 languages across European, East Asian, and MENA usage. The model card also notes native support in transformers and references a vLLM integration path, which positions the release as something teams can use for both offline and server-side deployment rather than only as a benchmark artifact.
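The native transformers support mentioned on the model card can be sketched with the library's standard `pipeline` API. This is a minimal sketch, not a confirmed recipe: the model id below is a placeholder, and whether Transcribe needs any extra arguments is not stated in the post, so check the Hugging Face model page for the actual repository name and loading instructions.

```python
def build_transcriber(model_id: str):
    """Return a speech-to-text pipeline for an open ASR model on Hugging Face.

    Uses the standard transformers "automatic-speech-recognition" task.
    """
    from transformers import pipeline  # heavy import kept local to the helper
    return pipeline("automatic-speech-recognition", model=model_id)

# Usage sketch (model id is hypothetical; see the model card for the real one):
# asr = build_transcriber("cohere/transcribe")
# print(asr("meeting.wav")["text"])
```

The same pattern is what a vLLM-based server path would replace for higher-throughput deployments; the pipeline call above is the offline, single-process route.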
A separate Hugging Face Space, Cohere Transcribe WebGPU, adds another layer to the story. The Space describes the experience as running Transcribe locally in the browser on WebGPU, and it links both the original model and an ONNX conversion. That matters because it suggests Cohere is emphasizing deployability alongside accuracy. Instead of keeping the model inside a hosted API narrative, the company is showing that developers can experiment with browser-side inference, local execution, and open-weight workflows in parallel.
For builders, that combination is the real signal. Speech recognition is often evaluated on word error rate tables, but production adoption also depends on licensing, language coverage, framework support, and how easily the model can be embedded into real applications. The X post does not publish a full methodology or benchmark sheet by itself, so teams will still need to test domain-specific audio and inspect the model materials carefully. Even so, the March 28 post is a clear marker that Cohere is expanding beyond text-centric model launches into practical speech infrastructure with open distribution and browser-native experimentation.
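Word error rate, the metric behind those tables, is simply the word-level Levenshtein distance between a reference transcript and the model's hypothesis, normalized by the reference word count. A minimal, library-free sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,              # deletion from the reference
                cur[j - 1] + 1,           # insertion into the hypothesis
                prev[j - 1] + (r != h),   # substitution (or free match)
            ))
        prev = cur
    return prev[-1] / len(ref)

# One dropped word out of six: WER is 1/6.
word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Scores like the leaderboard's averaged WER are this quantity computed per utterance (after text normalization) and aggregated across test sets, which is why domain-specific audio can still diverge sharply from headline numbers.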
Sources: Cohere X post · Cohere browser demo X post · Hugging Face model page · WebGPU demo Space
Related Articles
Cohere announced Transcribe on March 26, 2026 as an open-source speech recognition model. Cohere says the 2B Conformer-based system supports 14 languages, tops the Hugging Face Open ASR Leaderboard with 5.42 average WER, ships under Apache 2.0, and is available for download, API use, and Model Vault deployment.
A LocalLLaMA post details recurring Whisper hallucinations during silence and proposes a layered mitigation stack including Silero VAD gating, prompt-history reset, and exact-string blocking.
A post on r/LocalLLaMA highlighted Kreuzberg v4.5, a Rust-based document intelligence framework that now adds stronger layout and table understanding. The release claims Docling-level quality with lower memory overhead and materially faster processing.