Cohere has entered the speech stack race with Transcribe, a 2B-parameter automatic speech recognition (ASR) model for 14 languages released under Apache 2.0. Open weights, Hugging Face distribution, and a claimed 5.42 average word error rate (WER) headline the release.
On March 28, 2026, Cohere said that Transcribe is setting a new bar for speech recognition accuracy in real-world noise and posted a link for users to try it. The supporting Hugging Face materials describe Transcribe as a 2B-parameter ASR model for 14 languages released under Apache 2.0, and a companion WebGPU demo shows the model running locally in the browser.
Cohere announced Transcribe on March 26, 2026 as an open-source speech recognition model. Cohere says the 2B-parameter, Conformer-based system supports 14 languages, tops the Hugging Face Open ASR Leaderboard with an average WER of 5.42, ships under Apache 2.0, and is available for download, API use, and Model Vault deployment.
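The 5.42 figure is a word error rate: the word-level edit distance (substitutions, deletions, and insertions) between a model's transcript and a reference transcript, divided by the number of reference words. Below is a minimal sketch of that calculation. It assumes plain lowercasing and whitespace tokenization (the leaderboard applies its own text normalizer, which this does not reproduce), and it assumes the headline average is an unweighted mean of per-dataset WERs expressed as a percentage. The function names are illustrative and not part of any Cohere or Hugging Face API.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of an extra word
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER as a fraction, using simplified (assumed) normalization."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def average_wer_percent(per_dataset: List[float]) -> float:
    """Assumed aggregation: unweighted mean of per-dataset WERs, in percent."""
    return 100.0 * sum(per_dataset) / len(per_dataset)
```

For example, dropping one word from a six-word reference gives a WER of 1/6, or about 16.7%. Read this way, a 5.42 average means roughly five to six word errors per hundred reference words across the evaluation sets.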
A LocalLLaMA post details Whisper's recurring hallucinations during silence and proposes a layered mitigation stack: Silero VAD gating, prompt-history reset, and exact-string blocking.
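The layered mitigation from that post can be sketched as a small per-chunk filter. This is a hypothetical illustration, not the post's code. An RMS energy threshold stands in for Silero VAD so the example runs without model downloads. The `transcribe` callable stands in for whatever Whisper binding is used and is assumed to take audio plus a text prompt. The entries in `BLOCKED_PHRASES` are made-up placeholders for strings one would collect from one's own logs.

```python
import math
from typing import Callable, List, Optional, Sequence

# Hypothetical placeholders: build the real list from observed hallucinations.
BLOCKED_PHRASES = {"thank you.", "thanks for watching!", "you"}


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of a chunk of float PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SilenceGuard:
    """Layered guard: VAD gate, then prompt reset, then exact-string block."""

    def __init__(
        self,
        transcribe: Callable[[Sequence[float], str], str],
        energy_threshold: float = 0.01,
        reset_after_silent_chunks: int = 2,
        max_history: int = 3,
    ) -> None:
        self.transcribe = transcribe  # hypothetical ASR call: (audio, prompt) -> text
        self.energy_threshold = energy_threshold
        self.reset_after = reset_after_silent_chunks
        self.max_history = max_history
        self.history: List[str] = []  # recent transcripts used as the prompt
        self.silent_chunks = 0

    def is_speech(self, samples: Sequence[float]) -> bool:
        # Stand-in for Silero VAD: a simple energy threshold.
        return rms(samples) >= self.energy_threshold

    def process(self, samples: Sequence[float]) -> Optional[str]:
        # Layer 1: VAD gating. Silent chunks never reach the decoder.
        if not self.is_speech(samples):
            self.silent_chunks += 1
            # Layer 2: after sustained silence, drop the prompt history so
            # stale or hallucinated context cannot seed the next decode.
            if self.silent_chunks >= self.reset_after:
                self.history.clear()
            return None
        self.silent_chunks = 0
        text = self.transcribe(samples, " ".join(self.history)).strip()
        # Layer 3: exact-string blocking on the lowercased output.
        if not text or text.lower() in BLOCKED_PHRASES:
            return None
        self.history.append(text)
        self.history = self.history[-self.max_history:]  # keep a short window
        return text
```

Ordering matters in this sketch: the VAD gate runs first so that silent audio never reaches the decoder, the prompt reset keeps one bad output from conditioning later chunks, and the blocklist only catches what slips past both. A real deployment would replace `is_speech` with an actual VAD such as Silero and tune the threshold and reset window on recorded audio.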
A Show HN post spotlighted Moonshine Voice, an open-source speech toolkit claiming strong accuracy and low latency across edge and desktop devices. The project positions itself as a practical alternative to larger Whisper deployments for real-time voice apps.