Cohere launches open-source 2B ASR model Transcribe
Original: Cohere Transcribe: Speech Recognition
On March 31, 2026, Hacker News users pushed Cohere’s Transcribe launch to 154 points and 49 comments. The post stood out because it was not another general-purpose multimodal release. Instead, Cohere shipped a dedicated automatic speech recognition model with open weights and a clear production pitch.
In the official launch note, Cohere describes Transcribe as a 2B-parameter Conformer-based encoder-decoder trained from scratch. The model supports 14 languages, including English, Japanese, Korean, Mandarin, Arabic, and several European languages, and it is released under Apache 2.0. Cohere also says the model currently ranks first on the Hugging Face Open ASR Leaderboard with an average word error rate (WER) of 5.42%.
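For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that definition only. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for this example, and it does not reproduce the leaderboard's own text normalization or scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {example:.2%}")
```

A leaderboard-style average is then the mean of per-dataset WER values, so a single headline number can hide large differences between clean and noisy test sets.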
Why the release matters
- It is a purpose-built ASR model rather than a speech feature bolted onto a general assistant.
- Open weights and Apache 2.0 licensing lower friction for self-hosted enterprise deployments.
- Fourteen-language coverage makes the model relevant for meeting transcription, speech analytics, and support workflows.
- Cohere is offering the model through Hugging Face, its API, and Model Vault, which gives teams multiple deployment paths.
Cohere is positioning Transcribe as infrastructure for enterprise speech workflows rather than a demo model. The company highlights both local or private deployment and managed access through Model Vault and its API. It also pairs benchmark tables with human evaluation results, arguing that the accuracy gains survive beyond standardized datasets and into messy real-world audio.
The main qualification is that the leaderboard framing, throughput plots, and human preference data all come from Cohere’s own announcement. Buyers will still need independent latency testing and domain-specific evaluation before swapping out existing ASR pipelines. Even so, open weights, multilingual coverage, and a practical license make Transcribe one of the more concrete speech releases of late March 2026.
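As a rough illustration of what independent latency testing can look like, the sketch below measures a real-time factor (processing time divided by audio duration) over a set of clips. It is a sketch under stated assumptions, not Cohere's benchmark method: `transcribe_fn` is a hypothetical placeholder for whatever ASR call a team wraps (a local model, an API client, or an existing pipeline), and the clip format of raw bytes plus a duration in seconds is an assumption for this example.

```python
import time
from typing import Callable, Iterable, Tuple


def real_time_factor(
    transcribe_fn: Callable[[bytes], str],
    clips: Iterable[Tuple[bytes, float]],
) -> float:
    """Return total processing time divided by total audio duration.

    transcribe_fn is a hypothetical stand-in for any ASR call.
    Each clip is (audio_bytes, duration_in_seconds).
    Values below 1.0 mean faster than real time.
    """
    total_audio_s = 0.0
    total_compute_s = 0.0
    for audio, duration_s in clips:
        start = time.perf_counter()
        transcribe_fn(audio)
        total_compute_s += time.perf_counter() - start
        total_audio_s += duration_s
    if total_audio_s <= 0:
        raise ValueError("clips must contain a positive total audio duration")
    return total_compute_s / total_audio_s


# Usage with a dummy transcriber; swap in a real ASR call to compare systems.
def dummy_transcribe(audio: bytes) -> str:
    return ""


rtf = real_time_factor(dummy_transcribe, [(b"\x00" * 16000, 10.0), (b"\x00" * 8000, 5.0)])
print(f"real-time factor: {rtf:.6f}")
```

Running the same clips from a team's own domain through both the current pipeline and a candidate model gives a like-for-like comparison that a vendor's throughput plot cannot.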
Community source: Hacker News discussion. Primary source: Cohere blog.