Grok STT API enters the voice API market with 25+ languages and batch pricing of $0.10 per hour

What the tweet signals

xAI's X post of April 18, 2026 reads as a move to extend Grok from a consumer chatbot into developer-facing voice infrastructure. The key sentence is "Grok's Speech to Text API is now available." The same post advertises 25 languages and multi-speaker transcription, and the linked blog describes Speech to Text and Text to Speech as standalone APIs.

What gives the announcement weight is the pricing. In the blog, xAI lists batch transcription at $0.10/hour, streaming transcription at $0.20/hour, and Text to Speech at $4.20 per 1 million characters. For voice agents, contact centers, meeting notes, podcast editing, and accessibility tooling, production adoption depends on cost and latency as well as model quality.
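
To make the quoted prices concrete, the following is a minimal sketch of a monthly cost estimate. The three rates are the ones cited from the xAI blog; the function name `estimate_cost` and its parameters are hypothetical, and the sketch assumes simple linear billing with no minimums, tiers, or rounding rules, none of which the source describes.

```python
from decimal import Decimal

# Rates as quoted from the xAI blog (USD). Check current docs before relying on them.
BATCH_STT_PER_HOUR = Decimal("0.10")
STREAMING_STT_PER_HOUR = Decimal("0.20")
TTS_PER_MILLION_CHARS = Decimal("4.20")


def estimate_cost(batch_hours=0, streaming_hours=0, tts_chars=0):
    """Return an estimated USD cost, rounded to cents.

    Assumes purely linear pricing (an assumption, not a documented billing rule).
    """
    total = (
        Decimal(str(batch_hours)) * BATCH_STT_PER_HOUR
        + Decimal(str(streaming_hours)) * STREAMING_STT_PER_HOUR
        + Decimal(str(tts_chars)) / Decimal(1_000_000) * TTS_PER_MILLION_CHARS
    )
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical workload: 1,000 h batch, 500 h streaming, 2M TTS characters.
    print(estimate_cost(batch_hours=1000, streaming_hours=500, tts_chars=2_000_000))
```

Under these assumptions, 1,000 hours of batch audio and 500 hours of streaming audio each come to $100, which shows how the 2x streaming premium plays out at volume.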

The account and linked context

@xai is the official account that announces Grok, the Grok API, Colossus infrastructure, and voice features directly. This tweet is not a minor app update but an expansion of the API surface. The blog highlights word-level timestamps, speaker diarization, multi-channel support, and Inverse Text Normalization. Converting spoken numbers, dates, and currencies into structured text is aimed at reducing post-processing in medical, legal, and finance transcription.

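The benchmark section also makes strong claims. xAI compares overall Word Error Rate at 6.9% for Grok STT, 9.0% for ElevenLabs, 11.0% for Deepgram, and 12.9% for AssemblyAI. These are vendor-run numbers and need independent verification, but because they were published alongside public pricing, developers can start cost-quality comparisons immediately.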
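
For readers who want to check such numbers on their own audio, here is a minimal sketch of the standard Word Error Rate calculation: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `wer` is illustrative, and the whitespace tokenization is an assumption. The source does not say which text normalization or datasets xAI used, so results from this sketch are not directly comparable to the vendor figures.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count.

    Tokenizes on whitespace only; real evaluations usually normalize case
    and punctuation first (omitted here for brevity).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Running the same reference transcripts through each vendor and dividing by the per-hour price gives a rough cost-quality comparison of the kind the paragraph above describes.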

What to watch next

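The focus will be on real-time WebSocket latency, rate limits, and data retention policy. A voice API cannot be evaluated on transcript accuracy alone. Regulated customers will look at where audio is stored, how diarization errors are recorded for audits, and how partial transcripts are revised during streaming. How clearly xAI exposes these controls in its docs and console will determine how well Grok STT holds up in production.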

Sources: source tweet, xAI blog.
