#audio

AI 1d ago 1 min read

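OpenAI's new transcription API splits file and real-time speech models in two

On July 28, 2026, OpenAI released GPT Transcribe and GPT Live Transcribe as GA. The release separates file transcription from live audio transcription and promotes keyword hints and multiple language hints to the new default path.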

#openai #audio #speech-to-text

AI Reddit May 24, 2026 1 min read

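Prompt injection the human ear cannot hear: a new attack surface for voice assistants

The Reddit discussion centered on verifiability rather than fear. The key question is how reliably a command still takes effect after passing through microphones, speakers, and compression.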

#prompt-injection #voice-assistants #security

LLM Reddit Apr 13, 2026 1 min read

r/LocalLLaMA takes note of the llama.cpp merge that adds Qwen3 audio support

A Reddit post with 54 points reported that merged PR #19441 brought qwen3-omni-moe and qwen3-asr support to llama.cpp, and the comments showed anticipation for practical local multimodal and ASR use.

#qwen3 #llama-cpp #audio

AI X/Twitter Apr 5, 2026 1 min read

Mistral releases Voxtral TTS for low-latency multilingual voice agents

Mistral AI said on March 26, 2026 that Voxtral TTS offers expressive speech, support for 9 languages, low latency, and easy voice adaptation. Mistral's March 23 launch post explains that the 4B-parameter model performs custom voice adaptation from about 3 seconds of reference audio, with about 70 ms of model latency and up to 2 minutes of native audio generation.

#mistral #tts #voice-agents

LLM X/Twitter Apr 3, 2026 1 min read

Mistral presents a speech-to-speech assistant stack built with Voxtral and Mistral Small 4

On April 2, 2026, Mistral said that a web-search-enabled speech-to-speech assistant can be built in about 150 lines of code by combining Voxtral-based transcription and speech generation with Mistral Small 4 reasoning. The post is closer to a reference architecture for real-time audio agents than to a single model release.

#mistral #audio #speech-to-speech

AI X/Twitter Mar 16, 2026 1 min read

xAI releases Text-to-Speech API with streaming, speech tags, and 5 voices

xAI said on March 16, 2026 that Grok's Text-to-Speech API had been released. xAI's official voice documentation explains that the beta API supports 5 voices, inline speech tags, telephony-friendly codecs, and a low-latency WebSocket streaming mode.

#xai #grok #text-to-speech

AI Mar 15, 2026 2 min read

Mistral expands its speech stack with Voxtral Realtime and Voxtral Mini Transcribe V2

Mistral released Voxtral Realtime and Voxtral Mini Transcribe V2, offering sub-200ms streaming transcription, support for 13 languages, and open weights for the realtime model. It also presented an audio playground in Mistral Studio along with pricing of $0.003/min and $0.006/min.

#mistral #speech #transcription

AI Reddit Mar 15, 2026 1 min read

Fish Audio S2 draws attention as an open TTS combining inline emotion control with fast streaming

On March 9, 2026, r/LocalLLaMA took note of Fish Audio S2 for presenting fine-grained inline control, multilingual support, and an SGLang-based streaming stack together.

#tts #speech #audio