Sakana AI's KAME Injects Real-Time LLM Knowledge Into Speech AI Without the Latency Penalty

LLM · May 5, 2026 · By Insights AI · 1 min read

Solving the Speed-Knowledge Tradeoff in Voice AI

Existing speech-to-speech (S2S) AI systems face a fundamental tradeoff: direct S2S models respond instantly but lack deep knowledge, while cascade systems (speech recognition → LLM → speech synthesis) deliver richer answers at the cost of a 2.1-second pipeline delay. Sakana AI's KAME ("turtle" in Japanese) addresses this tradeoff directly.

The KAME Architecture

KAME extends Moshi's three-stream design (input audio, inner monologue, output audio) with a fourth "oracle stream." A front-end S2S model responds immediately to user speech while simultaneously streaming an interim transcript to a back-end LLM. The LLM's richer response flows back to the front-end through the oracle stream, injecting knowledge in real time without stalling output.
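To make the timing concrete, here is a minimal runnable sketch of the oracle-stream idea in Python with asyncio. All names (backend_llm, front_end_s2s) are illustrative stand-ins, not Sakana AI's actual implementation; the point is that the back-end request fires immediately and its answer is injected mid-generation, so the front-end never waits to start speaking.

```python
import asyncio

async def backend_llm(transcript: str) -> str:
    """Stand-in for the large back-end LLM: slow but knowledgeable."""
    await asyncio.sleep(1.0)  # simulate network / inference latency
    return f"[oracle knowledge for: {transcript!r}]"

async def front_end_s2s(user_speech: str) -> None:
    """Stand-in for the fast front-end S2S model."""
    oracle_context: list[str] = []

    # Fire off the back-end request with the interim transcript right away;
    # do NOT await it before starting to speak.
    oracle_task = asyncio.create_task(backend_llm(user_speech))

    for step in range(6):  # chunk-by-chunk output, standing in for audio tokens
        if oracle_task.done() and not oracle_context:
            # Oracle stream arrived: inject it into the generation context.
            oracle_context.append(oracle_task.result())
            print(f"  (oracle injected: {oracle_context[0]})")
        mode = "grounded" if oracle_context else "immediate"
        print(f"speak chunk {step} [{mode}]")
        await asyncio.sleep(0.3)  # a real model would emit audio here

asyncio.run(front_end_s2s("what is the tallest mountain?"))
```

Running this prints several "immediate" chunks before the oracle arrives, after which output switches to "grounded": a toy version of how KAME's front-end keeps talking and then folds in richer knowledge once the back-end's response streams in.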

The system is fully back-end agnostic: trained with gpt-4.1-nano as its back-end, it works with claude-opus-4-1, gemini-2.5-flash, or any other LLM at inference time, with no retraining required.
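In code terms, back-end agnosticism amounts to the front-end depending only on a text-in/text-out oracle interface. The sketch below is hypothetical (the Protocol and backend classes are not from Sakana AI's codebase), but it shows why no retraining is needed: the front-end conditions on the oracle text, never on which model produced it.

```python
from typing import Protocol

class OracleBackend(Protocol):
    """Anything that can turn an interim transcript into oracle text."""
    def complete(self, interim_transcript: str) -> str: ...

class Gpt41NanoBackend:
    def complete(self, interim_transcript: str) -> str:
        # a real implementation would call the gpt-4.1-nano API here
        return "nano's answer"

class GeminiFlashBackend:
    def complete(self, interim_transcript: str) -> str:
        # a real implementation would call the gemini-2.5-flash API here
        return "flash's answer"

def oracle_stream(backend: OracleBackend, transcript: str) -> str:
    # The front-end sees only the returned text, not the model's identity,
    # so back-ends are freely interchangeable at inference time.
    return backend.complete(transcript)

print(oracle_stream(Gpt41NanoBackend(), "user: how do tides work?"))
print(oracle_stream(GeminiFlashBackend(), "user: how do tides work?"))
```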

Performance

  • MT-Bench score: 6.43, comparable to full cascade systems
  • Response latency: near-zero, matching direct S2S models
  • Pipeline delay: avoids the 2.1-second delay of traditional cascades

Training: Simulated Oracle Augmentation

Sakana AI paired a "simulator" LLM with a standard conversational dataset to generate synthetic oracle sequences at varying levels of transcript completeness, avoiding the prohibitive cost of generating training data against a live back-end LLM in real time.
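A rough sketch of that augmentation step, under the assumption that "varying levels of transcript completeness" means truncating each training transcript at several fractions and asking the simulator for the oracle text it would have produced at that point. Function names here are hypothetical.

```python
def simulator_llm(partial_transcript: str) -> str:
    """Stand-in for the simulator LLM that plays the back-end's role offline."""
    return f"synthetic oracle reply to: {partial_transcript!r}"

def make_oracle_examples(transcript: str, levels=(0.25, 0.5, 0.75, 1.0)):
    """Truncate one transcript at several completeness levels and label
    each prefix with the simulator's oracle text for that prefix."""
    words = transcript.split()
    examples = []
    for frac in levels:
        cut = max(1, round(len(words) * frac))
        partial = " ".join(words[:cut])
        examples.append({
            "partial_transcript": partial,           # what the back-end has seen so far
            "oracle_target": simulator_llm(partial)  # training target for the oracle stream
        })
    return examples

for example in make_oracle_examples("what is the capital city of australia"):
    print(example)
```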

Source: Sakana AI, MarkTechPost
