Kitten TTS v0.8 drew Hacker News attention by promising ONNX-based speech synthesis from models ranging from 15M to 80M parameters that run locally on CPU, while commenters stress-tested real-world usability.
#edge-ai
IBM unveiled Granite 4.0 1B Speech on March 9, 2026 as a compact multilingual speech-language model for ASR and bidirectional speech translation. The company says it improves English transcription accuracy over its predecessor while cutting model size in half and adding Japanese support.
Microsoft Research presented new tiny language model (TLM) results focused on reasoning efficiency at edge scale. The post emphasizes BitNet-style small models with ternary weights (stored in 2 bits each) and reports gains of up to 8x speed with 4x lower memory in selected environments.
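To make the ternary-weight idea concrete, here is a minimal NumPy sketch of BitNet-style "absmean" quantization, where each weight is mapped to {-1, 0, +1} times a per-tensor scale. This follows the scheme described in the BitNet b1.58 paper, not Microsoft's actual implementation; the function names and the example tensor are illustrative.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize weights to {-1, 0, +1} using a per-tensor absmean scale."""
    scale = np.abs(w).mean() + eps            # absmean scale (eps avoids /0)
    q = np.clip(np.round(w / scale), -1, 1)   # ternary codes in {-1, 0, +1}
    return q.astype(np.int8), float(scale)

def ternary_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float weights from ternary codes."""
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.05, 0.4, -1.2], dtype=np.float32)
q, s = ternary_quantize(w)   # q == [1, 0, 1, -1], s ~= 0.6375
```

Because each weight carries under two bits of information, a packed ternary tensor is roughly 8-16x smaller than float32 weights, which is where the reported memory savings come from.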
A Show HN post spotlighted Moonshine Voice, an open-source speech toolkit claiming competitive accuracy and low latency across edge and desktop devices. The project positions itself as a practical alternative to larger Whisper deployments for real-time voice apps.
Startup Taalas is taking a radical approach to AI inference: etching LLM model weights and architecture directly into a silicon chip. Their Llama 3.1 8B demo achieves 16,000 tokens per second, but the approach bets that model architectures won't change.
zclaw is an open-source personal AI assistant that fits in under 888 KB and runs on an ESP32 microcontroller. Part of the emerging Claw ecosystem, it demonstrates how far edge AI has come.
A high-upvote LocalLLaMA thread highlighted Kitten TTS v0.8, with community-shared details on the open 80M/40M/14M model variants, Apache-2.0 licensing, a sub-25MB footprint for the smallest variant, and an edge-friendly focus on local CPU inference.
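For readers wondering what "local CPU inference" of an ONNX TTS model looks like in practice, here is a hedged sketch using onnxruntime's CPU execution provider. The model filename and the tensor names ("tokens", "audio") are placeholders, not Kitten TTS's actual interface; the WAV helper is a generic way to turn float audio into a playable file.

```python
import io
import wave

import numpy as np

# Hypothetical session setup (requires onnxruntime and a real model file):
# import onnxruntime
# session = onnxruntime.InferenceSession(
#     "kitten-tts.onnx", providers=["CPUExecutionProvider"])

def synthesize(session, token_ids: np.ndarray) -> np.ndarray:
    """One CPU inference pass; input/output names here are placeholders."""
    (audio,) = session.run(["audio"], {"tokens": token_ids[None, :]})
    return audio.squeeze()

def float_to_wav_bytes(samples: np.ndarray, sample_rate: int = 24000) -> bytes:
    """Convert float audio in [-1, 1] to 16-bit mono WAV bytes."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype(np.int16)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm.tobytes())
    return buf.getvalue()

# Demo with a synthetic 440 Hz tone standing in for model output:
tone = np.sin(2 * np.pi * 440 * np.arange(2400) / 24000).astype(np.float32)
wav_data = float_to_wav_bytes(tone)
```

The appeal of the sub-25MB variant is that this whole pipeline, model included, fits comfortably on commodity hardware with no GPU or network dependency.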
An r/MachineLearning discussion reported that a single INT8 ONNX model produced large on-device accuracy variance across five Snapdragon chipsets, ranging from 91.8% down to 71.2%, despite identical weights and export settings.
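One plausible mechanism behind such gaps (illustrative only; the thread did not confirm the root cause) is that different vendor kernels round differently during quantization and requantization. The sketch below quantizes the same tensor with round-to-nearest versus truncation, standing in for two hypothetical backend implementations, and measures how often the INT8 codes disagree.

```python
import numpy as np

def quantize_int8(x: np.ndarray, scale: float, rounding) -> np.ndarray:
    """Symmetric per-tensor INT8 quantization with a pluggable rounding mode."""
    return np.clip(rounding(x / scale), -127, 127).astype(np.int8)

rng = np.random.default_rng(0)
x = rng.normal(size=1000).astype(np.float32)
scale = float(np.abs(x).max()) / 127.0   # symmetric per-tensor scale

q_nearest = quantize_int8(x, scale, np.round)  # round half to even
q_trunc = quantize_int8(x, scale, np.trunc)    # truncate toward zero

mismatch = float(np.mean(q_nearest != q_trunc))  # fraction of differing codes
```

Even though both paths start from identical float weights and the same scale, a large fraction of codes differ by one step; stacked over dozens of layers, such off-by-one disagreements (plus differing accumulator widths and fused-op behavior) can compound into visible accuracy spread across chipsets.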