LLM Reddit Mar 26, 2026 2 min read
A LocalLLaMA post claiming that Liquid AI’s LFM2-24B-A2B can run at roughly 50 tokens per second in a browser on an M4 Max reached 79 points and 11 comments. Community interest centered on the model’s sparse mixture-of-experts (MoE) architecture, its ONNX packaging, and whether WebGPU can make the browser a credible local AI deployment target.
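The "24B-A2B" naming points at the sparse-MoE idea the thread discussed: 24B total parameters but only about 2B active per token. A toy top-k router makes the mechanism concrete; the expert count, hidden size, and top-k below are illustrative assumptions, not LFM2's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2  # illustrative sizes, not LFM2's real config

# One tiny linear "expert" per slot; only top_k of them run per token.
experts = [
    rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_forward(x):
    """Route each token to its top_k experts and mix outputs by softmax weight."""
    logits = x @ router                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        w = np.exp(chosen - chosen.max())
        w /= w.sum()                               # softmax over selected experts only
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ experts[e])
    return out, top

tokens = rng.standard_normal((4, d_model))
y, chosen = moe_forward(tokens)
print(chosen.shape)  # (4, 2): each token activates only 2 of the 8 experts
```

With 2 of 8 experts active, per-token FLOPs in the expert layer are a quarter of the dense equivalent; scaled up, the same routing trick is how a 24B-parameter MoE can activate only ~2B parameters per token, which is what makes ~50 tok/s in a browser plausible at all.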
AI Hacker News Mar 20, 2026 2 min read
A March 19, 2026 Hacker News post about Kitten TTS reached 512 points and 172 comments at crawl time. KittenML says its 15M-, 40M-, and 80M-parameter ONNX speech models target CPU inference, with eight English voices and 24 kHz output.
AI Hacker News Mar 20, 2026 2 min read
Kitten TTS v0.8 drew Hacker News attention by promising ONNX-based speech synthesis from models of 15M to 80M parameters that can run locally on CPUs, while commenters stress-tested real-world usability.
AI Reddit Feb 18, 2026 1 min read
An r/MachineLearning discussion reported that a single INT8 ONNX model produced large on-device accuracy variance across five Snapdragon chipsets, ranging from 91.8% down to 71.2%, despite identical weights and export settings.
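One plausible mechanism for that kind of variance (a hypothetical sketch, not an analysis of the actual model or chipsets): even when the exported weights are identical, runtimes may realize INT8 differently, for example per-tensor versus per-channel quantization scales. The toy below fake-quantizes the same FP32 weight matrix both ways and compares reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same FP32 weights "exported" once; rows deliberately span two magnitude scales.
w = rng.standard_normal((16, 16)) * np.array([0.01] * 8 + [1.0] * 8)[:, None]

def quant_dequant(w, scale):
    """Symmetric INT8 fake-quantization: quantize with `scale`, then dequantize."""
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale

# Backend A: one scale for the whole tensor (per-tensor).
scale_tensor = np.abs(w).max() / 127.0
w_a = quant_dequant(w, scale_tensor)

# Backend B: one scale per output row (per-channel).
scale_channel = np.abs(w).max(axis=1, keepdims=True) / 127.0
w_b = quant_dequant(w, scale_channel)

err_a = np.abs(w - w_a).mean()
err_b = np.abs(w - w_b).mean()
# Per-tensor scaling wipes out the small-magnitude rows; per-channel preserves them.
print(err_a > err_b)  # True
```

If two backends dequantize the same INT8 graph to different effective weights, downstream activations and final accuracy diverge too, which is consistent with (though not proof of) identical exports scoring differently across chipsets.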