
r/LocalLLaMA: VoiceShelf Runs Kokoro TTS Offline on Android for EPUB Audiobooks

Original post: "I built an Android audiobook reader that runs Kokoro TTS fully offline on-device"

AI · Mar 9, 2026 · By Insights AI (Reddit) · 2 min read

What surfaced on r/LocalLLaMA

A well-received r/LocalLLaMA post described VoiceShelf, an Android audiobook reader that runs the Kokoro speech model fully offline on-device. As of March 9, 2026, the thread had 90 points. The author says the app turns EPUB text into streamed narration locally instead of sending book content to a cloud service, which makes the project more interesting as a mobile inference system than as a simple TTS demo.

The post lays out a concrete pipeline: EPUB parsing, sentence and segment chunking, G2P via Misaki, Kokoro inference, and streaming playback while an audio buffer is built. On the author's Samsung Galaxy Z Fold 7 with Snapdragon 8 Elite, the app reportedly generates audio at about 2.8x real-time. That matters because real-time factor is the hard constraint for products like this. If generation falls below playback speed, the user experiences buffering rather than narration.
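That buffering constraint can be made precise. The sketch below is not VoiceShelf's code; it is a minimal illustrative model of streamed narration, assuming playback begins as soon as the first chunk is synthesized. The only number taken from the post is the ~2.8x real-time figure; chunk durations are made up.

```python
def stalls(chunk_secs, rtf):
    """Return True if streamed playback would stall.

    chunk_secs: audio duration of each generated chunk, in seconds.
    rtf: generation speed as a multiple of real time (e.g. 2.8 means
         one second of audio takes 1/2.8 seconds to synthesize).
    Playback starts once chunk 0 is ready; a stall occurs whenever a
    chunk is not finished by the time the listener reaches it."""
    gen_time = 0.0                     # wall clock when each chunk finishes
    audio_ready = 0.0                  # total narration already buffered
    play_start = chunk_secs[0] / rtf   # playback begins here
    for dur in chunk_secs:
        gen_time += dur / rtf
        if gen_time > play_start + audio_ready:
            return True                # listener caught up to the generator
        audio_ready += dur
    return False

# At the reported ~2.8x real time, generation always stays ahead;
# below 1.0x, the listener overtakes the buffer and playback stalls.
print(stalls([5.0] * 100, 2.8))   # False
print(stalls([5.0] * 100, 0.9))   # True
```

The model makes the article's point concrete: any sustained real-time factor above 1.0 keeps the buffer growing, while anything below it guarantees eventual buffering, no matter how large the head start.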

Why the implementation details matter

The same post also exposes the engineering costs that usually stay hidden. The APK is about 1 GB because it bundles the model and custom libraries needed to run without quality loss on Android. Current features include EPUB support, experimental PDF support, fully offline inference, screen-off narration, a sleep timer, and local library management. The author is explicitly asking testers on Snapdragon, Tensor, and Dimensity devices to measure throughput and thermal throttling over longer sessions.
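For testers answering that call, the measurement the author wants reduces to logging per-chunk synthesis times and watching the real-time factor degrade over a session. The sketch below is a hypothetical analysis helper, not part of VoiceShelf; the window size and drop threshold are illustrative assumptions.

```python
from statistics import mean

def throttle_report(samples, window=10, drop=0.8):
    """Flag likely thermal throttling in a long TTS session.

    samples: list of (audio_secs, gen_secs) pairs, one per chunk.
    Baseline RTF is the mean over the first `window` chunks (the
    device is presumed cool); a chunk index is flagged when the
    rolling-window RTF falls below `drop` x that baseline."""
    rtfs = [audio / gen for audio, gen in samples]
    baseline = mean(rtfs[:window])
    flagged = []
    for i in range(window, len(rtfs)):
        rolling = mean(rtfs[i - window + 1 : i + 1])
        if rolling < drop * baseline:
            flagged.append(i)
    return baseline, flagged

# Simulated session: ten chunks at 2.8x, then the chip throttles to 1.5x.
session = [(2.8, 1.0)] * 10 + [(1.5, 1.0)] * 10
baseline, flagged = throttle_report(session)
print(round(baseline, 2), flagged[:1])
```

A rolling window rather than a per-chunk check matters here: single slow chunks (a background app waking, a long sentence) are noise, while a sustained drop across a window is the throttling signature the author is asking testers to capture.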

That is exactly the right bottleneck to investigate. Running a speech model once on a flagship phone is no longer the interesting part. The harder question is whether the system remains usable across chipsets, battery conditions, and one-hour listening sessions. In other words, the community is moving from "can this model run locally?" to "can this model power a product people would actually keep installed?"

What the thread signals

VoiceShelf is a small but useful marker for local AI on mobile. It shows that offline neural narration on consumer phones is becoming viable enough to test with real content, real buffering constraints, and real thermals. The remaining issues are product issues rather than demo issues: install size, hardware variability, and sustained performance. That is a healthier place for the community to be, because it means on-device AI is being judged on operational reality instead of novelty alone.



