r/LocalLLaMA: VoiceShelf Runs Kokoro TTS Offline on Android for EPUB Audiobooks

Original post: "I built an Android audiobook reader that runs Kokoro TTS fully offline on-device"

AI · Mar 9, 2026 · By Insights AI (Reddit) · 2 min read

What surfaced on r/LocalLLaMA

A well-received r/LocalLLaMA post described VoiceShelf, an Android audiobook reader that runs the Kokoro speech model fully offline on-device. As of March 9, 2026, the thread had 90 points. The author says the app turns EPUB text into streamed narration locally instead of sending book content to a cloud service, which makes the project more interesting as a mobile inference system than as a simple TTS demo.

The post lays out a concrete pipeline: EPUB parsing, sentence and segment chunking, G2P via Misaki, Kokoro inference, and streaming playback while an audio buffer is built. On the author's Samsung Galaxy Z Fold 7 with Snapdragon 8 Elite, the app reportedly generates audio at about 2.8x real-time. That matters because real-time factor is the hard constraint for products like this. If generation falls below playback speed, the user experiences buffering rather than narration.
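The buffering constraint described above can be sketched with toy arithmetic. This is not VoiceShelf's actual code; the chunk lengths, the buffering model, and the function name are illustrative assumptions, with only the ~2.8x figure taken from the post:

```python
# Toy model of streamed TTS playback: generate the first chunk before
# playback starts, then race generation against the listener. Playback
# is gapless only if audio is produced faster than it is consumed,
# i.e. the real-time factor (RTF) stays above 1.0.

def plays_without_stalls(chunk_audio_secs, gen_secs_per_chunk):
    """True if playback never catches up with generation."""
    buffered = chunk_audio_secs[0]   # first chunk is ready before playback
    consumed = 0.0                   # seconds of audio the listener has used
    for audio_len, gen_time in zip(chunk_audio_secs[1:], gen_secs_per_chunk[1:]):
        consumed += gen_time         # listener keeps playing while we generate
        if consumed > buffered:      # buffer ran dry: audible stall
            return False
        buffered += audio_len        # finished chunk lands in the buffer
    return True

chunks = [10.0] * 6                        # six 10-second chunks of narration
fast = [10.0 / 2.8] * 6                    # ~2.8x real time, as in the post
slow = [10.0 / 0.9] * 6                    # generation below real time
print(plays_without_stalls(chunks, fast))  # True: generation stays ahead
print(plays_without_stalls(chunks, slow))  # False: listener outruns the model
```

At 2.8x real time each 10-second chunk costs about 3.6 seconds of generation, so the buffer grows monotonically; below 1.0x, the very first gap appears as soon as the head start is exhausted.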

Why the implementation details matter

The same post also exposes the engineering costs that usually stay hidden. The APK is about 1 GB because it bundles the model and custom libraries needed to run without quality loss on Android. Current features include EPUB support, experimental PDF support, fully offline inference, screen-off narration, a sleep timer, and local library management. The author is explicitly asking testers on Snapdragon, Tensor, and Dimensity devices to measure throughput and thermal throttling over longer sessions.
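The throughput question the author poses to testers could be quantified with a simple harness. This is a hedged sketch, not part of the app: `synthesize` is a stand-in for whatever TTS call a tester instruments, and `fake_synth` exists only to make the example runnable:

```python
import time

def measure_rtf(synthesize, sentences):
    """Return per-sentence real-time factors (seconds of audio generated
    per wall-clock second). A falling trend across a long session is the
    thermal-throttling signature the post's author asks testers to look for."""
    rtfs = []
    for text in sentences:
        start = time.perf_counter()
        audio_secs = synthesize(text)   # assumed to return seconds of audio produced
        elapsed = time.perf_counter() - start
        rtfs.append(audio_secs / elapsed)
    return rtfs

# Stand-in synthesizer for illustration only: pretends each sentence
# yields 5 seconds of audio and takes a fixed time to "generate" it.
def fake_synth(text):
    time.sleep(0.01)
    return 5.0

rtfs = measure_rtf(fake_synth, ["sentence"] * 3)
print(all(r > 1.0 for r in rtfs))  # True for this fake synthesizer
```

On a real device the interesting output is not a single number but the shape of the series: flat RTF over an hour means sustained performance, while a steady decline points to throttling.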

That is exactly the right bottleneck to investigate. Running a speech model once on a flagship phone is no longer the interesting part. The harder question is whether the system remains usable across chipsets, battery conditions, and one-hour listening sessions. In other words, the community is moving from "can this model run locally?" to "can this model power a product people would actually keep installed?"

What the thread signals

VoiceShelf is a small but useful marker for local AI on mobile. It shows that offline neural narration on consumer phones is becoming viable enough to test with real content, real buffering constraints, and real thermals. The remaining issues are product issues rather than demo issues: install size, hardware variability, and sustained performance. That is a healthier place for the community to be, because it means on-device AI is being judged on operational reality instead of novelty alone.

