Reddit rallies around a headless Gemma 4 server built from a Xiaomi phone, not another 48 GB rig

Original: 24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/Gemma4)

LLM · Apr 15, 2026 · By Insights AI (Reddit) · 1 min read

r/LocalLLaMA loved this because it flips the usual local-LLM flex on its head. Instead of another tower full of GPUs, the post shows a Xiaomi 12 Pro turned into a 24/7 headless Gemma 4 node. With 929 upvotes and 235 comments on the Reddit thread, the reaction was basically: yes, this is the kind of practical weird build people actually want to see.

The author says they flashed LineageOS, stripped away the Android UI and background bloat, and left roughly 9 GB of RAM available for LLM work. The phone runs headless, keeps networking alive with a manually compiled wpa_supplicant, and uses a custom daemon to monitor CPU temperature and trigger an external active-cooling module through a Wi-Fi smart plug at 45°C. To avoid cooking the battery during 24/7 use, a power-delivery script cuts charging at 80%. The current setup serves Gemma 4 through Ollama as a LAN-accessible API.
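The post doesn't publish the daemon's source, so here is a minimal sketch of the thermal-trigger logic it describes. The sysfs path, the `PLUG_ON_URL`/`PLUG_OFF_URL` smart-plug endpoints, the 40°C clear temperature, and the `RUN_WATCHDOG` guard are all assumptions for illustration, not the author's code:

```shell
#!/bin/sh
# Sketch of a thermal watchdog: read the SoC temperature and toggle an
# external cooling module via a Wi-Fi smart plug's HTTP endpoint.
# PLUG_ON_URL / PLUG_OFF_URL are hypothetical placeholders.

THERMAL_ZONE=/sys/class/thermal/thermal_zone0/temp  # millidegrees C
TRIP_MC=45000    # cooling on at 45.0 C (from the post)
CLEAR_MC=40000   # cooling off below 40.0 C (assumed hysteresis)

# Decision helper: given the temperature in millidegrees and the current
# plug state ("on"/"off"), print the desired state. The hysteresis band
# between CLEAR_MC and TRIP_MC prevents rapid toggling at the trip point.
desired_state() {
    temp_mc=$1; state=$2
    if [ "$temp_mc" -ge "$TRIP_MC" ]; then
        echo on
    elif [ "$temp_mc" -le "$CLEAR_MC" ]; then
        echo off
    else
        echo "$state"   # inside the band: keep the current state
    fi
}

# Main loop, only entered when explicitly enabled.
if [ "${RUN_WATCHDOG:-0}" = "1" ]; then
    state=off
    while :; do
        temp_mc=$(cat "$THERMAL_ZONE")
        want=$(desired_state "$temp_mc" "$state")
        if [ "$want" != "$state" ]; then
            if [ "$want" = "on" ]; then
                curl -fsS "$PLUG_ON_URL" >/dev/null
            else
                curl -fsS "$PLUG_OFF_URL" >/dev/null
            fi
            state=$want
        fi
        sleep 10
    done
fi
```

The same pattern (threshold plus a lower clear point) is how most thermostat-style controllers avoid chattering a relay; the actual daemon may of course work differently.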

The comments explain why the post hit so hard. One technically minded reply immediately suggested compiling llama.cpp directly on the device and dropping Ollama to squeeze out more inference speed. Another highly upvoted response said they were tired of seeing 48GB and 96GB build showcases and wanted good models running on normal consumer hardware instead. That is the real community angle here: this is not benchmark theater, it is an existence proof that local AI experiments do not have to start with workstation-class gear.
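The llama.cpp suggestion from the comments would look roughly like this on-device (e.g. in a Termux or chroot shell with git, cmake, and a C++ toolchain). The repository URL and `llama-server` flags are real, but the model filename, thread count, and build options are illustrative assumptions, not the commenter's exact recipe:

```shell
# Build llama.cpp natively on the phone and serve a quantized model
# directly, skipping the Ollama layer.
git clone https://github.com/ggerganov/llama.cpp
cmake -S llama.cpp -B llama.cpp/build -DGGML_NATIVE=ON
cmake --build llama.cpp/build -j4

# llama-server exposes an OpenAI-compatible HTTP API on the LAN;
# the model path below is a placeholder.
llama.cpp/build/bin/llama-server \
    -m models/gemma-q4_k_m.gguf \
    --host 0.0.0.0 --port 8080 -t 4
```

Binding to `0.0.0.0` is what makes the phone reachable from other machines on the LAN, matching the headless-API setup the post describes with Ollama.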

A phone like this is not replacing a serious GPU server, and the thread does not pretend otherwise. The appeal is different. A repurposed handset can become a quiet always-on endpoint for lightweight assistants, home-lab APIs, and personal local inference experiments. For a community obsessed with turning what it already owns into something useful, this Xiaomi build landed exactly where it should.


Related Articles

LLM · Hacker News · 3d ago · 2 min read

Daniel Vaughan’s Gemma 4 writeup tests whether a local model can function as a real Codex CLI agent, with the answer depending less on benchmark claims than on very specific serving choices. The key lesson is that Apple Silicon required llama.cpp plus `--jinja`, KV-cache quantization, and `web_search = "disabled"`, while a GB10 box worked through Ollama 0.20.5.

LLM · Hacker News · 23h ago · 2 min read

HN reacted because this was less about one wrapper and more about who gets credit and control in the local LLM stack. The Sleeping Robots post argues that Ollama won mindshare on top of llama.cpp but eroded trust through its attribution, packaging, cloud-routing, and model-storage choices; commenters pushed back that its UX still solved a real problem.


© 2026 Insights. All rights reserved.