#gemma4

LLM Reddit Apr 15, 2026 2 min read

LocalLLaMA Jumps on Gemma-4 Audio Support in llama-server

The LocalLLaMA thread took off because native speech-to-text (STT) inside llama.cpp is exactly the kind of feature that removes a separate transcription stage from local agent setups. The post says llama-server can now run STT with Gemma-4 E2A and E4A models, and commenters immediately started comparing the practical experience to Whisper and Voxtral.

#llama.cpp #gemma4 #speech-to-text

LLM Reddit Apr 4, 2026 2 min read

LocalLLaMA Benchmarks Gemma 4 31B at 256K Context on One RTX 5090

An `r/LocalLLaMA` benchmark claims Gemma 4 31B can run at 256K context on a single RTX 5090 using TurboQuant KV cache compression. The post is notable because it pairs performance numbers with detailed build notes, VRAM measurements, and community skepticism about long-context quality under heavy KV quantization.

#gemma4 #llama.cpp #kv-cache
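
For a rough sense of why KV cache compression matters at 256K context, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate and does not come from the post: the layer count, KV head count, and head dimension are placeholder values (not Gemma 4 31B's actual architecture), every layer is assumed to use full attention, and the overhead of quantization scales is ignored.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bits_per_value):
    """Estimate KV cache size in bytes for a plain full-attention transformer.

    Each layer stores two tensors (K and V), each holding
    context_len * n_kv_heads * head_dim values.
    """
    values = 2 * n_layers * n_kv_heads * head_dim * context_len
    return values * bits_per_value / 8


# Hypothetical architecture, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128
CONTEXT = 256 * 1024  # 256K tokens

GIB = 2**30
fp16_gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 16) / GIB
q4_gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 4) / GIB

print(f"16-bit KV cache: {fp16_gib:.0f} GiB")  # 48 GiB under these assumptions
print(f" 4-bit KV cache: {q4_gib:.0f} GiB")    # 12 GiB under these assumptions
```

Under these placeholder numbers, an uncompressed 16-bit cache alone would exceed a single consumer GPU's VRAM, while a 4-bit cache would not. Real figures depend on the model's actual attention layout and on the bit width the compression scheme uses.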

LLM Hacker News Apr 4, 2026 2 min read

HN Focuses on a Practical Mac mini Setup for Ollama and Gemma 4

A practical gist shared on HN lays out how to run Ollama and Gemma 4 on an Apple Silicon Mac mini, including auto-start, periodic preload, and `OLLAMA_KEEP_ALIVE=-1`. The author says `gemma4:26b` nearly exhausted 24 GB of unified memory, making the default 8B model a safer operational choice.

#ollama #gemma4 #mac-mini
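
The periodic preload mentioned in the Mac mini summary can be approximated with a request to Ollama's `/api/generate` endpoint that names a model, omits the prompt, and sets `keep_alive` to `-1`. This is a minimal sketch and not the gist's actual script: it assumes Ollama is listening on its default local port, and the helper names `build_preload_request` and `preload` are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_preload_request(model, keep_alive=-1):
    """Build a POST request that loads `model` without generating text.

    Sending no prompt asks Ollama to load the model into memory;
    keep_alive=-1 asks it to keep the model loaded indefinitely.
    """
    payload = {"model": model, "keep_alive": keep_alive, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model, timeout=300.0):
    """Send the preload request and return Ollama's JSON reply as a dict."""
    with urllib.request.urlopen(build_preload_request(model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `preload("gemma4:26b")` from a scheduled job (cron or launchd, for example) would re-warm the model after a restart. The model tag is the one quoted in the summary, and a smaller tag can be substituted on memory-constrained machines.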