Google DeepMind’s new training stack matters because datacenter boundaries are turning into frontier bottlenecks. Decoupled DiLoCo trained a 12B Gemma model across four U.S. regions over 2-5 Gbps links, running more than 20x faster than conventional synchronous training while holding 64.1% average accuracy versus a 64.4% baseline.
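For readers unfamiliar with the DiLoCo family, the core trick is many local optimizer steps per region followed by a rare, small synchronization of parameter deltas. The toy numpy sketch below illustrates only that inner/outer structure; the toy loss, the optimizer choices, and the omission of the paper's actual "decoupled" communication overlap are all simplifying assumptions.

```python
# Minimal sketch of a DiLoCo-style inner/outer loop, on a toy quadratic
# problem. Not the paper's method: optimizers, model, and communication
# overlap are simplified assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, inner_steps, outer_steps = 4, 8, 20, 5
outer_lr, inner_lr, momentum = 0.7, 0.05, 0.9

global_params = rng.normal(size=dim)   # shared model state
velocity = np.zeros(dim)               # outer momentum buffer

def local_grad(params, worker):
    # Stand-in for a worker's minibatch gradient (toy quadratic loss);
    # each "region" sees different data.
    target = np.full(dim, float(worker))
    return params - target

for _ in range(outer_steps):
    deltas = []
    for w in range(n_workers):
        p = global_params.copy()
        for _ in range(inner_steps):          # many cheap local steps,
            p -= inner_lr * local_grad(p, w)  # no cross-region traffic
        deltas.append(global_params - p)      # "pseudo-gradient" to sync
    # One small all-reduce per outer round is the only WAN communication.
    pseudo_grad = np.mean(deltas, axis=0)
    velocity = momentum * velocity + pseudo_grad
    global_params -= outer_lr * velocity

print(global_params)  # converges toward the mean of the worker targets
```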
LocalLLaMA reacted because the post did not just tweak a benchmark table. It went after a widely repeated local-inference assumption and showed that the answer changes sharply by model family, especially for Gemma. As of the April 25, 2026 crawl, the thread had 324 points and 58 comments.
A r/LocalLLaMA post is not a formal benchmark, but it captured the community mood: local models can be attractive when hosted models drift, filter unexpectedly, or change behavior across updates.
LocalLLaMA jumped on this because native audio in llama-server promises a much cleaner speech workflow for local AI. Early commenters love the idea of dropping the separate Whisper service, but they are also documenting where long-form audio still breaks.
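For anyone wanting to poke at the single-service workflow, here is a hedged sketch, assuming your llama-server build has multimodal audio enabled and mirrors OpenAI's `input_audio` content part on its OpenAI-compatible `/v1/chat/completions` endpoint; the exact schema and required launch flags vary across llama.cpp versions.

```python
# Hedged sketch: sending inline audio to a local llama-server, assuming
# the build supports multimodal audio input and accepts OpenAI's
# `input_audio` content part. Schema may differ by llama.cpp version.
import base64
import requests

with open("clip.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",  # default llama-server port
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this clip."},
                {"type": "input_audio",
                 "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }],
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```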
Reddit lit up around a build that turns a Xiaomi 12 Pro into a headless Gemma 4 server because it feels much closer to how most people actually tinker with local AI. The excitement was not about peak numbers; it was about proving that useful local inference can live on everyday hardware.
Google's AI Edge team said on April 2, 2026 that Gemma 4 is bringing multi-step agentic workflows to phones, desktops, and edge hardware under an Apache 2.0 license. The launch combines open models, Agent Skills, and LiteRT-LM deployment tooling.
On April 9, 2026, Google DeepMind said on X that Gemma 4 crossed 10M downloads in its first week and that the Gemma family overall has topped 500M downloads. Google positions Gemma 4 as an open model family built for reasoning, agentic workflows, and efficient deployment on local hardware.
A recent Show HN thread pointed to Parlor, a local multimodal assistant that combines Gemma 4 E2B, Kokoro, browser voice activity detection, and streaming audio playback. The project reports around 2.5 to 3.0 seconds of end-to-end latency on an Apple M3 Pro.
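The interesting engineering detail is the shape of the loop: perceived latency is dominated by time-to-first-audio-chunk, which is why streaming TTS playback matters more than total generation time. A structural sketch only, where `detect_speech`, `gemma_reply`, and `kokoro_stream` are hypothetical stand-ins for Parlor's actual VAD, Gemma 4 E2B, and Kokoro components:

```python
# Illustrative pipeline shape only; the helper functions passed in are
# hypothetical stand-ins, not Parlor's actual implementation.
import time

def run_turn(mic_chunks, detect_speech, gemma_reply, kokoro_stream, play):
    """One voice turn: gate on VAD, generate, stream audio out ASAP."""
    utterance = [c for c in mic_chunks if detect_speech(c)]  # VAD gating
    t0 = time.monotonic()
    reply_text = gemma_reply(utterance)          # multimodal LLM step
    for wav_chunk in kokoro_stream(reply_text):  # TTS streamed, not batched
        play(wav_chunk)                          # first chunk = "felt" latency
    return time.monotonic() - t0                 # end-to-end turn time
```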
Google DeepMind’s April 2, 2026 X thread introduced Gemma 4 as a new open model family built for reasoning and agentic workflows. Google says the lineup spans E2B, E4B, 26B MoE, and 31B Dense, and adds native function calling, structured JSON output, and longer context windows.
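Native function calling is the headline feature for agent builders. Below is a hedged round-trip sketch against a local OpenAI-compatible endpoint; the `get_weather` tool, the URL, and the exact field shapes are illustrative assumptions, not Gemma 4's or any particular server's confirmed API.

```python
# Hedged sketch of one function-calling round trip against a local
# OpenAI-compatible endpoint. The tool, URL, and field shapes are
# illustrative assumptions; serving stacks differ in the details.
import json
import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local server
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]},
    },
}]
messages = [{"role": "user", "content": "What's the weather in Oslo?"}]

msg = requests.post(URL, json={"messages": messages, "tools": tools},
                    timeout=120).json()["choices"][0]["message"]

if msg.get("tool_calls"):  # the model chose to call a tool
    call = msg["tool_calls"][0]
    args = json.loads(call["function"]["arguments"])
    result = {"city": args["city"], "temp_c": 7}  # stub tool execution
    messages += [msg, {"role": "tool",
                       "tool_call_id": call["id"],
                       "content": json.dumps(result)}]
    final = requests.post(URL, json={"messages": messages, "tools": tools},
                          timeout=120).json()
    print(final["choices"][0]["message"]["content"])
```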
A LocalLLaMA post drew attention to PokeClaw, an open-source Android prototype that runs Gemma 4 locally through LiteRT-LM and lets the model tap, swipe, type, open apps, send messages, and manage auto-replies without cloud inference.
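Agents like this usually reduce to a small dispatch loop: the model emits a structured action and a thin layer maps it to input events. An illustrative sketch follows; the action schema and the `adb_shell` helper are hypothetical conveniences, not PokeClaw's actual protocol or the LiteRT-LM API, and the shell quoting is simplified.

```python
# Illustrative action-dispatch loop for an on-device agent. The action
# schema and `adb_shell` helper are hypothetical; the `input` and
# `monkey` commands themselves are standard Android shell tools.
import json

def dispatch(action: dict, adb_shell) -> None:
    """Map one model-emitted action to an Android input event."""
    kind = action["type"]
    if kind == "tap":
        adb_shell(f"input tap {action['x']} {action['y']}")
    elif kind == "swipe":
        adb_shell(f"input swipe {action['x1']} {action['y1']} "
                  f"{action['x2']} {action['y2']}")
    elif kind == "type":
        # Quoting simplified; `input text` needs escaping in practice.
        adb_shell(f"input text {json.dumps(action['text'])}")
    elif kind == "open_app":
        adb_shell(f"monkey -p {action['package']} 1")
    else:
        raise ValueError(f"unknown action type: {kind}")
```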
A Show HN thread highlighted Gemma Gem, a Chrome extension that runs Gemma 4 locally via WebGPU and exposes page-reading, clicking, typing, scrolling, screenshot, and JavaScript tools without API keys or server-side inference.
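The pattern is ordinary function calling with browser verbs. A guess at what such a tool manifest could look like in the common OpenAI function-calling shape; the names and parameters here are assumptions, not the extension's real schema.

```python
# Illustrative only: a browser-tool manifest in the OpenAI
# function-calling shape. Names and parameters are assumptions,
# not Gemma Gem's actual tool definitions.
BROWSER_TOOLS = [
    {"type": "function", "function": {
        "name": "read_page",
        "description": "Return the visible text of the current tab.",
        "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {
        "name": "click",
        "description": "Click the element matching a CSS selector.",
        "parameters": {"type": "object",
                       "properties": {"selector": {"type": "string"}},
                       "required": ["selector"]}}},
    {"type": "function", "function": {
        "name": "scroll",
        "description": "Scroll the page by a vertical pixel offset.",
        "parameters": {"type": "object",
                       "properties": {"dy": {"type": "integer"}},
                       "required": ["dy"]}}},
]
```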
A LocalLLaMA explainer argues that Gemma 4 E2B/E4B gain their efficiency from Per-Layer Embeddings. The key point is that many of those parameters behave more like large token lookup tables than always-active compute-heavy layers, which changes the inference trade-off.
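A toy numpy sketch makes the trade-off concrete: a per-layer embedding is a gather of a few active rows plus a small projection, so the table's size costs storage rather than per-token compute, and cold rows can stay in slower memory. The shapes below are illustrative, not Gemma 4's actual configuration.

```python
# Toy sketch of the per-layer-embedding idea: per token and per layer,
# a row is *looked up* (cheap gather) instead of produced by a dense
# matmul. Shapes are illustrative, not Gemma 4's real config.
import numpy as np

vocab, n_layers, d_model, d_ple = 32000, 4, 256, 64
rng = np.random.default_rng(0)

# Big lookup table: size scales with vocab, but per-token cost does not.
ple_table = rng.normal(size=(n_layers, vocab, d_ple)).astype(np.float32)
proj = rng.normal(size=(n_layers, d_ple, d_model)).astype(np.float32)

def layer_with_ple(h, token_ids, layer):
    # Gather touches only the active rows: O(seq * d_ple) traffic,
    # independent of vocab size.
    ple = ple_table[layer, token_ids]        # (seq, d_ple)
    return h + ple @ proj[layer]             # small projection into d_model

token_ids = np.array([17, 905, 31999])
h = rng.normal(size=(3, d_model)).astype(np.float32)
for layer in range(n_layers):
    h = layer_with_ple(h, token_ids, layer)  # rest of the block omitted
print(h.shape)  # (3, 256)
```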