Articles

LLM Hacker News 4d ago 2 min read

Gemma 4 26B runs at 5 tok/s on a 13-year-old Xeon

The HN debate was not just “old hardware still works.” A patched ik_llama.cpp path got Gemma 4 26B-A4B running CPU-only on dual Ivy Bridge Xeons, raising practical questions about local inference cost, control, and fallback capacity.

#gemma #cpu-inference #llama-cpp

LLM Hacker News Jun 16, 2026 2 min read

Local models are crossing from hobby setup into coding workflow

HN focused less on whether local LLMs fully replace frontier models and more on where they already make sense. The thread turned into a practical debate about Gemma, Qwen, agentic coding, memory limits, cost, and privacy.

#local-llm #agentic-coding #gemma

AI Jun 16, 2026 1 min read

A VLM in orbit starts shrinking the satellite data bottleneck

Earth-observation satellites may not need to downlink everything before analysis. A YAM-9 demonstration used Google DeepMind’s Gemma 3 and NASA JPL software to identify targets from natural-language queries while already in orbit.

#space-ai #vlm #gemma

LLM X/Twitter Jun 16, 2026 1 min read

OpenRouter adds free capacity for gpt-oss-20b and Gemma 4 26B

OpenRouter added free capacity for gpt-oss-20b and Gemma 4 26B, served by Darkbloom. The move gives developers a no-cost test path for a 21B open-weight model and a 256K-context multimodal Gemma model.

#openrouter #gpt-oss #gemma

LLM Jun 12, 2026 2 min read

DiffusionGemma cuts the token bottleneck with a 26B open model

Google DeepMind released DiffusionGemma, a 26B MoE open model that uses text diffusion instead of token-by-token decoding. The pitch is up to 4x faster generation on dedicated GPUs for local, interactive workflows.

#google #deepmind #gemma

LLM X/Twitter Jun 7, 2026 2 min read

Gemma 4 QAT Cuts Edge Model Memory Down to 1GB Target

Google released Gemma 4 QAT checkpoints for edge devices and consumer GPUs. The mobile format brings Gemma 4 E2B down to a 1GB memory footprint, and the release also includes Q4_0 and ecosystem-ready weights.

#google #gemma #qat

LLM Hacker News Jun 4, 2026 1 min read

Gemma 4 12B puts the spotlight on encoder-free multimodal local AI

The thread’s energy centered on the architecture claim: what does “encoder-free” really mean for a 12B multimodal model?

#gemma #multimodal #open-weights

LLM X/Twitter Jun 4, 2026 1 min read

Gemma 4 12B removes separate encoders for laptop-scale multimodal AI

Local multimodal AI is moving into the 12B class. Google Gemma introduced Gemma 4 12B under Apache 2.0, describing a unified encoder-free design for image, audio, and text inputs.

#gemma #google #open-models

LLM Hacker News Jun 2, 2026 2 min read

A 2016 Xeon Runs Gemma 4, but the Real Story Is Memory Bandwidth

The popular thread turned a local-inference stunt into a practical discussion about decoding bottlenecks, power cost, and runtime knobs.

#local-ai #gemma #cpu-inference

LLM Reddit May 6, 2026 1 min read

Google Releases Multi-Token Prediction Drafters for Gemma 4: Up to 3x Speedup

Google has released Multi-Token Prediction (MTP) draft models for the Gemma 4 family, achieving up to 3x inference speedup through speculative decoding without any loss in output quality.

#gemma #google #mtp

LLM Reddit May 1, 2026 2 min read

A Pac-Man prompt pushed LocalLLaMA to argue about something bigger than tokens per second

LocalLLaMA treated this less as a speed chart and more as a question about completion quality under a messy real prompt. On the same MacBook Pro M5 Max, Qwen 3.6 27B wrote more and faster, but Gemma 4 31B finished the game logic with far fewer tokens.

#qwen #gemma #local-llm

AI X/Twitter Apr 25, 2026 1 min read

DeepMind trains a 12B model across four regions 20x faster

Google DeepMind’s new training stack matters because datacenter boundaries are turning into frontier bottlenecks. Decoupled DiLoCo trained a 12B Gemma model across four U.S. regions on 2-5 Gbps links, more than 20x faster than conventional synchronization while holding 64.1% average accuracy versus a 64.4% baseline.

#google-deepmind #gemma #distributed-training