LocalLLaMA reacted with genuine wonder because the demo is simple to grasp: a 1.7B Bonsai model, about 290MB, running in a browser through WebGPU. The same thread also did the useful reality check, asking about tokens per second, hallucinations, llama.cpp support, and whether 1-bit models are ready for anything beyond narrow tasks.
#webgpu
A Show HN thread highlighted Gemma Gem, a Chrome extension that runs Gemma 4 locally via WebGPU and exposes page-reading, clicking, typing, scrolling, screenshot, and JavaScript tools without API keys or server-side inference.
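For readers curious how an extension wires tools like these into a page, here is a minimal TypeScript sketch of a content-script tool dispatcher. The tool names and message shapes are assumptions for illustration, not Gemma Gem's actual API.

```typescript
// Hypothetical tool surface for a browser-agent extension; the local
// model picks a ToolCall, the content script executes it in the page.
type ToolCall =
  | { tool: "read_page" }
  | { tool: "click"; selector: string }
  | { tool: "type"; selector: string; text: string }
  | { tool: "scroll"; dy: number };

function runTool(call: ToolCall): string {
  switch (call.tool) {
    case "read_page":
      // Return the visible text for the local model to reason over.
      return document.body.innerText;
    case "click": {
      const el = document.querySelector<HTMLElement>(call.selector);
      el?.click();
      return el ? "clicked" : "selector not found";
    }
    case "type": {
      const el = document.querySelector<HTMLInputElement>(call.selector);
      if (!el) return "selector not found";
      el.value = call.text;
      // Fire an input event so frameworks (React, Vue) see the change.
      el.dispatchEvent(new Event("input", { bubbles: true }));
      return "typed";
    }
    case "scroll":
      window.scrollBy({ top: call.dy, behavior: "smooth" });
      return "scrolled";
  }
}

// Content scripts typically receive tool calls from the extension's
// background worker over chrome.runtime messaging.
chrome.runtime.onMessage.addListener((msg: ToolCall, _sender, sendResponse) => {
  sendResponse(runTool(msg));
});
```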
Cohere said on March 28, 2026 that Transcribe is setting a new bar for speech recognition accuracy in real-world noise and linked to a demo where users can try it. The supporting Hugging Face materials position Transcribe as an Apache 2.0, 2B-parameter ASR model for 14 languages, while a companion WebGPU demo shows the model running locally in the browser.
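One common way to run ASR locally in a browser is Transformers.js on WebGPU; the sketch below shows that pattern. The model id is a placeholder, and whether the Transcribe demo uses this exact stack is an assumption.

```typescript
// Minimal sketch of browser-local speech recognition with
// Transformers.js running on WebGPU.
import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/whisper-tiny.en", // placeholder model id, not Transcribe's
  { device: "webgpu" },
);

// Transformers.js accepts a URL (or a Float32Array of 16 kHz samples).
const result = await transcriber("https://example.com/clip.wav");
console.log(result);
```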
A LocalLLaMA post claiming that Liquid AI’s LFM2-24B-A2B can run at roughly 50 tokens per second in a browser on an M4 Max reached 79 points and 11 comments. Community interest centered on sparse MoE architecture, ONNX packaging, and whether WebGPU can make the browser a credible local AI deployment target.
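To see why the 2B-active design is what makes the 50 tok/s claim plausible, a back-of-envelope decode estimate helps: per generated token, a sparse MoE streams roughly its active weights once, so memory bandwidth sets the ceiling. Every figure below is an illustrative assumption, not a measurement.

```typescript
// Rough decode-speed ceiling: bandwidth divided by active bytes per token.
const activeParams = 2e9;   // A2B: ~2B parameters active per token
const bytesPerParam = 0.5;  // assuming ~4-bit quantized weights
const bandwidthGBs = 300;   // assumed usable memory bandwidth on an M4 Max

const bytesPerToken = activeParams * bytesPerParam;       // ~1.0 GB/token
const ceilingTokS = (bandwidthGBs * 1e9) / bytesPerToken; // ~300 tok/s

console.log(`theoretical ceiling: ${ceilingTokS.toFixed(0)} tok/s`);
// Browser and runtime overhead eat most of that headroom, which is why
// ~50 tok/s for a 24B-total model is believable only because just 2B
// parameters are active per token.
```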
CanIRun.ai runs entirely in the browser, detects GPU, CPU, and RAM through WebGL, WebGPU, and navigator APIs, and estimates which quantized models fit your machine. HN readers liked the idea but immediately pressed on gaps: missing hardware entries, calibration, and reverse-lookup features.
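The probing itself is straightforward browser API work. Here is a sketch of the general approach; the heuristics are assumptions, not CanIRun.ai's actual code, and note that navigator.deviceMemory is capped at 8 by spec and absent in Safari and Firefox.

```typescript
// Probe CPU, RAM, and GPU capabilities from inside the browser.
async function probeHardware() {
  const cores = navigator.hardwareConcurrency;           // logical CPU count
  const ramGB = (navigator as any).deviceMemory ?? null; // coarse, capped at 8

  // WebGPU: adapter limits hint at how large a model buffer can fit.
  const adapter = await (navigator as any).gpu?.requestAdapter();
  const maxBufferMB = adapter
    ? adapter.limits.maxBufferSize / 2 ** 20
    : null;

  // WebGL fallback: the unmasked renderer string often names the GPU.
  const gl = document.createElement("canvas").getContext("webgl");
  const ext = gl?.getExtension("WEBGL_debug_renderer_info");
  const renderer = ext
    ? gl!.getParameter(ext.UNMASKED_RENDERER_WEBGL)
    : null;

  return { cores, ramGB, maxBufferMB, renderer };
}

probeHardware().then(console.log);
```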
A demo running Qwen 3.5 0.8B entirely in the browser using WebGPU and Transformers.js scored 440 points on r/LocalLLaMA. No server, no API key, no installation required: just a modern browser with GPU access.
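The pattern behind such demos is compact. A hedged Transformers.js sketch follows; the model id is a placeholder, and the demo's exact checkpoint and settings are assumptions.

```typescript
// Minimal browser-local text generation via Transformers.js on WebGPU.
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct", // placeholder model id
  { device: "webgpu", dtype: "q4" },      // 4-bit weights keep the download small
);

const out = await generator(
  [{ role: "user", content: "Explain WebGPU in one sentence." }],
  { max_new_tokens: 64 },
);
console.log(out);
```

The whole pipeline (weight download, tokenization, GPU inference) happens client-side, which is exactly why no server, API key, or installation is needed.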