#qwen

LLM Reddit Apr 22, 2026 2 min read

llama.cpp --fit made LocalLLaMA rethink the VRAM wall

LocalLLaMA reacted because --fit challenged the old rule of thumb that anything outside VRAM means painfully slow inference.

#llama-cpp #local-llm #vram

LLM Apr 22, 2026 2 min read

Qwen3.6-Max-Preview pushes coding benchmarks, but stays cloud-only

Alibaba’s April 22 Qwen3.6-Max-Preview post claims top scores across six coding benchmarks and clear gains over Qwen3.6-Plus. The caveat is just as important: this is a hosted proprietary preview, not a new open-weight Qwen release.

#qwen #alibaba #coding-agents

LLM Reddit Apr 20, 2026 2 min read

Qwen3.6 lit up LocalLLaMA because the agent actually debugged the app

r/LocalLLaMA pushed this past 900 points because it was not another score table. The hook was a local coding agent noticing and fixing its own canvas and wave-completion bugs.

#qwen #local-llm #agents

LLM Reddit Apr 20, 2026 2 min read

Qwen3.6 on an M5 Max Made r/LocalLLaMA Talk About Keeping Code Local

r/LocalLLaMA pushed this post up because the “trust me bro” report had real operating conditions: 8-bit quantization, 64k context, OpenCode, and Android debugging.

#qwen #local-llm #coding-agents

LLM Reddit Apr 19, 2026 2 min read

LocalLLaMA’s Qwen3.6 Thread Is Really About Configuration

LocalLLaMA reacted because the post was not just another “new model feels strong” claim. The author said Qwen3.6 handled workloads normally reserved for Opus and Codex on an M5 Max 128GB setup, but the practical hook was the warning to enable preserve_thinking.

#qwen #local-llm #configuration

LLM Reddit Apr 19, 2026 2 min read

Local tool calling hit LocalLLaMA’s reality check: model, quant, or harness?

An r/LocalLLaMA thread turned one user’s failed local tool-calling setup into a practical checklist: OpenWebUI, native tool calls, quants, runtimes, and wrappers all matter.

#local-llm #tool-calling #qwen

LLM Reddit Apr 19, 2026 1 min read

A Qwen3.6 tuning post made --n-cpu-moe the LocalLLaMA knob of the day

r/LocalLLaMA cared because the numbers were concrete: 79 t/s on an RTX 5070 Ti with 128K context, tied to one llama.cpp flag choice.

#qwen #llama-cpp #local-llm

LLM Reddit Apr 18, 2026 1 min read

Qwen3.6 excitement turned into a GGUF runtime checklist on r/LocalLLaMA

The LocalLLaMA thread cared less about a release headline and more about which Qwen3.6 GGUF quant actually works. Unsloth’s benchmark post pushed the discussion into KLD, disk size, CUDA 13.2 failures, and the messy details that decide local inference quality.

#qwen #gguf #local-llm

AI X/Twitter Apr 17, 2026 2 min read

Qwen3.6-35B-A3B opens 35B MoE weights with 3B active parameters

Why it matters: Alibaba is putting a small-active-parameter multimodal coding model into open weights rather than keeping it API-only. The tweet says Qwen3.6-35B-A3B has 35B total parameters, 3B active parameters, and an Apache 2.0 license; the blog reports 73.4 on SWE-bench Verified and 51.5 on Terminal-Bench 2.0.

#qwen #open-weights #moe

LLM Hacker News Apr 17, 2026 2 min read

Qwen3.6 pelican test turned HN into a benchmark argument

HN upvoted the joke because it exposed a real discomfort: one vivid SVG prompt can make a small local model look better than a flagship model, but nobody agrees what that proves.

#qwen #claude #local-llms

LLM Reddit Apr 16, 2026 1 min read

LocalLLaMA Wants Qwen3.5-9B Quant Choices Backed by KLD, Not Vibes

LocalLLaMA upvoted this because it turns a messy GGUF choice into a measurable tradeoff. The post compares community Qwen3.5-9B quants against a BF16 baseline using mean KLD, then the comments push for better visual encoding, Gemma 4 runs, Thireus quants, and long-context testing.

#qwen #gguf #quantization

LLM Hacker News Apr 16, 2026 1 min read

HN Sees Qwen3.6-35B-A3B as a Small Active-Parameter Bet for Coding Agents

HN latched onto the open-weight angle: a 35B MoE model with only 3B active parameters is interesting if it can actually carry coding-agent work. Qwen says Qwen3.6-35B-A3B improves sharply over Qwen3.5-35B-A3B, while commenters immediately moved to GGUF builds, Mac memory limits, and whether open-model-only benchmark tables are enough context.

#qwen #open-weights #coding-agents