A high-scoring LocalLLaMA post says Qwen 3.5 9B on a 16GB M1 Pro handled memory recall and basic tool calling well enough for real agent work, even though creative reasoning still trailed frontier models.
A LocalLLaMA thread reported a large prompt-processing speedup on Qwen3.5-27B by lowering llama.cpp `--ubatch-size` to 64 on an RX 9070 XT. The interesting part is not a universal magic number, but the reminder that prompt ingestion and token generation can respond very differently to `n_ubatch` tuning.
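To see how the two phases respond on your own hardware, a minimal sweep sketch might look like the following (this assumes a local llama.cpp build whose `llama-bench` binary supports the `-ub` flag; the binary path and GGUF filename are placeholders):

```python
import subprocess

LLAMA_BENCH = "./llama-bench"              # placeholder: path to your llama.cpp build
MODEL = "Qwen3.5-27B-Q4_K_M.gguf"          # placeholder: whichever quant you actually run

# llama-bench reports prompt processing (pp) and token generation (tg) throughput
# separately, which is exactly the split the thread is about. -p sets the prompt
# length, -n the number of generated tokens, -ub the physical batch size (n_ubatch).
for ubatch in (64, 128, 256, 512):
    result = subprocess.run(
        [LLAMA_BENCH, "-m", MODEL, "-p", "2048", "-n", "128", "-ub", str(ubatch)],
        capture_output=True, text=True, check=True,
    )
    print(f"--- n_ubatch = {ubatch} ---")
    print(result.stdout)
```

Comparing the pp and tg rows across the sweep shows whether a small `n_ubatch` helps prompt ingestion on your card without costing generation speed.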
An r/LocalLLaMA thread is drawing attention to `llama.cpp` pull request #19504, which adds a `GATED_DELTA_NET` op for Qwen3Next-style models. Reddit users reported better token-generation speed after updating, while the PR itself includes early CPU/CUDA benchmark data.
A Hacker News post surfaced Unsloth's Qwen3.5 local guide, which lays out memory targets, reasoning-mode controls, and llama.cpp commands for running 27B and 35B-A3B models on local hardware.
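As a rough picture of what running one of these models locally boils down to, here is a minimal sketch using the llama-cpp-python bindings rather than the raw CLI commands the guide gives (the GGUF filename, context size, and prompt are placeholders, not values from the guide):

```python
from llama_cpp import Llama

# Placeholder filename; pick whichever Qwen3.5 quant fits your memory target.
llm = Llama(
    model_path="Qwen3.5-27B-Q4_K_M.gguf",
    n_ctx=8192,          # context window; larger values need more memory
    n_gpu_layers=-1,     # offload as many layers as the GPU can hold
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize mixture-of-experts routing in one paragraph."}],
    max_tokens=200,
)
print(out["choices"][0]["message"]["content"])
```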
A high-scoring LocalLLaMA post highlights Open WebUI’s Open Terminal: a Docker or bare-metal execution layer that lets local models run commands, edit files, and return artifacts through chat.
A high-ranking Hacker News thread highlighted a two-sided Qwen story: rapid model quality gains and potential organizational instability. As Qwen 3.5 expands across model sizes, reported leadership departures raise questions about roadmap continuity in the open-weight LLM ecosystem.
A high-scoring LocalLLaMA post benchmarked Qwen3.5-27B Q4 GGUF variants against BF16, separating “closest-to-baseline” choices from “best efficiency” picks for constrained VRAM setups.
A high-signal Hacker News thread surfaced Unsloth’s Qwen3.5 guide, which maps model sizes to bf16 LoRA VRAM budgets and clarifies MoE, vision, and export paths for production workflows.
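For the fine-tuning side, a minimal bf16 LoRA setup with Unsloth might look roughly like this (the checkpoint name and hyperparameters are placeholders, not the guide's recommendations):

```python
import torch
from unsloth import FastLanguageModel

# Placeholder checkpoint name; substitute the Qwen3.5 size your VRAM budget allows.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3.5-9B",
    max_seq_length=4096,
    dtype=torch.bfloat16,   # bf16 LoRA rather than 4-bit QLoRA
    load_in_4bit=False,
)

# Attach LoRA adapters to the usual attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```

Training itself typically then goes through a standard TRL `SFTTrainer` loop; the guide's VRAM table is about how large a model fits this kind of configuration on a given card.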
A LocalLLaMA post reports that a simple “verify after every edit” loop raised Qwen3.5-35B-A3B from 22.2% to 37.8% on SWE-bench Verified Hard, approaching a cited 40% reference for Claude Opus 4.6.
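The post does not publish its harness, but the idea is simple enough to sketch: run a verification step after each individual edit and discard changes that fail, rather than verifying once at the end. Everything below (the callables, the pytest check) is an assumption about what such a loop looks like, not the author's code:

```python
import subprocess
from typing import Callable, Optional

def verified(repo_dir: str) -> bool:
    # The post does not say what "verify" means concretely; a passing test
    # suite (pytest here) is an assumption standing in for that check.
    return subprocess.run(["pytest", "-q"], cwd=repo_dir).returncode == 0

def verify_after_every_edit(
    propose_edit: Callable[[str], Optional[str]],  # hypothetical: model proposes the next patch
    apply_patch: Callable[[str, str], None],       # hypothetical: applies a patch to the repo
    revert_patch: Callable[[str, str], None],      # hypothetical: undoes a patch
    repo_dir: str,
    max_steps: int = 20,
) -> None:
    """Check the repo after each individual edit instead of once at the end."""
    for _ in range(max_steps):
        patch = propose_edit(repo_dir)
        if patch is None:
            break                           # model signals it is done
        apply_patch(repo_dir, patch)
        if not verified(repo_dir):
            revert_patch(repo_dir, patch)   # drop the failing change and try again
```

The reported gain presumably comes from catching regressions immediately, so later edits never build on a broken state.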
A demo running Qwen 3.5 0.8B entirely in the browser using WebGPU and Transformers.js scored 440 on r/LocalLLaMA. No server, no API key, no installation required — just a modern browser with GPU access.
A widely shared r/LocalLLaMA comparison of Qwen's smallest models across three generations (score: 681) reveals extraordinary efficiency gains. The Qwen 3.5 9B now outperforms the previous-generation 80B on several benchmarks, while the 2B handles video understanding better than many 7B models.
Alibaba's Qwen team released the Qwen 3.5 small-model series (0.8B to 9B). The models run in-browser via WebGPU and show dramatic benchmark improvements over previous generations.