#qwen

LLM Reddit Mar 15, 2026 2 min read

r/LocalLLaMA: Qwen 3.5 27B Hits ~2000 TPS in a Document-Classification Setup

An r/LocalLLaMA field report showed how a very specific local inference workload was tuned for throughput. The author reported about 2,000 tokens per second while classifying markdown documents with Qwen 3.5 27B, and the comment thread turned the post into a practical optimization discussion.

#qwen #localllm #llama-cpp

124

LLM Reddit Mar 14, 2026 3 min read

LocalLLaMA Highlights a 14B Ada Coding Model Tuned for Safety-Critical Software Workflows

A LocalLLaMA post claims a QLoRA-tuned 14B Qwen coder model can beat frontier proprietary models on Ada compilation tasks, reviving interest in domain-specific coding models for niche but high-stakes languages.

#ada #code-generation #fine-tuning

LLM Reddit Mar 12, 2026 2 min read

Reddit Flags a New llama.cpp Metal Speedup for Qwen 3.5 on Mac

An r/LocalLLaMA post pointed Mac users to llama.cpp pull request #20361, merged on March 11, 2026, which adds a fused GDN recurrent Metal kernel. The PR shows throughput gains of around 12-36% on Qwen 3.5 variants, while Reddit commenters noted that, even with the change merged, llama.cpp can still trail MLX on some local benchmarks.

#llama.cpp #qwen #apple-silicon

123

LLM Reddit Mar 10, 2026 2 min read

r/LocalLLaMA Tests Qwen 3.5 9B as a Real Local Agent on an M1 Pro

A high-scoring LocalLLaMA post says Qwen 3.5 9B on a 16GB M1 Pro handled memory recall and basic tool calling well enough for real agent work, even though creative reasoning still trailed frontier models.

#qwen #local-llm #ollama

115

LLM Reddit Mar 8, 2026 2 min read

LocalLLaMA shares a llama.cpp tuning tip: smaller n_ubatch unlocked much faster Qwen 27B prompt processing

A LocalLLaMA thread reported a large prompt-processing speedup on Qwen3.5-27B by lowering llama.cpp `--ubatch-size` to 64 on an RX 9070 XT. The takeaway is not a universal magic number but a reminder that prompt ingestion and token generation can respond very differently to `n_ubatch` tuning.

#llama.cpp #qwen #rocm

130

LLM Reddit Mar 8, 2026 2 min read

LocalLLaMA flags a merged llama.cpp update for Qwen-family inference

An r/LocalLLaMA thread is drawing attention to `llama.cpp` pull request #19504, which adds a `GATED_DELTA_NET` op for Qwen3Next-style models. Reddit users reported better token-generation speed after updating, while the PR itself includes early CPU/CUDA benchmark data.

#llama.cpp #qwen #qwen-next

113

LLM Hacker News Mar 8, 2026 2 min read

Qwen 3.5 local guide maps out memory budgets, 256K context, and llama.cpp setup

A Hacker News post surfaced Unsloth's Qwen3.5 local guide, which lays out memory targets, reasoning-mode controls, and llama.cpp commands for running 27B and 35B-A3B models on local hardware.

#qwen #llama.cpp #local-llm

110

LLM Reddit Mar 8, 2026 1 min read

Open WebUI’s Open Terminal gives local models a real execution environment

A high-scoring LocalLLaMA post highlights Open WebUI’s Open Terminal: a Docker or bare-metal execution layer that lets local models run commands, edit files, and return artifacts through chat.

#open-webui #tool-calling #qwen

105

LLM Hacker News Mar 5, 2026 2 min read

Qwen 3.5 Momentum Meets Team Upheaval at Alibaba

A high-ranking Hacker News thread highlighted a two-sided Qwen story: rapid model quality gains and potential organizational instability. As Qwen 3.5 expands across model sizes, reported leadership departures raise questions about roadmap continuity in the open-weight LLM ecosystem.

#qwen #open-weights #llm

LLM Reddit Mar 4, 2026 1 min read

r/LocalLLaMA benchmark compares Qwen3.5-27B Q4 quants using KLD and size tradeoffs

A high-scoring LocalLLaMA post benchmarked Qwen3.5-27B Q4 GGUF variants against BF16, separating “closest-to-baseline” choices from “best efficiency” picks for constrained VRAM setups.

#qwen #quantization #gguf

128
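For readers unfamiliar with the KLD metric in the quant comparison above, here is a minimal sketch of how a mean KL divergence between a baseline and a quantized model's next-token distributions can be computed. The post's exact methodology is not described here, so the function names (`softmax`, `kl_divergence`, `mean_kld`) and the toy logits are illustrative assumptions, not the author's code.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; p is the baseline, q the quantized distribution."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(baseline_logits, quant_logits):
    """Average per-token KL divergence over a sequence of logit vectors."""
    per_token = [
        kl_divergence(softmax(b), softmax(q))
        for b, q in zip(baseline_logits, quant_logits)
    ]
    return sum(per_token) / len(per_token)

# Toy data (hypothetical): two token positions, vocabulary of three tokens.
baseline = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
quantized = [[1.9, 1.1, 0.1], [0.4, 0.6, 2.8]]
print(f"mean KLD: {mean_kld(baseline, quantized):.6f}")
```

A lower mean KLD means the quantized model's predictions stay closer to the baseline, which is one axis of a "closest-to-baseline" versus "best efficiency" split; file size would be the other.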

LLM Hacker News Mar 4, 2026 1 min read

Unsloth publishes a practical Qwen3.5 fine-tuning guide with concrete VRAM targets

A high-signal Hacker News thread surfaced Unsloth’s Qwen3.5 guide, which maps model sizes to bf16 LoRA VRAM budgets and clarifies MoE, vision, and export paths for production workflows.

#qwen #fine-tuning #unsloth

129

LLM Reddit Mar 4, 2026 2 min read

LocalLLaMA Experiment Claims Qwen3.5-35B-A3B Reaches 37.8% on SWE-bench Verified Hard

A LocalLLaMA post reports that a simple “verify after every edit” loop raised Qwen3.5-35B-A3B from 22.2% to 37.8% on SWE-bench Verified Hard, approaching a cited 40% reference for Claude Opus 4.6.

#swe-bench #coding-agents #qwen

121
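The "verify after every edit" loop described above can be sketched roughly as follows. This is an assumption-laden illustration of the general pattern, not the post's implementation: `propose_edit`, `apply_edit`, and `verify` are hypothetical callables standing in for the model call, the file write, and whatever check (tests, linter, build) the agent runs.

```python
from typing import Callable, Optional, Tuple

def edit_with_verification(
    propose_edit: Callable[[Optional[str]], str],
    apply_edit: Callable[[str], None],
    verify: Callable[[], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Optional[int]:
    """Apply edits one at a time, verifying after each.

    Returns the attempt number that passed verification, or None if
    every attempt failed. Feedback from a failed check is passed to the
    next proposal so the model can correct itself.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        edit = propose_edit(feedback)   # hypothetical model call
        apply_edit(edit)                # hypothetical file write
        ok, feedback = verify()         # hypothetical test or build run
        if ok:
            return attempt
    return None

# Toy stand-ins: the first proposal is wrong, the second uses the feedback.
state = {"code": ""}

def propose(feedback):
    return "good" if feedback else "bad"

def apply(edit):
    state["code"] = edit

def check():
    return state["code"] == "good", "tests failed"

print(edit_with_verification(propose, apply, check))
```

The design point is that verification feedback reaches the model immediately after each edit rather than only at the end of a multi-edit trajectory.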