#qwen

LLM Reddit Jun 2, 2026 2 min read

Qwen3.6-27B Looks Viable for Local Agent Planning, Not Ungated Execution

The useful number in the Reddit report was not the hardware spec; it was a reported 12% tool-call formatting error rate.

#qwen #local-ai #agents

LLM Reddit May 22, 2026 1 min read

Qwen3.6 35B Transforms Workflows Through Skill-Based Prompting

A viral LocalLLaMA post describes how Qwen3.6 35B A3B transformed complex workflows by combining Codex for task execution with skill documentation and feeding those skills to the pi agent, automating VPS management, PDF conversion, and more.

#qwen #local-llm #workflow

LLM Reddit May 22, 2026 1 min read

110 tok/s on a 35B Model with 12GB VRAM Using ik_llama.cpp

A community user achieved 110 tokens/second running Qwen3.6 35B A3B on an RTX 4070 Super 12GB via ik_llama.cpp, a llama.cpp fork whose CPU offload optimization significantly outperforms upstream llama.cpp's Multi-Token Prediction implementation.

#llama-cpp #qwen #local-llm

LLM Hacker News May 20, 2026 1 min read

Qwen3.7-Max Joins the Frontier: Matches GPT 5.4 on Artificial Analysis Rankings

Alibaba's Qwen team has released Qwen3.7-Max, an agent-focused frontier LLM. It ranks 5th on Artificial Analysis's Intelligence Index, nearly matching GPT 5.4, and is available as both an API and open weights.

#qwen #alibaba #llm

LLM Reddit May 10, 2026 1 min read

Running Qwen3.6 35B A3B at 80+ tok/sec on 12GB VRAM With llama.cpp MTP

A LocalLLaMA user shares their config for running Qwen3.6 35B A3B at over 80 tok/sec with 128K context on a 12GB VRAM GPU, using llama.cpp's Multi-Token Prediction support and achieving 80%+ draft acceptance rate.

#local-llm #qwen #llama-cpp

LLM Reddit May 6, 2026 1 min read

Qwen 3.6 27B Achieves 2.5x Faster Local Inference via MTP With 262k Context on 48GB

A LocalLLaMA user has shared a detailed guide for running Qwen 3.6 27B with Multi-Token Prediction support in llama.cpp, achieving 2.5x inference speedup and 262k context on 48GB of memory.

#qwen #mtp #local-llm

LLM Reddit May 3, 2026 1 min read

95.7% SimpleQA on a Single RTX 3090: Qwen3.6-27B with Agentic Search

A local LLM researcher achieved 95.7% on SimpleQA using Qwen3.6-27B with agentic search on a single consumer GPU, an RTX 3090.

#qwen #local-llm #rtx-3090

LLM Reddit May 1, 2026 2 min read

A Pac-Man prompt pushed LocalLLaMA to argue about something bigger than tokens per second

LocalLLaMA treated this less as a speed chart and more as a question about completion quality under a messy real prompt. On the same MacBook Pro M5 Max, Qwen 3.6 27B wrote more and faster, but Gemma 4 31B finished the game logic with far fewer tokens.

#qwen #gemma #local-llm

LLM Reddit May 1, 2026 2 min read

LocalLLaMA cared less about peak speed than a 3090 setup that finally stopped crashing at 218K context

LocalLLaMA cared less about headline speed than a Qwen3.6 setup on one RTX 3090 that reached 218K context and stopped crashing on long tool outputs.

#qwen #rtx-3090 #vllm

LLM X/Twitter Apr 30, 2026 2 min read

Qwen's FlashQLA lifts linear attention speed 2-3x on Hopper

Why it matters: kernel work is what decides whether long-context and edge-side agent systems stay theoretical or become cheap enough to run. Qwen says FlashQLA delivers 2-3x forward speedup and 2x backward speedup over the FLA Triton kernel on NVIDIA Hopper.

#qwen #linear-attention #kernels

LLM Reddit Apr 30, 2026 2 min read

LocalLLaMA Fixates on a Qwen3.6 27B Setup That Pushes 204k Context on Two 16GB GPUs

LocalLLaMA reacted to this post because it brought hard numbers, not vendor marketing: a dual RTX 5060 Ti 16GB setup pushing Qwen3.6 27B to roughly 60 tok/s with a 204k context window.

#qwen #local-llm #vllm

LLM Reddit Apr 29, 2026 2 min read

LocalLLaMA liked the FlashQLA jokes, but the real hook was the numbers

The top comment went straight to the CP joke, but the post held because the technical claim was concrete: 2-3x forward speedups and 2x backward speedups for GDN chunked prefill, aimed at long-context and edge-side agentic inference.

#qwen #flashqla #linear-attention