An r/LocalLLaMA thread turned one user’s failed local tool-calling setup into a practical checklist: OpenWebUI, native tool calls, quants, runtimes and wrappers all matter.
r/LocalLLaMA cared because the numbers were concrete: 79 t/s on an RTX 5070 Ti with 128K context, tied to one llama.cpp flag choice.
The LocalLLaMA thread cared less about a release headline and more about which Qwen3.6 GGUF quant actually works. Unsloth’s benchmark post pushed the discussion into KL divergence (KLD), disk size, CUDA 13.2 failures, and the messy details that decide local inference quality.
Why it matters: Alibaba is putting a small-active-parameter multimodal coding model into open weights rather than keeping it API-only. The tweet says Qwen3.6-35B-A3B has 35B total parameters, 3B active parameters, and an Apache 2.0 license; the blog reports 73.4 on SWE-bench Verified and 51.5 on Terminal-Bench 2.0.
HN upvoted the joke because it exposed a real discomfort: one vivid SVG prompt can make a small local model look better than a flagship model, but nobody agrees what that proves.
LocalLLaMA upvoted this because it turns a messy GGUF choice into a measurable tradeoff. The post compares community Qwen3.5-9B quants against a BF16 baseline using mean KLD, then the comments push for better visual encoding, Gemma 4 runs, Thireus quants, and long-context testing.
HN latched onto the open-weight angle: a 35B MoE model with only 3B active parameters is interesting if it can actually carry coding-agent work. Qwen says Qwen3.6-35B-A3B improves sharply over Qwen3.5-35B-A3B, while commenters immediately moved to GGUF builds, Mac memory limits, and whether open-model-only benchmark tables are enough context.
LocalLLaMA reacted because the post attacks a very real pain point for running large MoE models on limited VRAM. The author tested a llama.cpp fork that tracks recently routed experts and keeps the hot ones in VRAM for Qwen3.5-122B-A10B, reporting 26.8% faster token generation than layer-based offload at a similar 22GB VRAM budget.
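The caching idea generalizes beyond that fork: track which experts the router has hit recently, keep the hot ones resident, and evict the coldest when the VRAM budget is full. A toy sketch of that bookkeeping in Python — the fork’s actual data structures and eviction policy are not described in the post, so the LRU choice here is an assumption:

```python
from collections import OrderedDict

class HotExpertCache:
    """Illustrative LRU cache of 'hot' MoE experts kept in fast memory.

    Hypothetical sketch: the real llama.cpp fork moves GPU tensors around;
    here we only model the routing and eviction bookkeeping.
    """
    def __init__(self, max_resident: int):
        self.max_resident = max_resident          # VRAM budget, in experts
        self.resident = OrderedDict()             # expert_id -> hit count
        self.evictions = 0

    def route(self, expert_id: int) -> bool:
        """Record one routing event; return True on a VRAM hit."""
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark most recently used
            self.resident[expert_id] += 1
            return True
        if len(self.resident) >= self.max_resident:
            self.resident.popitem(last=False)     # evict least recently used
            self.evictions += 1
        self.resident[expert_id] = 1              # "load" expert into VRAM
        return False

cache = HotExpertCache(max_resident=4)
hits = sum(cache.route(e) for e in [1, 2, 3, 1, 4, 5, 1, 2])
print(hits, cache.evictions)
```

The payoff the post reports (26.8% faster generation at the same 22GB budget) comes from exactly this kind of hit-rate effect: routed experts repeat often enough that keeping recent ones resident beats static layer-based offload.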
LocalLLaMA reacted because the joke-like idea of an LLM tuning its own runtime came with concrete benchmark numbers. The author says llm-server v2 adds --ai-tune, which feeds llama-server’s help output into a tuning loop that searches flag combinations and caches the fastest config; on their rig, Qwen3.5-27B Q4_K_M moved from 18.5 tok/s to 40.05 tok/s.
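Stripped of the LLM-in-the-loop part, the search is just: enumerate flag combinations, benchmark each, cache the winner. A minimal sketch under loud assumptions — the flag space below and the `benchmark` stand-in are invented for illustration and are not llm-server’s actual logic or measurements:

```python
import itertools
import json

# Hypothetical flag space. The real tool parses `llama-server` help text
# and has an LLM propose combinations rather than brute-forcing a grid.
FLAG_SPACE = {
    "--n-gpu-layers": ["24", "32", "40"],
    "--batch-size": ["256", "512"],
    "--flash-attn": [None, ""],   # None means the flag is omitted
}

def benchmark(flags: dict) -> float:
    """Stand-in for launching llama-server with these flags and
    measuring tok/s; the scoring here is entirely synthetic."""
    score = 10.0
    score += int(flags["--n-gpu-layers"]) * 0.5
    score += int(flags["--batch-size"]) * 0.01
    if flags["--flash-attn"] is not None:
        score += 5.0
    return score

best, best_tps = None, 0.0
for combo in itertools.product(*FLAG_SPACE.values()):
    flags = dict(zip(FLAG_SPACE.keys(), combo))
    tps = benchmark(flags)
    if tps > best_tps:
        best, best_tps = flags, tps

# Cache the winning config so later launches can skip the search.
print(json.dumps({"flags": best, "tok_per_s": best_tps}))
```

The caching step is what makes the feature practical: the expensive search runs once, and subsequent launches read the stored config.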
LocalLLaMA paid attention to this post because it looked like real engineering cleanup instead of another inflated speed screenshot. On April 13, 2026, the author reported Qwen3.5-9B at 2048 tokens improving from a stock-MLX baseline of 30.96 tok/s to 127.07 tok/s at 89.36% acceptance, with the full runtime released as open source.
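An “acceptance” percentage usually points at speculative decoding, where a cheap draft model proposes tokens and the target model verifies them in one pass; the post doesn’t spell that out, so treat the interpretation as an assumption. A minimal sketch of the bookkeeping behind such a figure:

```python
def acceptance_rate(batches: list) -> float:
    """batches: (proposed_draft_tokens, accepted_by_target) per step."""
    proposed = sum(p for p, _ in batches)
    accepted = sum(a for _, a in batches)
    return accepted / proposed

def tokens_per_target_pass(batches: list) -> float:
    """Each verification pass yields the accepted drafts plus one
    token from the target model itself."""
    return sum(a + 1 for _, a in batches) / len(batches)

# Toy trace: 4 draft tokens proposed per step, most accepted.
steps = [(4, 4), (4, 3), (4, 4), (4, 4), (4, 3)]
print(f"{acceptance_rate(steps):.2%}")   # fraction of drafts accepted
print(tokens_per_target_pass(steps))     # avg tokens per target forward
```

High acceptance is why the headline tok/s can multiply: at ~90% acceptance on 4-token drafts, each expensive target-model pass emits several tokens instead of one.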
r/LocalLLaMA liked this comparison because it replaces reputation and anecdote with a more explicit distribution-based yardstick. The post ranks community Qwen3.5-9B GGUF quants by mean KLD versus a BF16 baseline, with Q8_0 variants leading on fidelity and several IQ4/Q5 options standing out on size-to-drift trade-offs.
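Mean KLD here measures how far a quant’s next-token distribution drifts from the BF16 baseline, averaged over token positions — lower is closer. A minimal sketch of that computation on toy logits; the post’s exact tooling is not specified (llama.cpp ships a `--kl-divergence` mode in its perplexity tool that works along these lines):

```python
import math

def softmax(logits: list) -> list:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mean_kld(baseline_logits: list, quant_logits: list) -> float:
    """Mean KL(P_bf16 || P_quant) over token positions.

    Each argument is a list of per-position logit vectors over the vocab.
    """
    total = 0.0
    for bl, ql in zip(baseline_logits, quant_logits):
        p, q = softmax(bl), softmax(ql)
        total += sum(pi * math.log(pi / qi)
                     for pi, qi in zip(p, q) if pi > 0)
    return total / len(baseline_logits)

# Toy example: two positions, tiny vocab; the quant drifts slightly.
bf16 = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.2]]
q4   = [[1.9, 1.1, 0.1], [0.6, 2.4, 0.1]]
print(f"{mean_kld(bf16, q4):.6f}")
```

This is why KLD makes a better yardstick than perplexity alone: it compares the quant directly against the same model at full precision, position by position, rather than against the text.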
Hacker News Zeroes In on I-DLM as a Diffusion LLM That Might Keep AR Quality Without Giving Up Speed
Hacker News readers are treating this less like another diffusion-text curiosity and more like a possible faster serving path that still stays close to autoregressive quality. The project page claims I-DLM-8B reaches 69.6 on AIME-24, 45.7 on LiveCodeBench-v6, and 2.9-4.1x higher throughput at high concurrency.