#qwen

LLM Reddit Mar 23, 2026 2 min read

Qwen3.5-122B-A10B Uncensored (Aggressive) ships in GGUF with new K_P quants

A Reddit post in r/LocalLLaMA introduces a GGUF release of Qwen3.5-122B-A10B Uncensored (Aggressive) alongside new K_P quants. The author claims 0/465 refusals and zero capability loss, but those results are presented as the author’s own tests rather than independent verification.

#qwen #gguf #local-llms

LLM Reddit Mar 22, 2026 2 min read

r/LocalLLaMA Benchmarks ik_llama.cpp at 26x Faster Qwen 3.5 Prompt Ingestion

A high-signal r/LocalLLaMA benchmark post said moving Qwen 3.5 27B from mainline llama.cpp to ik_llama.cpp raised prompt evaluation from about 43 tok/sec to 1,122 tok/sec on a Blackwell RTX PRO 4000, with generation climbing from 7.5 tok/sec to 26 tok/sec.

#llama.cpp #qwen #local-llm

LLM Reddit Mar 20, 2026 2 min read

r/LocalLLaMA Tries to Standardize Practical Qwen3.5 Presets

A few weeks after release, r/LocalLLaMA is converging on task-specific sampler and reasoning-budget presets for Qwen3.5 rather than one default setup.

#qwen #llama.cpp #local-llm

LLM Reddit Mar 20, 2026 2 min read

LocalLLaMA Boosts a Community Qwen 3.5 9B GGUF Merge for Low-Refusal Local Use

A popular r/LocalLLaMA post highlighted a community merge of uncensored and reasoning-distilled Qwen 3.5 9B checkpoints, underscoring the appetite for behavior-tuned small local models.

#qwen #gguf #distillation

LLM Reddit Mar 16, 2026 2 min read

LocalLLaMA Benchmark Argues RTX PRO 6000 SM120 Is Being Held Back by Broken CUTLASS NVFP4 MoE Kernels

A March 12, 2026 LocalLLaMA benchmark post claims the best sustained decode for Qwen3.5-397B NVFP4 on 4x RTX PRO 6000 Blackwell GPUs is 50.5 tok/s with Marlin, because native CUTLASS grouped GEMM paths on SM120 fail or fall back.

#qwen #blackwell #vllm

LLM Reddit Mar 16, 2026 2 min read

LocalLLaMA Tracks OmniCoder-9B's Push Into Small Coding Agents

A LocalLLaMA release post presents OmniCoder-9B as a Qwen3.5-9B-based coding agent fine-tuned on 425,000-plus agentic trajectories, with commenters focusing on its read-before-write behavior and usefulness at small model size.

#coding-agents #qwen #open-weights

LLM Reddit Mar 15, 2026 2 min read

LocalLLaMA Patch Claims Faster Qwen3.5-397B Inference on Blackwell Workstations With a K=64 Kernel Fix

A March 14, 2026 LocalLLaMA post outlined a CUTLASS and FlashInfer patch for SM120 Blackwell workstations, claiming major gains for Qwen3.5-397B NVFP4 inference and linking the work to FlashInfer PR #2786.

#qwen #blackwell #vllm

LLM Reddit Mar 15, 2026 2 min read

r/LocalLLaMA: Qwen 3.5 27B Hits ~2000 TPS in a Document-Classification Setup

A r/LocalLLaMA field report showed how a very specific local inference workload was tuned for throughput. The author reported about 2,000 tokens per second while classifying markdown documents with Qwen 3.5 27B, and the comment thread turned the post into a practical optimization discussion.

#qwen #localllm #llama-cpp

LLM Reddit Mar 14, 2026 3 min read

LocalLLaMA Highlights a 14B Ada Coding Model Tuned for Safety-Critical Software Workflows

A LocalLLaMA post claims a QLoRA-tuned 14B Qwen coder model can beat frontier proprietary models on Ada compilation tasks, reviving interest in domain-specific coding models for niche but high-stakes languages.

#ada #code-generation #fine-tuning

LLM Reddit Mar 13, 2026 1 min read

OmniCoder-9B Brings Frontier Agent Traces to a 9B Open Coding Model

OmniCoder-9B packages agent-style coding behavior into a smaller open model by training on more than 425,000 curated trajectories from real tool-using workflows.

#coding-agents #open-models #qwen

LLM Reddit Mar 12, 2026 2 min read

Reddit Flags a New llama.cpp Metal Speedup for Qwen 3.5 on Mac

A r/LocalLLaMA post pointed Mac users to llama.cpp pull request #20361, merged on March 11, 2026, adding a fused GDN recurrent Metal kernel. The PR shows around 12-36% throughput gains on Qwen 3.5 variants, while Reddit commenters noted the change is merged but can still trail MLX on some local benchmarks.

#llama.cpp #qwen #apple-silicon

LLM Reddit Mar 11, 2026 1 min read

r/MachineLearning Elevates a 2x 4090 LLM Layer-Duplication Experiment

A high-scoring r/MachineLearning post resurfaced David Noel Ng's long-form write-up, centering on the claim that duplicating a seven-layer middle block in Qwen2-72B, without changing weights, was enough to reach the top of the open leaderboard.

#llm-research #qwen #leaderboard