An r/LocalLLaMA post pointed Mac users to llama.cpp pull request #20361, merged on March 11, 2026, adding a fused GDN recurrent Metal kernel. The PR shows roughly 12–36% throughput gains on Qwen 3.5 variants, while Reddit commenters noted that, even merged, the change can still trail MLX on some local benchmarks.
A new llama.cpp change turns <code>--reasoning-budget</code> into a real sampler-side limit instead of a template stub. The LocalLLaMA thread focused on the tradeoff between cutting long think loops and preserving answer quality, especially for local Qwen 3.5 deployments.
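To make the "sampler-side limit vs. template stub" distinction concrete, here is a minimal sketch of the idea: once a token budget inside a think block is spent, the sampler force-closes the block rather than letting the model loop. This is a post-hoc filter over a token stream for illustration only; the tag strings, forced-close behavior, and `apply_reasoning_budget` helper are assumptions, not llama.cpp's actual implementation, which enforces the limit during generation.

```python
def apply_reasoning_budget(tokens, budget, open_tag="<think>", close_tag="</think>"):
    """Illustrative sketch: truncate a think block after `budget` tokens.
    (Hypothetical helper — not the real llama.cpp sampler.)"""
    out, in_think, spent, skipping = [], False, 0, False
    for tok in tokens:
        if skipping:
            # Drop the remainder of the over-budget think block.
            if tok == close_tag:
                skipping = False
            continue
        if tok == open_tag:
            in_think, spent = True, 0
        elif tok == close_tag:
            in_think = False
        elif in_think:
            spent += 1
            if spent > budget:
                # Budget exhausted: force the close tag, skip the rest.
                out.append(close_tag)
                in_think, skipping = False, True
                continue
        out.append(tok)
    return out

stream = ["<think>", "a", "b", "c", "d", "</think>", "answer"]
print(apply_reasoning_budget(stream, 2))
# → ["<think>", "a", "b", "</think>", "answer"]
```

The quality tradeoff the thread debated falls out of the last branch: everything after the forced close is discarded, so an aggressive budget can cut reasoning the final answer depended on.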
A LocalLLaMA thread reported a large prompt-processing speedup on Qwen3.5-27B by lowering llama.cpp `--ubatch-size` to 64 on an RX 9070 XT. The interesting part is not a universal magic number, but the reminder that prompt ingestion and token generation can respond very differently to `n_ubatch` tuning.
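Why prompt ingestion reacts to `n_ubatch` at all: the prompt is processed in micro-batches of at most `n_ubatch` tokens, so the setting controls how many compute dispatches the backend issues and how large each one is. The sketch below is a simplification under that assumption (real llama.cpp scheduling also involves `n_batch` and backend-specific paths), and `ubatch_schedule` is a hypothetical helper, but it shows why 4096 prompt tokens become 8 passes at 512 and 64 passes at 64 — a shift that can help or hurt depending on the GPU and driver.

```python
def ubatch_schedule(n_prompt_tokens: int, n_ubatch: int):
    """Split a prompt into micro-batches of at most n_ubatch tokens.
    Simplified sketch of how llama.cpp chunks prompt processing."""
    return [min(n_ubatch, n_prompt_tokens - start)
            for start in range(0, n_prompt_tokens, n_ubatch)]

for ub in (512, 64):
    print(ub, len(ubatch_schedule(4096, ub)))
# → 512 8
# → 64 64
```

Token generation, by contrast, runs one token at a time regardless of `n_ubatch`, which is why the two phases can respond so differently to the same knob.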
An r/LocalLLaMA thread is drawing attention to `llama.cpp` pull request #19504, which adds a `GATED_DELTA_NET` op for Qwen3Next-style models. Reddit users reported better token-generation speed after updating, while the PR itself includes early CPU/CUDA benchmark data.
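For readers unfamiliar with the layer, here is a NumPy sketch of one step of the gated delta rule that the `GATED_DELTA_NET` op fuses: decay the recurrent state by a gate, apply a rank-1 delta-rule correction along the key, then write the new key/value association. The shapes and sign conventions follow one common writeup of Gated DeltaNet and are assumptions here; the real op additionally handles batching, multiple heads, and normalization in a single fused kernel.

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One gated delta-rule step (sketch, single head):
    S: (d_k, d_v) state, k: (d_k,) key, v: (d_v,) value,
    alpha: decay gate in (0, 1], beta: write strength."""
    d_k = k.shape[0]
    # Decay + delta-rule erase along k, then write the new association.
    return alpha * (np.eye(d_k) - beta * np.outer(k, k)) @ S + beta * np.outer(k, v)

rng = np.random.default_rng(0)
d_k = d_v = 4
S = np.zeros((d_k, d_v))
for _ in range(3):
    k = rng.standard_normal(d_k)
    k /= np.linalg.norm(k)          # unit-norm key, as in delta-rule setups
    v = rng.standard_normal(d_v)
    S = gated_delta_step(S, k, v, alpha=0.9, beta=0.5)

q = rng.standard_normal(d_k)
print((S.T @ q).shape)              # read out with a query → (4,)
```

The linear recurrence is what makes the op cheap at long context: state size is fixed at `d_k * d_v` per head, independent of sequence length.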
A Hacker News post surfaced Unsloth's Qwen3.5 local guide, which lays out memory targets, reasoning-mode controls, and llama.cpp commands for running 27B and 35B-A3B models on local hardware.
A merged llama.cpp PR adds MCP server selection, tool calls, prompts, resources, and an agentic loop to the WebUI stack, moving local inference closer to full agent workflows.
LocalLLaMA users are tracking llama.cpp’s merged autoparser work, which analyzes model templates to support reasoning and tool-call formats with less custom parser code.