Local tool calling hit LocalLLaMA’s reality check: model, quant, or harness?
Original: Are you guys actually using local tool calling or is it a collective prank?
Community Spark
A r/LocalLLaMA thread asked whether local tool calling is real or a collective prank, and the question landed because many users have hit the same failure mode. The poster described Open WebUI with a Terminal tool in Docker and models served through LM Studio, then listed Qwen3.5 27B/35B, Gemma4 26B, Qwen3.6 35B and GPT-OSS 20B as models that struggled even to create a simple file reliably.
What The Community Blamed First
The most useful replies did not stop at “local models are bad.” Several users pointed at Open WebUI as the weak link and said OpenCode, Cline in VS Code, llama.cpp, or LM Studio’s own runtime had produced better results. One reply said Open WebUI is fine for chat but weaker with newer models that depend on native tool-call fields and separate reasoning fields. Another said OpenCode had been working well for coding-oriented local tool use.
The Debug Checklist
The thread produced a practical set of variables: avoid very aggressive quants when testing tool use, confirm native tool calling is enabled, check whether the harness returns reasoning in the expected API field, and make sure the tool schema matches what the model has learned. Users also noted that asynchronous shell commands can confuse some wrappers even when the same model behaves better in a coding-specific agent.
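One checklist item above, checking whether the harness returns the tool call in the expected API field, is easy to probe directly. The sketch below is a minimal illustration assuming an OpenAI-compatible message shape: it prefers the native `tool_calls` field and falls back to scanning `content` for bare JSON, a degradation some wrappers exhibit. The field names follow the chat-completions convention; the fallback logic is illustrative, not any specific wrapper’s behavior.

```python
import json


def extract_tool_calls(message: dict) -> list[dict]:
    """Return parsed tool calls from an OpenAI-style chat message.

    Prefers the native `tool_calls` field; falls back to scanning
    `content` for a bare JSON object, which some wrappers emit instead
    of populating the structured field.
    """
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call.get("function", {})
        calls.append({
            "name": fn.get("name"),
            "arguments": json.loads(fn.get("arguments") or "{}"),
        })
    if calls:
        return calls
    # Fallback: the call leaked into plain content as raw JSON.
    content = (message.get("content") or "").strip()
    if content.startswith("{"):
        try:
            obj = json.loads(content)
            if "name" in obj:
                calls.append({
                    "name": obj["name"],
                    "arguments": obj.get("arguments", {}),
                })
        except json.JSONDecodeError:
            pass
    return calls


# Healthy case: the harness populated the native field.
native = {"tool_calls": [{"function": {"name": "write_file",
                                       "arguments": '{"path": "hello.txt"}'}}]}
# Degraded case: the same call arrived as JSON inside `content`.
leaked = {"content": '{"name": "write_file", "arguments": {"path": "hello.txt"}}'}
```

If both cases parse to the same call, the model is doing its job and the wrapper is where the call gets lost; if only the leaked form appears, native tool calling is likely disabled or unsupported in that harness.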
Why It Matters
Local agents are often discussed as a model leaderboard problem, but this thread shows the stack is the product. A strong Qwen or Gemma run can still fail if the UI wrapper mishandles tool-call JSON, strips reasoning incorrectly, or traps the model in an execution loop. The operational lesson is to log the full setup: model, quant, server, runtime, wrapper, tool mode, and task. Without that record, “local tool calling works” and “local tool calling is broken” are both too vague to be useful.
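Logging the full setup can be as simple as serializing one record per test run so results from different stacks are comparable. This is a minimal sketch; the field values in the example (model name, quant, result string) are hypothetical placeholders, not results from the thread.

```python
import datetime
import json
import platform


def run_record(**fields) -> str:
    """Serialize one tool-calling test run as JSON for later comparison."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "host": platform.platform(),  # captures OS/arch alongside the stack
        **fields,
    }
    return json.dumps(record, indent=2)


# Hypothetical example values for illustration:
print(run_record(
    model="Qwen3.5-27B",
    quant="Q5_K_M",
    server="LM Studio",
    wrapper="Open WebUI",
    tool_mode="native",
    task="create a simple file",
    result="fail: call emitted in content instead of tool_calls",
))
```

A folder of such records makes it possible to say which variable actually changed between a working run and a broken one, instead of arguing from memory.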
Source: r/LocalLLaMA discussion.
Related Articles
r/LocalLLaMA liked this comparison because it replaces reputation and anecdote with a more explicit distribution-based yardstick. The post ranks community Qwen3.5-9B GGUF quants by mean KLD versus a BF16 baseline, with Q8_0 variants leading on fidelity and several IQ4/Q5 options standing out on size-to-drift trade-offs.
LocalLLaMA reacted because the post attacks a very real pain point for running large MoE models on limited VRAM. The author tested a llama.cpp fork that tracks recently routed experts and keeps the hot ones in VRAM for Qwen3.5-122B-A10B, reporting 26.8% faster token generation than layer-based offload at a similar 22GB VRAM budget.