A merged MCP PR brings agent loops, resources, and prompts into llama.cpp WebUI
Original: The MCP PR for llama.cpp has been merged!
Reddit thread: LocalLLaMA discussion
Merged PR: llama.cpp PR #18655
Another LocalLLaMA thread worth tracking is the merge of llama.cpp PR #18655, titled “webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts.” This matters because it brings Model Context Protocol features directly into the llama.cpp WebUI and server workflow instead of leaving that layer to external wrappers.
What the merged PR adds
- MCP server selection and server capability cards.
- Tool calls with an agentic loop and processing statistics.
- Prompt pickers, prompt attachments, resource browsing, preview, and templates.
- A backend CORS proxy via the `--webui-mcp-proxy` flag for llama-server.
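A minimal sketch of what enabling the proxy might look like. The model path and port are placeholders, and this assumes `--webui-mcp-proxy` is a boolean switch; check `llama-server --help` after the merged PR for the exact syntax.

```shell
# Hypothetical invocation: model path and port are illustrative.
# --webui-mcp-proxy is the CORS-proxy flag added by PR #18655.
llama-server \
  -m ./models/your-model-q4_k_m.gguf \
  --port 8080 \
  --webui-mcp-proxy
```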
The pull request also bundles a long list of UI refinements, including better code blocks, collapsible reasoning and tool-call displays, attachment improvements, and message statistics. In other words, this is not just “MCP support” on paper. It is a usability layer for actually driving prompts, files, and resources from the browser.
The strategic importance is that local inference stacks are converging with the agent tooling people previously associated with hosted products. If this matures, llama.cpp users get a more complete path from local model serving to tool-aware workflows, prompt composition, and structured resource access without needing a separate orchestration product as the first step.
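The agentic loop at the heart of the PR can be sketched as a simple control flow: call the model, execute any tool it requests, feed the result back, and stop once the model returns plain content. Everything below is illustrative, not the PR's implementation: the "model" is a stub, and a real MCP client speaks JSON-RPC to an MCP server rather than calling local Python functions.

```python
def get_time(_args):
    """Stand-in for an MCP tool; a real call goes over the protocol."""
    return "12:00"

TOOLS = {"get_time": get_time}

def fake_model(messages):
    """Stub model: requests the tool once, then answers using its result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_time", "arguments": {}}}
    result = next(m["content"] for m in messages if m["role"] == "tool")
    return {"content": f"The time is {result}."}

def agent_loop(user_prompt, model, tools, max_steps=5):
    """Loop until the model stops requesting tools or the step budget runs out."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        if "tool_call" in reply:
            call = reply["tool_call"]
            # Dispatch the requested tool and append its result to the context.
            result = tools[call["name"]](call["arguments"])
            messages.append({"role": "tool", "content": result})
        else:
            return reply["content"]
    raise RuntimeError("agentic loop did not terminate")

print(agent_loop("What time is it?", fake_model, TOOLS))
```

The `max_steps` cap is the important design detail: without it, a model that keeps requesting tools would loop forever, which is why UIs that expose agentic loops also surface per-step processing statistics.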
Related Articles
LocalLLaMA users are tracking llama.cpp’s merged autoparser work, which analyzes model templates to support reasoning and tool-call formats with less custom parser code.
GitHub used X on March 9, 2026 to resurface its guide to building reliable multi-agent systems. The company argues that most failures come from missing structure, and recommends typed schemas, action schemas, and Model Context Protocol as the core engineering controls.
A LocalLLaMA thread reported a large prompt-processing speedup on Qwen3.5-27B by lowering llama.cpp `--ubatch-size` to 64 on an RX 9070 XT. The interesting part is not a universal magic number, but the reminder that prompt ingestion and token generation can respond very differently to `n_ubatch` tuning.