Open WebUI’s Open Terminal gives local models a real execution environment
Original: Open WebUI’s New Open Terminal + “Native” Tool Calling + Qwen3.5 35b = Holy Sh!t!!!
Reddit thread: LocalLLaMA discussion
Project: open-webui/open-terminal
Docs: Open Terminal documentation
This LocalLLaMA post captures why Open WebUI’s new Open Terminal feature is getting attention. It gives a model an actual operating system to work inside, not just a chat box with a few toy tools. In Docker mode, the agent gets a sandboxed Linux environment with a file browser and live previews. In bare metal mode, it can operate directly on the user’s machine and project directory.
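For orientation, this is the standard Open WebUI container deployment with a persistent data volume. It is a sketch of the documented base install, not the Open Terminal sandbox itself; how Open Terminal provisions its own sandbox container and working volume may differ, so treat the volume name and ports here as illustrative.

```shell
# Standard Open WebUI deployment (documented base install).
# The named volume keeps state across container restarts; the post
# similarly credits a persistent working volume for carrying the
# terminal's files between chats in Docker mode.
docker run -d \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

Bare metal mode needs no container at all, which is exactly why the docs flag it as the riskier option.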
What stands out
- The model can install packages, run shell commands, edit files, and generate outputs that appear immediately in the browser.
- The docs list a broad preinstalled stack, including Python 3.12, git, build tools, jq, sqlite3, pandas, scikit-learn, matplotlib, and more.
- The Reddit post says Qwen3.5 35B A3B worked well as long as “Native” tool calling was enabled, and the default Docker setup keeps a persistent working volume between chats.
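The loop behind those bullets can be sketched in a few lines. This is not Open WebUI's actual implementation; it is a minimal illustration of what "native" tool calling amounts to: the model emits a structured call against a declared tool schema (OpenAI-style function calling shown here as an assumption), the harness executes it in the sandbox, and the output goes back into the conversation. The `run_shell` name and schema are hypothetical.

```python
import json
import subprocess

# Hypothetical tool schema in the OpenAI-style function-calling format
# that "native" tool calling typically passes to the model.
RUN_SHELL_TOOL = {
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Execute a shell command in the sandbox and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}

def execute_tool_call(tool_call: dict) -> str:
    """Dispatch a model-emitted tool call to the sandbox shell and
    return combined stdout/stderr for the next model turn."""
    args = json.loads(tool_call["function"]["arguments"])
    result = subprocess.run(
        args["command"], shell=True, capture_output=True, text=True, timeout=30
    )
    return result.stdout + result.stderr

# Simulated model output: in a real session the model, not the harness,
# produces this structure when native tool calling is enabled.
fake_call = {
    "function": {
        "name": "run_shell",
        "arguments": json.dumps({"command": "echo hello"}),
    }
}
print(execute_tool_call(fake_call).strip())
```

The error-recovery behavior the post praises falls out of this loop naturally: a failed command returns its stderr to the model, which can then issue a corrected call on the next turn.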
That combination matters because it reduces the gap between local open models and products such as Claude Code. Instead of wiring separate plugins, the user gets a single environment where the model can inspect files, execute steps, recover from errors, and hand back artifacts through the same interface. The post also notes that enterprise multi-user “Terminals” are in progress, which points to a team-oriented version of the same workflow.
The obvious caveat is safety. The Open WebUI docs are blunt that bare metal mode gives the AI the user’s real permissions, so Docker is the safer default. Still, the broader takeaway is clear: local models become much more useful once tool calling, filesystem access, and execution are treated as first-class features instead of afterthoughts.
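The safe-default pattern the docs imply can be made concrete with a small guard: sandbox unless the user explicitly opts in. This is purely illustrative; `OPEN_TERMINAL_BARE_METAL` is an invented variable name, not an actual Open WebUI setting.

```python
import os

def choose_execution_mode() -> str:
    """Hypothetical guard: default to the Docker sandbox unless the
    user has explicitly opted into bare-metal execution.

    OPEN_TERMINAL_BARE_METAL is an assumed name for illustration only.
    """
    if os.environ.get("OPEN_TERMINAL_BARE_METAL") == "1":
        # Bare metal: commands run with the user's real permissions
        # against the real filesystem.
        return "bare-metal"
    # Docker: commands are confined to the sandbox container and its volume.
    return "docker-sandbox"

print(choose_execution_mode())
```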
Related Articles
A high-scoring r/MachineLearning post resurfaced David Noel Ng's long-form write-up, centering on the claim that duplicating a seven-layer middle block in Qwen2-72B, without changing weights, was enough to reach the top of the open leaderboard.
A high-scoring LocalLLaMA post says Qwen 3.5 9B on a 16GB M1 Pro handled memory recall and basic tool calling well enough for real agent work, even though creative reasoning still trailed frontier models.
A LocalLLaMA thread reported a large prompt-processing speedup on Qwen3.5-27B by lowering llama.cpp `--ubatch-size` to 64 on an RX 9070 XT. The interesting part is not a universal magic number, but the reminder that prompt ingestion and token generation can respond very differently to `n_ubatch` tuning.