LLM Reddit 13h ago 2 min read
An r/LocalLLaMA post pointed Mac users to llama.cpp pull request #20361, merged on March 11, 2026, which adds a fused GDN recurrent Metal kernel. The PR reports roughly 12-36% throughput gains on Qwen 3.5 variants, though Reddit commenters noted that even after the merge it can still trail MLX on some local benchmarks.
#apple-silicon
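For readers unfamiliar with what the kernel fuses: GDN-style layers replace attention with a gated delta-rule recurrence over a small state matrix. The sketch below is a rough, unfused NumPy reference of one such update step, not the PR's exact formulation; the function name, shapes, and scalar gates are illustrative assumptions. A fused Metal kernel would do the decay and rank-1 correction in one dispatch instead of separate memory-bound passes.

```python
import numpy as np

def gdn_step(S, k, v, alpha, beta):
    """One gated delta-rule recurrent update (simplified, unfused reference).

    S:     (d_k, d_v) recurrent state matrix
    k:     (d_k,) key vector
    v:     (d_v,) value vector
    alpha: scalar decay gate in (0, 1)
    beta:  scalar write strength
    """
    S = alpha * S                            # gated decay of the old state
    v_pred = k @ S                           # what the state currently predicts for k
    S = S + beta * np.outer(k, v - v_pred)   # delta-rule correction toward v
    return S

# Starting from an empty state, one step just writes beta * k v^T.
S0 = np.zeros((3, 2))
k = np.array([1.0, 2.0, 0.0])
v = np.array([0.5, -1.0])
S1 = gdn_step(S0, k, v, alpha=0.5, beta=1.0)
```

A fused kernel's win here is mostly memory traffic: the decay, prediction, and outer-product write all touch the same state tile, so keeping it resident in threadgroup memory avoids three round trips to device memory.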
LLM Hacker News 1d ago 2 min read
A Launch HN thread put RunAnywhere’s MetalRT and RCLI in the spotlight: a low-latency STT-LLM-TTS stack that runs on Apple Silicon without cloud APIs.
LLM Hacker News 2d ago 2 min read
A Launch HN thread pushed RunAnywhere’s RCLI into view as an Apple Silicon-first macOS voice AI stack that combines STT, LLM, TTS, local RAG, and 38 system actions without relying on cloud APIs.
LLM Reddit 2d ago 2 min read
A high-scoring LocalLLaMA post says Qwen 3.5 9B on a 16GB M1 Pro handled memory recall and basic tool calling well enough for real agent work, even though creative reasoning still trailed frontier models.
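"Basic tool calling" in agent setups like the one the post describes usually means the model emits a structured call that a thin dispatcher executes. The sketch below shows such a dispatcher under assumed conventions: the JSON shape, the `recall_memory` tool, and the function names are all hypothetical, not taken from the post.

```python
import json

# Hypothetical tool registry; a local model that emits JSON tool calls
# can drive a loop built around a dispatcher like this.
TOOLS = {
    "recall_memory": lambda query: f"notes matching {query!r}",
}

def run_tool_call(raw):
    """Parse a model-emitted call like
    {"tool": "recall_memory", "arguments": {"query": "groceries"}}
    and return the tool's result, or an error string fed back to the model."""
    try:
        call = json.loads(raw)
        fn = TOOLS[call["tool"]]
        return fn(**call["arguments"])
    except (json.JSONDecodeError, KeyError, TypeError) as e:
        # Returning the error as text lets a small model retry with a fixed call.
        return f"tool error: {e}"
```

Feeding parse errors back as plain text, rather than raising, is what makes this workable with a 9B model: it gets a chance to correct malformed calls instead of crashing the loop.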
LLM Hacker News 5d ago 1 min read
Running Nvidia PersonaPlex 7B in Swift on Apple Silicon moves local voice agents closer to real time
An HN post on a Swift/MLX port of Nvidia PersonaPlex 7B shows how chunking, buffering, and interrupt handling matter as much as raw model quality for local speech-to-speech agents.
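The chunking and interrupt handling the post highlights reduces to simple bookkeeping: queue synthesized audio in small chunks, and drop everything pending when the user barges in. The class below is a minimal Python sketch of that pattern; the names and API are illustrative, not from the PersonaPlex port.

```python
from collections import deque

class SpeechBuffer:
    """Queue of synthesized audio chunks with barge-in support."""

    def __init__(self):
        self.chunks = deque()

    def enqueue(self, chunk):
        # TTS output arrives in small chunks so playback can start early.
        self.chunks.append(chunk)

    def next_chunk(self):
        # Playback pulls one chunk at a time; None means silence.
        return self.chunks.popleft() if self.chunks else None

    def interrupt(self):
        # Barge-in: discard queued speech so the agent stops talking
        # over the user. Returns how many chunks were dropped.
        dropped = len(self.chunks)
        self.chunks.clear()
        return dropped
```

Because playback only ever holds one small chunk, an interrupt takes effect within a chunk's duration, which is why chunk size, not model quality, often dominates perceived latency in these agents.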