A high-ranking Hacker News thread highlighted an argument that coding agents can remove the biggest cost of literate programming: keeping prose and code in sync. The post points to Org Mode-style runbooks and executable documentation as a more practical fit for AI-assisted software work.
LLM
RSS FeedGitHub Copilot CLI is now generally available, bringing Copilot into the terminal for standard subscribers. GitHub paired the release with broader Copilot changes including next edit suggestions, MCP-enabled agent mode, background agents, and a higher-end Pro+ plan.
A LocalLLaMA thread and linked GitHub issues argue that LlamaIndex's OpenAI-by-default behavior can surprise local-first RAG builders when nested components are created without explicit model injection. Maintainers say the behavior is longstanding and documented, but the discussion is pushing for a stricter fail-fast mode for sovereign deployments.
A high-scoring LocalLLaMA thread surfaced Sarvam AI's release of two Apache 2.0 reasoning models, Sarvam 30B and Sarvam 105B. The company says both were trained from scratch in India, use Mixture-of-Experts designs, and target reasoning, coding, agentic workflows, and Indian-language performance.
A popular Hacker News post highlighted Agent Safehouse, a macOS tool that wraps Claude Code, Codex and similar agents in a deny-first sandbox using sandbox-exec. The project grants project-scoped access by default, blocks sensitive paths at the kernel layer, and ships as a single Bash script under Apache 2.0.
Azure says GPT-5.4 is now available in Microsoft Foundry for production-grade agent workloads. Microsoft’s supporting post adds GPT-5.4 Pro, pricing, and initial deployment options, with governance controls positioned as part of the pitch.
Google AI Developers has released Android Bench, an official leaderboard for LLMs on Android development tasks. In the first results, Gemini 3.1 Pro ranks first, and Google is also publishing the benchmark, dataset, and test harness.
OpenAI Developers has updated its GPT-5.4 API prompting guide. The new guidance focuses on tool use, structured outputs, verification loops, and long-running workflows for production-grade agents.
A LocalLLaMA thread reported a large prompt-processing speedup on Qwen3.5-27B by lowering llama.cpp `--ubatch-size` to 64 on an RX 9070 XT. The interesting part is not a universal magic number, but the reminder that prompt ingestion and token generation can respond very differently to `n_ubatch` tuning.
A r/LocalLLaMA thread is drawing attention to `llama.cpp` pull request #19504, which adds a `GATED_DELTA_NET` op for Qwen3Next-style models. Reddit users reported better token-generation speed after updating, while the PR itself includes early CPU/CUDA benchmark data.
A Hacker News post surfaced Unsloth's Qwen3.5 local guide, which lays out memory targets, reasoning-mode controls, and llama.cpp commands for running 27B and 35B-A3B models on local hardware.
Mistral has launched Mistral 3, a new open multimodal family with dense 14B, 8B, and 3B models under Apache 2.0, plus a larger Mistral Large 3. The company says the lineup was trained from scratch and tuned for both Blackwell NVL72 systems and single-node 8xA100 or 8xH100 deployments.