LLM

LLM X/Twitter Mar 9, 2026 1 min read

Azure brings Phi-4-Reasoning-Vision-15B to Microsoft Foundry for multimodal reasoning

Azure says Phi-4-Reasoning-Vision-15B is now available in Microsoft Foundry. Microsoft positions the 15B model as a compact multimodal system that can switch reasoning on or off for document analysis, chart understanding, and GUI-grounded agent workflows.

#microsoft #phi-4 #vision-reasoning

LLM X/Twitter Mar 9, 2026 2 min read

Karpathy open-sources autoresearch for autonomous single-GPU nanochat experiments

Andrej Karpathy has published autoresearch, a minimal repo that lets AI agents iterate on a stripped-down nanochat training loop overnight. The project turns agent evaluation into a closed-loop research workflow with fixed 5-minute runs, Git branches, and validation-loss-based selection.

#karpathy #agents #open-source
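The keep-or-discard loop described for autoresearch can be pictured as greedy hill climbing: each proposed change gets the same fixed training budget, and it survives only if validation loss drops. The sketch below is a toy illustration under that assumption, not code from the repo. `research_loop`, `run_experiment`, and the demo changes and loss effects are hypothetical stand-ins, and "keeping" a change only loosely mirrors committing on a Git branch.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

RUN_BUDGET_SECONDS = 5 * 60  # every experiment gets the same fixed wall-clock budget


@dataclass
class State:
    kept: list = field(default_factory=list)  # changes "committed" so far
    val_loss: float = float("inf")


def research_loop(
    proposals: Sequence[str],
    run_experiment: Callable[[tuple, int], float],  # hypothetical: trains and returns val loss
) -> State:
    state = State()
    state.val_loss = run_experiment((), RUN_BUDGET_SECONDS)  # baseline run, no changes
    for change in proposals:
        candidate = tuple(state.kept) + (change,)
        loss = run_experiment(candidate, RUN_BUDGET_SECONDS)
        if loss < state.val_loss:
            # Improvement: keep the change (analogous to committing it on the branch).
            state.kept.append(change)
            state.val_loss = loss
        # Otherwise the change is discarded (analogous to resetting the branch).
    return state


# Toy demo: a fake experiment whose loss is a fixed baseline plus made-up per-change effects.
EFFECTS = {"lr_warmup": -0.03, "bigger_batch": +0.01, "rmsnorm_eps": -0.01}


def fake_experiment(changes: tuple, budget_seconds: int) -> float:
    return 3.0 + sum(EFFECTS[c] for c in changes)


result = research_loop(list(EFFECTS), fake_experiment)
print(result.kept, round(result.val_loss, 4))
```

Because every run gets an identical budget, the comparison rewards changes that help within a fixed compute envelope rather than changes that simply train longer.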

LLM X/Twitter Mar 9, 2026 1 min read

OpenAI rolls out GPT-5.4 Thinking and GPT-5.4 Pro across ChatGPT, API, and Codex

OpenAI says GPT-5.4 Thinking is shipping in ChatGPT, GPT-5.4 is live in the API and Codex, and GPT-5.4 Pro is available for harder tasks. The launch packages reasoning, coding, and native computer use into a single model aimed at professional work, with up to 1M tokens of context.

#openai #gpt-5.4 #chatgpt

LLM Hacker News Mar 9, 2026 2 min read

HN Debate: Literate Programming May Fit Better in the Agent Era

A high-ranking Hacker News thread highlighted an argument that coding agents can remove the biggest cost of literate programming: keeping prose and code in sync. The post points to Org Mode-style runbooks and executable documentation as a more practical fit for AI-assisted software work.

#agents #literate-programming #org-mode

LLM Mar 9, 2026 2 min read

GitHub Copilot CLI reaches GA for terminal-native coding workflows

GitHub Copilot CLI is now generally available, bringing Copilot into the terminal for standard subscribers. GitHub paired the release with broader Copilot changes including next edit suggestions, MCP-enabled agent mode, background agents, and a higher-end Pro+ plan.

#github #copilot #cli

LLM Reddit Mar 9, 2026 2 min read

LocalLLaMA flags LlamaIndex's OpenAI defaults as a risk for air-gapped RAG setups

A LocalLLaMA thread and linked GitHub issues argue that LlamaIndex's OpenAI-by-default behavior can surprise local-first RAG builders when nested components are created without explicit model injection. Maintainers say the behavior is longstanding and documented, but the discussion is pushing for a stricter fail-fast mode for sovereign deployments.

#llamaindex #local-rag #privacy
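The "fail-fast mode" being requested comes down to one rule: if a component was not given a model explicitly, raise instead of silently falling back to a remote default. The sketch below shows that rule in isolation. `Component`, `resolve`, and `ImplicitRemoteDefaultError` are hypothetical names, not LlamaIndex APIs. In LlamaIndex itself, the usual mitigation is to inject local models explicitly, for example through `Settings.llm` and `Settings.embed_model` in `llama_index.core`.

```python
from dataclasses import dataclass
from typing import Optional

REMOTE_DEFAULT = "openai"  # the implicit fallback the thread is worried about


class ImplicitRemoteDefaultError(RuntimeError):
    """Raised when strict-local mode would otherwise fall back to a remote provider."""


@dataclass
class Component:
    name: str
    llm: Optional[str] = None          # explicitly injected generation model, if any
    embed_model: Optional[str] = None  # explicitly injected embedding model, if any


def resolve(component: Component, field_name: str, strict_local: bool) -> str:
    value = getattr(component, field_name)
    if value is not None:
        return value  # explicit injection always wins
    if strict_local:
        raise ImplicitRemoteDefaultError(
            f"{component.name}.{field_name} was not injected; "
            f"refusing to fall back to {REMOTE_DEFAULT!r}"
        )
    return REMOTE_DEFAULT  # lenient mode reproduces the surprising default


retriever = Component("retriever", embed_model="local-bge")
print(resolve(retriever, "embed_model", strict_local=True))
```

With `strict_local=True`, a nested component that someone forgot to configure surfaces as an error at startup rather than as an unexpected outbound API call in an air-gapped environment.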

LLM Reddit Mar 9, 2026 2 min read

Sarvam open-sources 30B and 105B reasoning models trained in India

A high-scoring LocalLLaMA thread surfaced Sarvam AI's release of two Apache 2.0 reasoning models, Sarvam 30B and Sarvam 105B. The company says both were trained from scratch in India, use Mixture-of-Experts designs, and target reasoning, coding, agentic workflows, and Indian-language performance.

#open-models #india #reasoning-models

LLM Hacker News Mar 9, 2026 2 min read

Agent Safehouse brings deny-first macOS sandboxing to local coding agents

A popular Hacker News post highlighted Agent Safehouse, a macOS tool that wraps Claude Code, Codex, and similar agents in a deny-first sandbox using sandbox-exec. The project grants project-scoped access by default, blocks sensitive paths at the kernel layer, and ships as a single Bash script under Apache 2.0.

#llm-agents #macos #sandboxing
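"Deny-first" describes the order in which a policy is evaluated: explicit denies on sensitive paths win, the project tree is allowed, and everything else is refused. The sketch below models only that decision logic in user space, under the assumption that it matches the tool's intent. It is not Agent Safehouse's profile, and it provides none of the kernel-level enforcement that sandbox-exec does. The function names and example paths are hypothetical.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical deny list; the real tool defines its own set of sensitive paths.
DEFAULT_SENSITIVE = [Path.home() / ".ssh", Path.home() / ".aws"]


def is_within(path: Path, root: Path) -> bool:
    return path == root or path.is_relative_to(root)  # Python 3.9+


def allowed(path: str, project_root: str, sensitive: Iterable[Path] = DEFAULT_SENSITIVE) -> bool:
    p = Path(path).resolve()  # normalizes "..", so traversal cannot escape the check
    root = Path(project_root).resolve()
    if any(is_within(p, s.resolve()) for s in sensitive):
        return False  # explicit denies take priority over everything else
    return is_within(p, root)  # only the project tree is reachable; anything else is denied


print(allowed("/work/proj/src/main.py", "/work/proj"))
print(allowed("/work/proj/../other/secret.txt", "/work/proj"))
```

Checking denies before allows means a sensitive file stays blocked even when it sits inside the project directory.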

LLM X/Twitter Mar 8, 2026 2 min read

Azure adds GPT-5.4 to Microsoft Foundry for production-grade agent workloads

Azure says GPT-5.4 is now available in Microsoft Foundry for production-grade agent workloads. Microsoft’s supporting post adds GPT-5.4 Pro, pricing, and initial deployment options, with governance controls positioned as part of the pitch.

#azure #microsoft-foundry #gpt-5.4

LLM X/Twitter Mar 8, 2026 1 min read

Google releases Android Bench to measure LLM performance on Android development

Google AI Developers has released Android Bench, an official leaderboard for LLMs on Android development tasks. In the first results, Gemini 3.1 Pro ranks first, and Google is also publishing the benchmark, dataset, and test harness.

#google #android #benchmark

LLM X/Twitter Mar 8, 2026 1 min read

OpenAI updates GPT-5.4 prompting guidance for more reliable agents

OpenAI Developers has updated its GPT-5.4 API prompting guide. The new guidance focuses on tool use, structured outputs, verification loops, and long-running workflows for production-grade agents.

#openai #gpt-5.4 #prompting
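One common shape for the "verification loop" idea is to validate each structured reply against a contract and, when validation fails, re-prompt with the specific error. The sketch below shows that pattern generically. It does not use the OpenAI SDK. `call_model` is a hypothetical callable standing in for any model call, and the `REQUIRED` schema is an invented example, not taken from OpenAI's guide.

```python
import json
from typing import Callable, Optional, Tuple

# Hypothetical output contract: required keys and their expected Python types.
REQUIRED = {"status": str, "files_changed": list}


def validate(raw: str) -> Tuple[Optional[dict], Optional[str]]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            return None, f"field {key!r} missing or not {typ.__name__}"
    return data, None


def call_with_verification(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    feedback = ""
    for _ in range(max_attempts):
        data, error = validate(call_model(prompt + feedback))
        if error is None:
            return data
        # Feed the concrete failure back so the next attempt can correct it.
        feedback = f"\n\nYour previous reply was rejected ({error}). Reply again with valid JSON only."
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


# Demo with a scripted fake model: the first reply is malformed, the second passes.
replies = iter(["not json", '{"status": "ok", "files_changed": ["a.py"]}'])
result = call_with_verification(lambda prompt: next(replies), "Summarize the change.")
print(result)
```

Capping attempts keeps a long-running agent from looping indefinitely on a reply it cannot fix.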

LLM Reddit Mar 8, 2026 2 min read

LocalLLaMA shares a llama.cpp tuning tip: smaller n_ubatch unlocked much faster Qwen 27B prompt processing

A LocalLLaMA thread reported a large prompt-processing speedup on Qwen3.5-27B after lowering llama.cpp's `--ubatch-size` to 64 on an RX 9070 XT. The takeaway is not a universal magic number but a reminder that prompt ingestion and token generation can respond very differently to `n_ubatch` tuning.

#llama.cpp #qwen #rocm
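Because the best value depends on the hardware and the model, it is safer to sweep `n_ubatch` on your own setup and compare prompt-processing (pp) and token-generation (tg) throughput separately. The helper below only builds a command line for llama.cpp's `llama-bench`. It assumes `llama-bench` is on your PATH and accepts `-ub` (the short form of `--ubatch-size`) with comma-separated values. The model filename is a placeholder.

```python
import shlex
from typing import List


def ubatch_sweep_command(
    model_path: str, ubatch_sizes: List[int], n_prompt: int = 512, n_gen: int = 128
) -> List[str]:
    """Build one llama-bench invocation that reports pp and tg speed for each ubatch size."""
    if not ubatch_sizes:
        raise ValueError("need at least one ubatch size")
    return [
        "llama-bench",
        "-m", model_path,
        "-p", str(n_prompt),  # prompt-processing test length
        "-n", str(n_gen),     # token-generation test length
        "-ub", ",".join(str(u) for u in ubatch_sizes),
    ]


cmd = ubatch_sweep_command("Qwen3.5-27B.gguf", [64, 128, 256, 512])
print(shlex.join(cmd))
```

Reading the pp and tg rows separately makes it clear whether a smaller ubatch helps prompt ingestion while leaving generation speed unchanged, which is the effect the thread described.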