A fresh r/LocalLLaMA post argues that the main bottleneck in Graph-RAG multi-hop QA is often reasoning rather than retrieval. The linked paper suggests structured prompting and graph-based context compression can let an open Llama 8B model match or beat a plain 70B baseline at a much lower cost.
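The post doesn't spell out the paper's method, but one common form of graph-based context compression is easy to sketch: keep only retrieved triples that lie on short paths between the question's entities, so the model reasons over a pruned subgraph instead of every retrieved fact. The sketch below uses networkx; the triple format and the upstream entity extraction are assumptions, not the paper's pipeline.

<pre><code># One plausible form of graph-based context compression for multi-hop
# QA. Illustrative only, not the linked paper's method; triples and
# question entities are assumed to come from an upstream retriever.
import itertools
import networkx as nx

def compress_context(triples, question_entities, max_hops=3):
    """Keep only triples lying on short paths between question entities."""
    g = nx.Graph()
    for head, relation, tail in triples:
        g.add_edge(head, tail, relation=relation)

    keep = set()
    for a, b in itertools.combinations(question_entities, 2):
        if a not in g or b not in g:
            continue
        try:
            for path in nx.all_shortest_paths(g, a, b):
                if len(path) - 1 <= max_hops:
                    keep.update(zip(path, path[1:]))
        except nx.NetworkXNoPath:
            continue

    return [(h, r, t) for h, r, t in triples
            if (h, t) in keep or (t, h) in keep]

triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Marie Curie", "field", "physics"),
]
# Keeps the two path triples, drops the irrelevant "field" fact.
print(compress_context(triples, ["Marie Curie", "Poland"]))
</code></pre>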
OpenAI said on March 5, 2026 that GPT-5.4 Thinking and GPT-5.4 Pro were rolling out in ChatGPT, while GPT-5.4 also became available in the API and Codex. OpenAI’s launch page positions GPT-5.4 as a unified frontier model for reasoning, coding, native computer use, and long-horizon agent workflows.
A Show HN repo claims that duplicating a few LLM layers can improve reasoning without training or weight changes. The underlying README, however, shows real tradeoffs, making this more convincing as capability steering than as a universal model upgrade.
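For readers who haven't seen the trick, a minimal sketch of inference-time layer duplication with Hugging Face transformers looks like the following; the checkpoint and the duplicated block are placeholders, not the repo's settings.

<pre><code># Sketch of inference-time layer duplication. No weights change; the
# network just gets deeper along one span. Checkpoint and layer range
# are hypothetical, not taken from the Show HN repo.
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
    torch_dtype=torch.bfloat16,
)

layers = model.model.layers
start, end = 16, 24  # hypothetical mid-stack block to repeat
duplicated = [copy.deepcopy(layers[i]) for i in range(start, end)]
new_layers = list(layers[:end]) + duplicated + list(layers[end:])

# Re-index attention modules so KV-cache bookkeeping stays consistent.
for idx, layer in enumerate(new_layers):
    layer.self_attn.layer_idx = idx

model.model.layers = torch.nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)
</code></pre>

The duplicated span runs twice per token, so latency and KV-cache memory grow with it, consistent with the README's framing of the technique as steering with real costs rather than a free upgrade.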
The r/LocalLLaMA discussion around NVIDIA’s new model focused on an unusual mix of efficiency and benchmark ambition: 30B total parameters with only 3B activated per token, plus separate thinking and instruct modes.
A March 16, 2026 r/LocalLLaMA post about Mistral Small 4 reached 606 points and 232 comments in the latest available crawl. Mistral’s model card describes a 119B-parameter MoE with 4 active experts, 256k context, multimodal input, and a per-request switch between standard and reasoning modes.
OpenAI said on February 20, 2026 that its theorem-proving model produced proof attempts for all 10 research-level First Proof problems. After expert feedback, the company believes at least five of the attempts are likely correct; others remain under review, and the attempt for problem 2 now appears incorrect.
OpenAI said on March 5, 2026 that GPT-5.4 Thinking shows low Chain-of-Thought controllability, i.e. limited ability to obscure its own reasoning, which for now strengthens CoT monitoring as a safety signal. The release pairs an X post with a new open-source evaluation suite and research paper.
The arXiv paper Ares, submitted on March 9, 2026, proposes dynamic per-step reasoning selection for multi-step LLM agents. The authors report up to 52.7% lower reasoning token usage versus fixed high-effort settings with only minimal drops in task success.
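The abstract doesn't reproduce the selection policy, but the general pattern it points at is simple: start each step at a cheap reasoning effort and escalate only when verification fails. In the sketch below, llm_call, the effort ladder, and the difficulty heuristic are all stand-ins, not the paper's API.

<pre><code># Hedged sketch of dynamic per-step reasoning selection, in the spirit
# of the Ares abstract. The policy shown (heuristic start level plus
# escalate-on-failure) is illustrative, not the paper's algorithm.
EFFORT_LEVELS = ["minimal", "medium", "high"]  # assumed ladder

def estimate_difficulty(step) -> int:
    # Toy heuristic: steps that chain tools start at high effort.
    return 2 if step.get("requires_tool_chaining") else 0

def run_step(step, llm_call):
    # Most steps succeed at low effort, so the expensive settings are
    # only paid for when a cheaper attempt fails verification.
    result, effort = None, None
    for effort in EFFORT_LEVELS[estimate_difficulty(step):]:
        result = llm_call(step["prompt"], reasoning_effort=effort)
        if result.verified:
            break
    return result, effort
</code></pre>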
OpenAI said on March 5, 2026 that GPT-5.4 is rolling out across ChatGPT, the API, and Codex. The new model combines GPT-5.3-Codex coding capability with OpenAI’s mainline reasoning stack, adds native computer-use features, and introduces experimental 1M-token context in Codex.
A new llama.cpp change turns <code>--reasoning-budget</code> into a real sampler-side limit instead of a template stub. The LocalLLaMA thread focused on the tradeoff between cutting long think loops and preserving answer quality, especially for local Qwen 3.5 deployments.
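The real change lives in llama.cpp's sampling loop, but the mechanism is easy to illustrate with a client-side analog: count streamed tokens inside a Qwen-style think block and force-close it once the budget is spent. The tag strings below are assumptions about the chat template, not llama.cpp code.

<pre><code># Client-side analog of a sampler-side reasoning budget. Illustrative
# only; assumes the model wraps its reasoning in <think>...</think>.
def cap_thinking(stream, budget=512):
    """Pass tokens through, truncating the think block at `budget` tokens."""
    thinking, closed, spent = False, False, 0
    for token in stream:
        if token == "<think>":
            thinking, closed, spent = True, False, 0
            yield token
        elif token == "</think>":
            thinking = False
            if not closed:
                yield token
        elif thinking:
            spent += 1
            if spent <= budget:
                yield token
            elif not closed:
                yield "</think>"  # force-close once the budget is spent
                closed = True
            # thinking tokens past the budget are dropped
        else:
            yield token
</code></pre>

Doing this in the sampler rather than in a client matters: a forced closing tag emitted during sampling conditions every later token, while client-side truncation only hides text the model already generated, which is why a sampler-side cut can change answer quality rather than just output length.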