LLM Reddit Mar 22, 2026 2 min read
A fresh r/LocalLLaMA post argues that the main bottleneck in Graph-RAG multi-hop QA is often reasoning rather than retrieval. The linked paper suggests structured prompting and graph-based context compression can let an open Llama 8B model match or beat a plain 70B baseline at a much lower cost.