Why it matters: retrieval stacks are being pulled from text-only search into multimodal memory. Google AI Studio said Gemini Embedding 2 is generally available and covers text, image, video, audio, and documents through one model path.
#retrieval
RSS FeedWhy it matters: search products need factuality and citations, not just fluent answers. Perplexity said its SFT + RL pipeline lets Qwen models match or beat GPT models on factuality at lower cost.
On April 10, 2026, Databricks AI Research published Memory Scaling for AI Agents, arguing that agent performance can improve as external memory grows. The post reports gains in both accuracy and efficiency from labeled examples, raw conversation logs, and organizational knowledge.
Mintlify says chunked RAG was too limited for docs exploration, so it built ChromaFs, a virtual filesystem over Chroma that cuts assistant session creation from about 46 seconds to about 100ms. HN readers were notably receptive to the filesystem-first design and the argument that agent tooling benefits from interpretable, UNIX-like retrieval.
Hacker News picked up a DuckDB community extension that fixes filtered HNSW search and adds aggressive vector compression, making retrieval workloads more predictable under real SQL filters.
Google AI Studio promoted Gemini Embedding 2 in a March 12, 2026 X post, and Google’s March 10 blog post says the model maps text, images, video, audio, and documents into a single embedding space. Google says it is in public preview through the Gemini API and Vertex AI and is designed for multimodal retrieval and classification.
A fresh r/LocalLLaMA post argues that the main bottleneck in Graph-RAG multi-hop QA is often reasoning rather than retrieval. The linked paper suggests structured prompting and graph-based context compression can let an open Llama 8B model match or beat a plain 70B baseline at a much lower cost.
A post in r/artificial argues that long-running agents may need decay, reinforcement, and selective forgetting more than another vector database, prompting a discussion about episodic memory, compression, and retrieval quality.
A Hacker News discussion around Amine Raji's local ChromaDB lab highlights a practical risk in RAG systems: attackers can win by contaminating the source corpus, and the strongest defense may sit at ingestion rather than in the prompt.
Perplexity announced on February 26, 2026 that `pplx-embed-v1` and `pplx-embed-context-v1` are now available in 0.6B and 4B variants. The company positions the release as retrieval-first infrastructure with quantized embeddings and benchmark-focused performance claims.