Google put Gemini Embedding 2 into public preview on March 10, 2026. The company says the model handles text, images, and mixed multimodal documents in one embedding space while improving benchmark scores to 68.32 for text and 53.3 for image tasks without changing price or vector dimensions.
LLM
RSS FeedGoogle DeepMind updated Gemini 3.1 Flash-Lite on March 3, 2026 as a low-cost model for high-volume, low-latency work. Google says it supports 128k input, 8k output, multimodal input, native audio generation, and pricing from $0.10 per 1M input tokens.
A high-signal LocalLLaMA thread on March 15, 2026 focused on a license swap for NVIDIA’s Nemotron model family. Comparing the current NVIDIA Nemotron Model License with the older Open Model License shows why the community reacted: the old guardrail-termination clause and Trustworthy AI cross-reference are no longer present, while the newer text leans on a simpler NOTICE-style attribution structure.
A LocalLLaMA thread amplified Phoronix coverage of GreenBoost, an experimental GPLv2 Linux module that adds a multi-tier memory path for NVIDIA GPUs. The design pairs a kernel module with a CUDA shim so large allocations can spill from limited on-card vRAM into pinned system RAM and NVMe-backed storage without modifying CUDA applications.
On March 11, 2026, OpenAI published new guidance on designing AI agents to resist prompt injection, framing untrusted emails, web pages, and other inputs as a core security boundary. The company says robust agents separate data from instructions, minimize privileges, and require monitoring and user confirmation before taking consequential actions.
GitHub said on March 10, 2026 that GitHub Copilot, VS Code, and Figma now form a continuous loop through the bidirectional Figma MCP server. GitHub’s March 6 changelog says users can pull design context into code and send rendered UI back to Figma as editable frames.
Perplexity said on March 13, 2026 that Perplexity Computer is now available on mobile, starting with iOS inside the Perplexity app. Coming one day after the company opened Computer to Pro subscribers, the update turns the product into a more explicit cross-device agent workflow rather than a desktop-only experience.
A March 15, 2026 LocalLLaMA post pointed to Hugging Face model-card commits and NVIDIA license pages showing Nemotron Super 3 models moving from the older NVIDIA Open Model License text to the newer NVIDIA Nemotron Open Model License.
A March 14, 2026 Hacker News discussion highlighted a more nuanced MCP argument: local stdio MCP can be unnecessary overhead for bespoke tools, while remote HTTP MCP still solves auth, telemetry, and shared tooling at team scale.
On March 9, 2026, OpenAI said it plans to acquire Promptfoo and integrate its AI security tooling into OpenAI Frontier. The move pushes security testing, red-teaming, and governance closer to the default workflow for enterprise agents.
OpenAI said on March 5, 2026 that GPT-5.4 Thinking shows low Chain-of-Thought controllability, which for now strengthens CoT monitoring as a safety signal. The release pairs an X post with a new open-source evaluation suite and research paper.
Community discussion in LocalLLaMA pointed to a March 11, 2026 FastFlowLM and Lemonade update that brings Linux support to AMD XDNA 2 NPUs, including setup guidance for Ubuntu and Arch systems.