Hacker News pushed Microsoft's bitnet.cpp back into view, treating it less as a new 100B checkpoint and more as an infrastructure play for 1.58-bit inference and lower-power local LLM deployment.
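The "1.58-bit" figure refers to ternary weights. Each weight takes one of three values {-1, 0, +1}, which carries log2(3) ≈ 1.585 bits. The Python sketch below is illustrative and is not taken from bitnet.cpp. It quantizes a list of floats with an absmean scheme, roughly as described in the BitNet b1.58 paper: scale by the mean absolute weight, then round and clip to [-1, 1]. It also shows why a ternary dot product needs only additions and subtractions, plus one multiply by the shared scale, which is the source of the efficiency argument. The function names are hypothetical.

```python
import math


def bits_per_ternary_weight() -> float:
    # A weight restricted to {-1, 0, +1} carries log2(3) bits of information.
    return math.log2(3)


def absmean_ternary(weights, eps=1e-5):
    """Hypothetical helper: quantize floats to {-1, 0, +1} with one per-tensor scale.

    gamma is the mean absolute weight; each weight is divided by it, rounded
    (Python's round() is round-half-to-even), and clipped to [-1, 1].
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    codes = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return codes, gamma


def ternary_dot(x, codes, gamma):
    """Hypothetical helper: dot product of activations x with ternary codes.

    No per-weight multiplies: add x_i where the code is +1, subtract it where
    the code is -1, skip zeros, then apply the shared scale once.
    """
    if len(x) != len(codes):
        raise ValueError("x and codes must have the same length")
    acc = 0.0
    for xi, qi in zip(x, codes):
        if qi == 1:
            acc += xi
        elif qi == -1:
            acc -= xi
    return gamma * acc


if __name__ == "__main__":
    print(round(bits_per_ternary_weight(), 3))
    codes, gamma = absmean_ternary([0.9, -0.05, -1.2, 0.3])
    print(codes, gamma)
    print(ternary_dot([1.0, 2.0, 3.0, 4.0], codes, gamma))
```

For example, `absmean_ternary([0.9, -0.05, -1.2, 0.3])` yields codes `[1, 0, -1, 0]` with a scale of about 0.6125. A real inference engine would also pack the codes into compact bit layouts, which this sketch does not attempt.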
Google AI Developers says Gemini Embedding 2 is now in preview via the Gemini API and Vertex AI. Google describes it as its first fully multimodal embedding model on the Gemini architecture and its most capable embedding model so far.
Microsoft says Fireworks AI is now part of Microsoft Foundry, bringing high-performance, low-latency open-model inference to Azure. The launch emphasizes day-zero access to leading open models, custom-model deployment, and enterprise controls in one place.
A Launch HN thread brought RunAnywhere’s MetalRT and RCLI into focus, drawing attention to a low-latency speech-to-text, LLM, and text-to-speech (STT-LLM-TTS) stack that runs on Apple Silicon without cloud APIs.
A fast-rising LocalLLaMA post resurfaced David Noel Ng's write-up on duplicating a seven-layer block inside Qwen2-72B, a no-training architecture tweak that reportedly lifted multiple Open LLM Leaderboard benchmarks.
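The layer-duplication tweak amounts to changing the order in which decoder layers execute so that one contiguous block runs twice, with no new weights trained. The sketch below computes only that execution order. The function name, the start index, and the 12-layer stack in the example are hypothetical and are not the values from Ng's write-up. Applying the idea to a real checkpoint would also require copying or re-referencing the layer weights and updating the model's layer-count configuration, which is not shown.

```python
from typing import List


def duplicate_block(layer_ids: List[int], start: int, length: int) -> List[int]:
    """Hypothetical helper: return an execution order in which
    layer_ids[start:start + length] is repeated right after its first pass."""
    end = start + length
    if start < 0 or length <= 0 or end > len(layer_ids):
        raise ValueError("block must lie inside the layer list")
    block = layer_ids[start:end]
    # Layers up to and including the block, then the block again, then the rest.
    return layer_ids[:end] + block + layer_ids[end:]


if __name__ == "__main__":
    # Illustrative 12-layer stack; repeat a seven-layer block starting at layer 3.
    order = duplicate_block(list(range(12)), start=3, length=7)
    print(order)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    print(len(order))  # 19
```

Because the repeated entries point at existing layers, the model grows in depth while its set of distinct weights stays the same, which is what makes the change training-free.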
A prominent r/MachineLearning thread highlighted arXiv 2603.01919, which audits shadow APIs claiming GPT-5 and Gemini-2.5 access and reports large performance drift, unstable safety behavior, and frequent identity-verification failures.
A Launch HN thread pushed RunAnywhere's RCLI into view as an Apple Silicon-first macOS voice AI stack that combines STT, LLM, TTS, local RAG, and 38 system actions without relying on cloud APIs.
Google DeepMind said Gemini 3.1 Flash-Lite is rolling out in preview through the Gemini API and Google AI Studio. The company positioned it as the most cost-efficient Gemini 3 model, with lower price, faster performance, and tunable thinking levels.
The Claude account said Claude Code now includes Code Review, a feature that dispatches multiple agents to review every pull request. Anthropic says the feature is in research preview for Team and Enterprise plans and favors in-depth reviews over lightweight skims.
A LocalLLaMA post pointed to a new Hugging Face dataset of human-written code reviews, pairing before-and-after code changes with inline reviewer comments and negative examples across 37 languages.
A Reddit post drew attention to a March 2 case study arguing that OpenClaw incidents already trigger 8 of 10 OWASP Agentic vulnerability classes, including malicious skill supply-chain attacks and localhost WebSocket hijacking.
Perplexity’s Computer account posted on X on March 9, 2026, demonstrating Claude Code and GitHub CLI running directly inside Perplexity Computer. In the public demo, the system forked an OpenClaw repository, planned a fix, implemented the change, and submitted a pull request without leaving the Computer environment.