A Hacker News post surfaced Unsloth's Qwen3.5 local guide, which lays out memory targets, reasoning-mode controls, and llama.cpp commands for running 27B and 35B-A3B models on local hardware.
Mistral has launched Mistral 3, a new open multimodal family with dense 14B, 8B, and 3B models under Apache 2.0, plus a larger Mistral Large 3. The company says the lineup was trained from scratch and tuned for both Blackwell NVL72 systems and single-node 8xA100 or 8xH100 deployments.
A merged llama.cpp PR adds MCP server selection, tool calls, prompts, resources, and an agentic loop to the WebUI stack, moving local inference closer to full agent workflows.
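The agentic loop that PR adds can be pictured as a simple cycle: the model either emits a structured tool call or a final answer, and the runtime executes tools and feeds results back until the model stops asking. The sketch below is a hypothetical toy version of that pattern, not llama.cpp's actual wire protocol; the `fake_model` function, tool registry, and message shapes are all illustrative assumptions.

```python
import json

# Toy sketch of an agentic tool-call loop (not llama.cpp's real protocol):
# the "model" either requests a tool call or returns a final answer.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
}

def fake_model(messages):
    """Stand-in for the LLM: calls `add` once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "add", "arguments": {"a": 2, "b": 3}}}
    result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"content": f"The sum is {result}."}

def agent_loop(user_prompt, max_turns=5):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # final answer, loop ends
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[call["name"]](call["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not converge")

print(agent_loop("What is 2 + 3?"))  # → The sum is 5.
```

The key design point is that the loop, not the model, owns execution: the model only proposes calls, which is what lets a WebUI mediate MCP server selection and permissions in between.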
A high-scoring LocalLLaMA post highlights Open WebUI’s Open Terminal: a Docker or bare-metal execution layer that lets local models run commands, edit files, and return artifacts through chat.
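An execution layer like that boils down to running a model-requested command with a timeout and returning a structured result the chat layer can render. This is a bare-metal toy sketch of the idea, not Open Terminal's actual implementation; real deployments add a Docker sandbox and command allowlists on top.

```python
import subprocess

def run_command(cmd, timeout=10):
    """Run a command list, capture output, and report it as structured data.
    Illustrative only: real execution layers sandbox and allowlist commands."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
        return {"exit_code": proc.returncode,
                "stdout": proc.stdout,
                "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"exit_code": -1, "stdout": "", "stderr": "timed out"}

result = run_command(["echo", "hello from the agent"])
print(result["stdout"].strip())  # → hello from the agent
```

Passing the command as a list (not a shell string) avoids shell-injection issues when the arguments come from model output.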
LocalLLaMA users are tracking llama.cpp’s merged autoparser work, which analyzes model templates to support reasoning and tool-call formats with less custom parser code.
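The autoparser derives its rules from each model's chat template; the hand-written sketch below shows the problem it automates, using one common convention (tool calls wrapped in `<tool_call>` tags) as an assumed example format.

```python
import json
import re

# Hand-written parser for one assumed tool-call convention
# (<tool_call>{...}</tool_call> tags). llama.cpp's autoparser derives
# equivalents of this automatically from the model's chat template.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text):
    """Split model output into plain text and structured tool calls."""
    calls = [json.loads(m) for m in TOOL_CALL_RE.findall(text)]
    plain = TOOL_CALL_RE.sub("", text).strip()
    return plain, calls

out = ('Let me check.\n'
       '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo"}}'
       '</tool_call>')
plain, calls = parse_tool_calls(out)
print(plain)             # → Let me check.
print(calls[0]["name"])  # → get_weather
```

Every model family picks a different wrapper and JSON shape, which is exactly why hand-maintained parsers like this one kept multiplying before the autoparser work.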
An HN post on a Swift/MLX port of Nvidia PersonaPlex 7B shows how chunking, buffering, and interrupt handling matter as much as raw model quality, moving local speech-to-speech agents on Apple Silicon closer to real time.
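The buffering pattern that post describes can be sketched as a small chunk buffer: accumulate incoming samples, emit fixed-size chunks to the model, and drop pending audio when the user barges in. The chunk size and integer "samples" here are placeholders, not PersonaPlex's actual frame format.

```python
class ChunkBuffer:
    """Toy audio chunker: collects samples, flushes fixed-size chunks,
    and supports barge-in by discarding buffered audio. Illustrative only."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.pending = []

    def push(self, samples):
        """Add samples; return any complete chunks ready for inference."""
        self.pending.extend(samples)
        chunks = []
        while len(self.pending) >= self.chunk_size:
            chunks.append(self.pending[: self.chunk_size])
            self.pending = self.pending[self.chunk_size:]
        return chunks

    def interrupt(self):
        """User barge-in: drop buffered audio so the agent stops mid-reply."""
        dropped = len(self.pending)
        self.pending.clear()
        return dropped

buf = ChunkBuffer(chunk_size=4)
print(buf.push([1, 2, 3]))           # → [] (not enough for a chunk yet)
print(buf.push([4, 5, 6, 7, 8, 9]))  # → [[1, 2, 3, 4], [5, 6, 7, 8]]
print(buf.interrupt())               # → 1 (sample 9 dropped)
```

The trade-off the post circles around lives in `chunk_size`: larger chunks give the model more context per step, smaller ones cut perceived latency.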
Google’s February Gemini update packages Gemini 3.1 Pro, Deep Think, Nano Banana 2, Veo templates, and new Canvas tools into one release. The drop shows Google pushing the Gemini app as a front end for reasoning, image, music, and video workflows rather than a plain chat surface.
OpenAI and Amazon said AWS customers will get a Stateful Runtime Environment in Amazon Bedrock for production-grade agent workflows. The announcement moves agent execution closer to managed AWS infrastructure, with persistent state, governance, and support for long-running tasks.
Google DeepMind said on March 3, 2026 that Gemini 3.1 Flash-Lite delivers faster performance at a lower price than Gemini 2.5 Flash. Google is rolling the model out in preview via Google AI Studio and Vertex AI for high-volume, latency-sensitive workloads.
OpenAI Developers said on March 6, 2026 that Codex Security is now in research preview. The product connects to GitHub repositories, builds a threat model, validates potential issues in isolation, and proposes patches for human review.
A Hacker News thread surfaced OBLITERATUS, an open-source project that studies and alters refusal behavior in open-weight LLMs without retraining. The interesting part is not just the capability claim but the project’s framing as a shared telemetry-backed research pipeline for comparing safety-editing methods across models and hardware.
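Techniques in this family (often called "abliteration") typically estimate a refusal direction from activation differences and then project it out of weight matrices, so no retraining is needed. Below is a minimal pure-Python sketch of that directional-ablation idea under stated assumptions: the 3-d vectors and the tiny matrix are toys, and real tools apply this per layer to transformer weights; this is not OBLITERATUS's actual code.

```python
import math

def unit(v):
    """Normalize a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def matvec(W, x):
    """y = W x, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ablate(W, d):
    """Return W' = W - d d^T W: remove direction d from W's output span."""
    d = unit(d)
    # dW[j] = (d^T W)_j, the component of each column along d.
    dW = [sum(d[i] * W[i][j] for i in range(len(W)))
          for j in range(len(W[0]))]
    return [[W[i][j] - d[i] * dW[j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy refusal direction: mean "refusing" activation minus mean "complying" one.
refusing  = [1.0, 2.0, 3.0]
complying = [1.0, 0.0, 1.0]
d = [a - b for a, b in zip(refusing, complying)]

W = [[0.5, 1.0], [2.0, -1.0], [1.5, 0.5]]
W2 = ablate(W, d)

# After ablation, every output of W2 is orthogonal to d.
y = matvec(W2, [0.3, -0.7])
proj = sum(a * b for a, b in zip(unit(d), y))
print(abs(proj) < 1e-9)  # → True
```

The "telemetry-backed pipeline" framing in the post is about comparing exactly this kind of edit across models and hardware: the projection itself is a few lines, but measuring what it does to capability and refusal rates is the hard part.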
A well-received HN post highlighted Sarvam AI’s decision to open-source Sarvam 30B and 105B, two reasoning-focused MoE models trained in India under the IndiaAI mission. The announcement matters because it pairs open weights with concrete product deployment, inference optimization, and unusually strong Indian-language benchmarks.