Hacker News noticed Hypura because it treats Apple Silicon memory limits as a scheduling problem, spreading tensors across GPU, RAM, and NVMe instead of letting oversized models crash.
LLM
Hacker News amplified BerriAI's warning that malicious LiteLLM PyPI releases could execute before import, turning a package update into immediate incident response.
Google introduced Gemini 3.1 Flash-Lite on March 3, 2026 as its fastest and lowest-cost Gemini 3 series model. The preview release targets high-volume developer workloads with lower pricing, lower latency, and stronger benchmark scores than the prior 2.5 Flash tier.
LocalLLaMA surfaced an MIT-licensed GigaChat 3.1 release that pairs a 702B MoE model for clusters with a 10B MoE model aimed at faster deployment and lighter inference.
A LocalLLaMA alert pushed a serious LiteLLM supply-chain incident into view after compromised PyPI wheels were reported to execute a credential stealer on Python startup.
Show HN users were drawn to SentrySearch because it turns Gemini Embedding 2's native video embeddings into a practical CLI for semantic search and clip extraction.
Google DeepMind has published a cognitive taxonomy for evaluating progress toward AGI and paired it with a Kaggle hackathon to build new benchmarks. The framework maps AI systems against human baselines across 10 cognitive abilities instead of relying on a single headline score.
r/singularity read Anthropic's Dispatch + computer use release as a real product shift toward phone-first AI coworkers, while also focusing on the macOS-only rollout and the limits of screen-driven automation.
A fast-moving HN thread used the LiteLLM incident to make a broader point: AI developer infrastructure now carries the same supply-chain risk as cloud infra, but often with looser dependency discipline and a larger secret surface.
NVIDIA introduced OpenShell on March 23, 2026. The company says the open source runtime isolates each autonomous agent in its own sandbox and keeps policy enforcement at the infrastructure layer instead of relying only on model or application safeguards.
Microsoft Research announced Phi-4-reasoning-vision-15B, a 15-billion-parameter open-weight model, on March 4, 2026. The lab says the release is designed to deliver stronger multimodal reasoning, math and science performance, and computer-use ability without the compute profile of much larger systems.
A technical LocalLLaMA thread translated the FlashAttention-4 paper into practical deployment guidance, emphasizing large speedups on Blackwell GPUs, faster Python-based kernel development, and the fact that most A100 or consumer-GPU users cannot yet get the full benefits.