LLM

LLM Hacker News Mar 25, 2026 2 min read

Hacker News spots Hypura running oversized LLMs on Macs with tier-aware scheduling

Hacker News noticed Hypura because it treats Apple Silicon memory limits as a scheduling problem, spreading tensors across GPU, RAM, and NVMe instead of letting oversized models crash.

#apple-silicon #llm-inference #memory-scheduling
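The summary does not describe how Hypura actually assigns tensors. The sketch below is a hypothetical illustration of the general idea of tier-aware placement: a greedy pass that puts each tensor on the fastest tier that still has room. The tier budgets, tensor names and sizes are made up. On Apple Silicon the GPU and CPU share unified memory, so "gpu" and "ram" here are separate budgets, as in the summary, rather than separate physical pools.

```python
from dataclasses import dataclass

GiB = 1024**3


@dataclass
class Tier:
    """One storage tier with a fixed byte budget (hypothetical numbers)."""
    name: str
    capacity_bytes: int
    used_bytes: int = 0

    def fits(self, size: int) -> bool:
        return self.used_bytes + size <= self.capacity_bytes


def place_tensors(tensor_sizes: dict[str, int], tiers: list[Tier]) -> dict[str, str]:
    """Greedily put each tensor on the fastest tier that still has room.

    `tiers` must be ordered fastest-first. Tensors are placed in dict order,
    so a caller would list the most frequently used tensors first.
    """
    placement: dict[str, str] = {}
    for name, size in tensor_sizes.items():
        for tier in tiers:
            if tier.fits(size):
                tier.used_bytes += size
                placement[name] = tier.name
                break
        else:
            # No tier has room: fail up front instead of crashing mid-inference.
            raise MemoryError(f"no tier can hold {name} ({size} bytes)")
    return placement


tiers = [Tier("gpu", 8 * GiB), Tier("ram", 8 * GiB), Tier("nvme", 200 * GiB)]
sizes = {"embed": 2 * GiB, "layer0": 5 * GiB, "layer1": 5 * GiB, "layer2": 5 * GiB}
placement = place_tensors(sizes, tiers)
print(placement)
```

With these made-up budgets, the first two tensors fill most of the GPU budget. The third spills to RAM and the fourth to NVMe, so the model still loads instead of failing outright.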

LLM Hacker News Mar 25, 2026 2 min read

Hacker News highlights TurboQuant's 3-bit KV-cache compression without retraining

Hacker News picked up Google Research's TurboQuant because it promises 3-bit KV-cache compression without fine-tuning while targeting both vector search and long-context inference.

#turboquant #quantization #kv-cache
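To make "3-bit" concrete, here is a minimal sketch of plain per-vector min-max scalar quantization. Each value becomes one of 2^3 = 8 integer codes. This is not TurboQuant's algorithm. It only illustrates the bit budget, and the example vector and the 32 bits of per-vector metadata (an fp16 offset plus an fp16 scale) are assumptions.

```python
BITS = 3
LEVELS = 2**BITS - 1  # codes 0..7, i.e. 8 representable values


def quantize(values: list[float]) -> tuple[list[int], float, float]:
    """Per-vector min-max quantization to 3-bit integer codes."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0
    codes = [min(LEVELS, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale


def dequantize(codes: list[int], lo: float, scale: float) -> list[float]:
    """Map codes back to approximate floats."""
    return [lo + c * scale for c in codes]


def storage_bits(dim: int, bits: int = BITS, meta_bits: int = 32) -> int:
    """Bits for one vector: packed codes plus assumed fp16 offset and fp16 scale."""
    return dim * bits + meta_bits


key = [-1.0, -0.5, 0.0, 0.25, 1.0]  # made-up key vector
codes, lo, scale = quantize(key)
restored = dequantize(codes, lo, scale)
print(codes, [round(x, 3) for x in restored])

dim = 128
print(dim * 16, storage_bits(dim))  # prints 2048 416: fp16 baseline vs. 3-bit codes + metadata
```

For a 128-dimensional vector this layout needs 416 bits instead of 2048, about 4.9x smaller. Rounding error per value is at most half a quantization step. The hard part, which schemes like TurboQuant address, is keeping attention quality intact at that error level without retraining.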

LLM Hacker News Mar 25, 2026 2 min read

Hacker News flags compromised LiteLLM PyPI releases that execute on Python startup

Hacker News amplified BerriAI's warning that malicious LiteLLM PyPI releases could run code as soon as Python started, before litellm was ever imported, turning a routine package update into immediate incident response.

#litellm #pypi #supply-chain-security
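The summary does not say which mechanism the compromised releases used. One well-documented way for an installed package to run code at interpreter startup is a .pth file in site-packages: Python's site module executes any line in such a file that begins with "import" followed by a space or tab. The sketch below is a hypothetical audit helper, not a BerriAI tool, that lists those lines for review.

```python
import site
from pathlib import Path


def site_dirs() -> list[str]:
    """Directories whose .pth files the `site` module processes at startup."""
    dirs = list(site.getsitepackages()) if hasattr(site, "getsitepackages") else []
    dirs.append(site.getusersitepackages())
    return dirs


def executable_pth_lines(dirs: list[str]) -> list[tuple[str, int, str]]:
    """Return (file, line number, text) for every .pth line that `site` would execute."""
    findings = []
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue
        for pth in sorted(base.glob("*.pth")):
            text = pth.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                # site executes lines that start with "import" plus a space or tab.
                if line.startswith(("import ", "import\t")):
                    findings.append((str(pth), lineno, line.strip()))
    return findings


if __name__ == "__main__":
    for path, lineno, code in executable_pth_lines(site_dirs()):
        print(f"{path}:{lineno}: {code}")
```

A hit is not proof of compromise. Setuptools and editable installs use executable .pth lines legitimately, so each finding needs a manual look.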

LLM Mar 25, 2026 2 min read

Google Previews Gemini 3.1 Flash-Lite for High-Volume AI Workloads

Google introduced Gemini 3.1 Flash-Lite on Mar 3, 2026 as its fastest and lowest-cost Gemini 3 series model. The preview release targets high-volume developer workloads with lower pricing, faster latency, and stronger benchmark scores than the prior 2.5 Flash tier.

#google #gemini #llm

LLM Mar 25, 2026 2 min read

Anthropic Ships Claude Sonnet 4.6 With 1M-Token Context in Beta

Anthropic introduced Claude Sonnet 4.6 on Feb 17, 2026 as its most capable Sonnet model yet. The release combines a 1M token context window in beta with upgrades to coding, computer use, and agent workflows while keeping Sonnet 4.5 pricing.

#anthropic #claude #llm

LLM Reddit Mar 25, 2026 1 min read

LocalLLaMA surfaces MIT-licensed GigaChat 3.1 open weights in 702B and 10B sizes

LocalLLaMA surfaced an MIT-licensed GigaChat 3.1 release that pairs a 702B MoE model for clusters with a 10B MoE model aimed at faster deployment and lighter inference.

#gigachat #open-weights #multilingual

LLM Reddit Mar 25, 2026 2 min read

LocalLLaMA warns of compromised LiteLLM PyPI releases that ran before import

A LocalLLaMA alert pushed a serious LiteLLM supply-chain incident into view after compromised PyPI wheels were reported to execute a credential stealer on Python startup.

#litellm #pypi #supply-chain-security

LLM Hacker News Mar 25, 2026 1 min read

Hacker News highlights a practical video-search CLI built on Gemini Embedding 2

Show HN users were drawn to SentrySearch because it turns Gemini Embedding 2's native video embeddings into a practical CLI for semantic search and clip extraction.

#gemini #video-search #embeddings
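The core of semantic clip search is ranking stored clip embeddings by their similarity to a query embedding. The sketch below assumes that ranking is plain cosine similarity. It does not call Gemini: the three-dimensional vectors, the Clip record and the file name are hypothetical stand-ins for what SentrySearch would get from Gemini Embedding 2.

```python
import math
from dataclasses import dataclass


@dataclass
class Clip:
    """A video segment and its embedding (all values here are made up)."""
    source: str
    start_s: float
    end_s: float
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: list[float], clips: list[Clip], k: int = 3) -> list[Clip]:
    """Rank clips by cosine similarity to the query embedding, best first."""
    return sorted(clips, key=lambda c: cosine(query, c.vector), reverse=True)[:k]


index = [
    Clip("footage.mp4", 0.0, 10.0, [0.9, 0.1, 0.0]),
    Clip("footage.mp4", 10.0, 20.0, [0.0, 1.0, 0.0]),
    Clip("footage.mp4", 20.0, 30.0, [0.5, 0.5, 0.0]),
]
query_vec = [1.0, 0.0, 0.0]  # would come from embedding the text query
for clip in search(query_vec, index, k=2):
    print(clip.source, clip.start_s, clip.end_s)
```

Clip extraction then amounts to cutting each returned start/end range out of its source file.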

LLM Mar 24, 2026 2 min read

Google DeepMind Proposes a Cognitive Framework for Measuring AGI Progress

Google DeepMind has published a cognitive taxonomy for evaluating progress toward AGI and paired it with a Kaggle hackathon to build new benchmarks. The framework maps AI systems against human baselines across 10 cognitive abilities instead of relying on a single headline score.

#deepmind #agi #benchmarks

LLM X/Twitter Mar 24, 2026 2 min read

Anthropic Economic Index says experienced Claude users iterate more and rely less on full autonomy

Anthropic said in a March 24, 2026, post on X that longer-term Claude users iterate more carefully, rely less on full autonomy, and take on higher-value tasks more successfully. The company framed experience as a shift toward guided, higher-leverage workflows rather than simple one-shot delegation.

#anthropic #claude #economic-index

LLM Reddit Mar 24, 2026 2 min read

r/singularity treats Anthropic Dispatch as the next step toward phone-first AI coworkers

r/singularity read Anthropic's Dispatch and computer-use release as a real product shift toward phone-first AI coworkers, while also focusing on the macOS-only rollout and the limits of screen-driven automation.

#claude #computer-use #mobile

LLM Hacker News Mar 24, 2026 2 min read

Hacker News turns the LiteLLM breach into a warning about AI supply-chain risk

A fast-moving HN thread used the LiteLLM incident to make a broader point: AI developer infrastructure now carries the same supply-chain risk as cloud infrastructure, but often with looser dependency discipline and more secrets within an attacker's reach.

#litellm #supply-chain-security #pypi
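One concrete form of the "dependency discipline" the thread calls for is hash pinning. pip supports this natively: requirement lines can carry a --hash=sha256:... option, and --require-hashes makes a missing or mismatched hash an install error. The minimal sketch below shows the underlying check. The function names are hypothetical and the code is not part of pip.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 so large wheels need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_pinned(path: str, expected_hex: str) -> None:
    """Raise if the artifact on disk does not match the pinned digest."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"hash mismatch for {path}: expected {expected_hex}, got {actual}")
```

Pinning protects against a surprise new release or a swapped artifact being pulled in automatically. It does not help if the digest you pinned already belongs to a malicious release.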