LLM

LLM Hacker News Mar 25, 2026 2 min read

Hacker News spots Hypura running oversized LLMs on Macs with tier-aware scheduling

Hacker News noticed Hypura because it treats Apple Silicon memory limits as a scheduling problem, spreading tensors across GPU, RAM, and NVMe instead of letting oversized models crash.

#apple-silicon #llm-inference #memory-scheduling

LLM Hacker News Mar 25, 2026 2 min read

Hacker News flags compromised LiteLLM PyPI releases that execute on Python startup

Hacker News amplified BerriAI's warning that malicious LiteLLM PyPI releases could execute before import, turning a package update into immediate incident response.

#litellm #pypi #supply-chain-security

LLM Mar 25, 2026 2 min read

Google Previews Gemini 3.1 Flash-Lite for High-Volume AI Workloads

Google introduced Gemini 3.1 Flash-Lite on March 3, 2026 as its fastest and lowest-cost Gemini 3 series model. The preview release targets high-volume developer workloads with lower pricing, lower latency, and stronger benchmark scores than the prior 2.5 Flash tier.

#google #gemini #llm

LLM Reddit Mar 25, 2026 1 min read

LocalLLaMA surfaces MIT-licensed GigaChat 3.1 open weights in 702B and 10B sizes

LocalLLaMA surfaced an MIT-licensed GigaChat 3.1 release that pairs a 702B MoE model for clusters with a 10B MoE model aimed at faster deployment and lighter inference.

#gigachat #open-weights #multilingual

LLM Reddit Mar 25, 2026 2 min read

LocalLLaMA warns of compromised LiteLLM PyPI releases that ran before import

A LocalLLaMA alert pushed a serious LiteLLM supply-chain incident into view after compromised PyPI wheels were reported to execute a credential stealer on Python startup.

#litellm #pypi #supply-chain-security

LLM Hacker News Mar 25, 2026 1 min read

Hacker News highlights a practical video-search CLI built on Gemini Embedding 2

Show HN users were drawn to SentrySearch because it turns Gemini Embedding 2's native video embeddings into a practical CLI for semantic search and clip extraction.

#gemini #video-search #embeddings

LLM Mar 24, 2026 2 min read

Google DeepMind Proposes a Cognitive Framework for Measuring AGI Progress

Google DeepMind has published a cognitive taxonomy for evaluating progress toward AGI and paired it with a Kaggle hackathon to build new benchmarks. The framework maps AI systems against human baselines across 10 cognitive abilities instead of relying on a single headline score.

#deepmind #agi #benchmarks

LLM Reddit Mar 24, 2026 2 min read

r/singularity treats Anthropic Dispatch as the next step toward phone-first AI coworkers

r/singularity read Anthropic's Dispatch + computer use release as a real product shift toward phone-first AI coworkers, while also focusing on the macOS-only rollout and the limits of screen-driven automation.

#claude #computer-use #mobile

LLM Hacker News Mar 24, 2026 2 min read

Hacker News turns the LiteLLM breach into a warning about AI supply-chain risk

A fast-moving HN thread used the LiteLLM incident to make a broader point: AI developer infrastructure now carries the same supply-chain risk as cloud infra, but often with looser dependency discipline and a larger secret surface.

#litellm #supply-chain-security #pypi

LLM Mar 24, 2026 2 min read

NVIDIA introduces OpenShell, a runtime-level security layer for autonomous agents

NVIDIA introduced OpenShell on March 23, 2026. The company says the open source runtime isolates each autonomous agent in its own sandbox and keeps policy enforcement at the infrastructure layer instead of relying only on model or application safeguards.

#nvidia #agents #security

LLM Mar 24, 2026 2 min read

Microsoft Research unveils Phi-4-reasoning-vision-15B to push multimodal reasoning efficiency

Microsoft Research announced the 15-billion-parameter open-weight model Phi-4-reasoning-vision-15B on March 4, 2026. The lab says the release is designed to deliver stronger multimodal reasoning, math and science performance, and computer-use ability without the compute profile of much larger systems.

#microsoft #phi-4 #multimodal

LLM Reddit Mar 24, 2026 1 min read

LocalLLaMA highlights FlashAttention-4 gains on Blackwell and the limits for everyday GPUs

A technical LocalLLaMA thread translated the FlashAttention-4 paper into practical deployment guidance, emphasizing large gains on Blackwell, faster Python-based kernel development, and the fact that most A100 or consumer-GPU users cannot yet get the full benefits.

#flashattention #inference #gpu