LLM Hacker News Mar 25, 2026 2 min read
Hacker News noticed Hypura because it treats Apple Silicon memory limits as a scheduling problem, spreading tensors across GPU, RAM, and NVMe instead of letting oversized models crash.
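The scheduling idea can be sketched as a greedy tiered placement: assign tensors, largest first, to the fastest tier with room, spilling to RAM and then NVMe instead of failing outright. This is a minimal illustrative sketch under assumed names and toy capacities, not Hypura's actual API or algorithm.

```python
# Toy sketch of tiered tensor placement (hypothetical; not Hypura's code).
# Greedily place each tensor in the fastest tier that still has capacity,
# spilling GPU -> RAM -> NVMe rather than crashing on an oversized model.

def place_tensors(tensors, tiers):
    """tensors: {name: size_bytes}; tiers: ordered list of (tier_name, capacity)."""
    free = {name: cap for name, cap in tiers}
    placement = {}
    # Largest tensors first, so big weights claim fast memory before fragments do.
    for tname, size in sorted(tensors.items(), key=lambda kv: -kv[1]):
        for tier_name, _ in tiers:
            if free[tier_name] >= size:
                free[tier_name] -= size
                placement[tname] = tier_name
                break
        else:
            raise MemoryError(f"no tier can hold {tname} ({size} bytes)")
    return placement

# Toy sizes/capacities chosen so the model overflows GPU and RAM.
tensors = {"embed": 6, "block0": 4, "block1": 4, "lm_head": 6}
tiers = [("gpu", 8), ("ram", 8), ("nvme", 64)]
print(place_tensors(tensors, tiers))
# → {'embed': 'gpu', 'lm_head': 'ram', 'block0': 'nvme', 'block1': 'nvme'}
```

A real scheduler would also weigh access frequency and transfer bandwidth, not just size, but the spill-instead-of-crash shape is the same.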
A new open-source project called ntransformer enables running the 140 GB Llama 3.1 70B model on a single consumer RTX 3090 by streaming weights directly from NVMe storage to the GPU, bypassing CPU RAM entirely.
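The core trick is that only one layer's weights need to be resident at a time: read a layer from disk, run it, discard it, repeat. A CPU-only toy can show the streaming shape (ntransformer's real path copies NVMe to GPU directly, e.g. via GPUDirect-style I/O, which this sketch does not reproduce; all names here are assumptions).

```python
# Toy sketch of layer-by-layer weight streaming (hypothetical; not ntransformer's code).
# Instead of materializing the whole checkpoint in memory, seek to each
# layer's offset and read just that slice, so peak residency is one layer.
import os
import tempfile

LAYER_BYTES = 1024  # toy layer size; a real 70B fp16 layer is ~2 GB


def write_fake_checkpoint(path, n_layers):
    """Write a contiguous fake checkpoint: n_layers fixed-size layer blobs."""
    with open(path, "wb") as f:
        for i in range(n_layers):
            f.write(bytes([i % 256]) * LAYER_BYTES)


def stream_layers(path, n_layers):
    """Yield one layer's bytes at a time; only one layer is ever resident."""
    with open(path, "rb") as f:
        for i in range(n_layers):
            f.seek(i * LAYER_BYTES)
            yield f.read(LAYER_BYTES)


path = os.path.join(tempfile.mkdtemp(), "weights.bin")
write_fake_checkpoint(path, n_layers=4)
peak = max(len(layer) for layer in stream_layers(path, 4))
print(peak)  # → 1024, i.e. resident bytes per step stay at one layer
```

The cost is that every forward pass re-reads the weights, so throughput is bounded by NVMe bandwidth rather than GPU compute, which is why this trades speed for the ability to run at all.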