LLM Hacker News Mar 25, 2026 2 min read
Hacker News noticed Hypura because it treats Apple Silicon memory limits as a scheduling problem, spreading tensors across GPU, RAM, and NVMe instead of letting oversized models crash.
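The scheduling idea can be sketched as a greedy tiered placement: assign tensors, largest first, to the fastest tier with room, spilling to RAM and then NVMe instead of failing outright. This is a minimal illustrative sketch under assumed names and toy capacities, not Hypura's actual API or algorithm.

```python
# Toy sketch of tiered tensor placement (hypothetical; not Hypura's code).
# Greedily place each tensor in the fastest tier that still has capacity,
# spilling GPU -> RAM -> NVMe rather than crashing on an oversized model.

def place_tensors(tensors, tiers):
    """tensors: {name: size_bytes}; tiers: ordered list of (tier_name, capacity)."""
    free = {name: cap for name, cap in tiers}
    placement = {}
    # Largest tensors first, so big weights claim fast memory before fragments do.
    for tname, size in sorted(tensors.items(), key=lambda kv: -kv[1]):
        for tier_name, _ in tiers:
            if free[tier_name] >= size:
                free[tier_name] -= size
                placement[tname] = tier_name
                break
        else:
            raise MemoryError(f"no tier can hold {tname} ({size} bytes)")
    return placement

# Toy sizes/capacities chosen so the model overflows GPU and RAM.
tensors = {"embed": 6, "block0": 4, "block1": 4, "lm_head": 6}
tiers = [("gpu", 8), ("ram", 8), ("nvme", 64)]
print(place_tensors(tensors, tiers))
# → {'embed': 'gpu', 'lm_head': 'ram', 'block0': 'nvme', 'block1': 'nvme'}
```

A real scheduler would also weigh access frequency and transfer bandwidth, not just size, but the spill-instead-of-crash shape is the same.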
A new open-source project called ntransformer enables running the 140 GB Llama 3.1 70B model on a single consumer RTX 3090 by streaming weights directly from NVMe storage to the GPU, bypassing CPU RAM entirely.
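The core trick is that only one layer's weights need to be resident at a time: read a layer from disk, run it, discard it, repeat. A CPU-only toy can show the streaming shape (ntransformer's real path copies NVMe to GPU directly, e.g. via GPUDirect-style I/O, which this sketch does not reproduce; all names here are assumptions).

```python
# Toy sketch of layer-by-layer weight streaming (hypothetical; not ntransformer's code).
# Instead of materializing the whole checkpoint in memory, seek to each
# layer's offset and read just that slice, so peak residency is one layer.
import os
import tempfile

LAYER_BYTES = 1024  # toy layer size; a real 70B fp16 layer is ~2 GB


def write_fake_checkpoint(path, n_layers):
    """Write a contiguous fake checkpoint: n_layers fixed-size layer blobs."""
    with open(path, "wb") as f:
        for i in range(n_layers):
            f.write(bytes([i % 256]) * LAYER_BYTES)


def stream_layers(path, n_layers):
    """Yield one layer's bytes at a time; only one layer is ever resident."""
    with open(path, "rb") as f:
        for i in range(n_layers):
            f.seek(i * LAYER_BYTES)
            yield f.read(LAYER_BYTES)


path = os.path.join(tempfile.mkdtemp(), "weights.bin")
write_fake_checkpoint(path, n_layers=4)
peak = max(len(layer) for layer in stream_layers(path, 4))
print(peak)  # → 1024, i.e. resident bytes per step stay at one layer
```

The cost is that every forward pass re-reads the weights, so throughput is bounded by NVMe bandwidth rather than GPU compute, which is why this trades speed for the ability to run at all.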