#model-efficiency

LLM · Hacker News · Mar 26, 2026 · 2 min read

A ground-up quantization guide clarifies where LLM cost really lives

ngrok’s March 25, 2026 explainer lays out how quantization can make LLMs roughly 4x smaller and 2x faster, and what the real tradeoff between 4-bit and 8-bit looks like. The post reached 247 points and 46 comments on Hacker News, reopening the discussion of memory bottlenecks and the economics of local inference.

#quantization #llm #inference

LLM · X/Twitter · Mar 20, 2026 · 2 min read

OpenAI launches Parameter Golf to push efficient pretraining under a 16 MB cap

OpenAI said on X that it is launching Parameter Golf, an open research challenge to build the most efficient pretrained model under a 16 MB artifact limit and a 10-minute training budget on 8×H100s. The challenge uses a fixed FineWeb dataset, a public baseline repo, and optional Runpod credits for participants.

#openai #parameter-golf #model-efficiency