The post promised a zero-state optimizer with low VRAM overhead, and r/MachineLearning answered the way that community usually does: show the update rule, run more seeds, and bring harder tasks.
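For readers new to the term: a zero-state optimizer keeps no per-parameter buffers (Adam keeps two), which is where the VRAM saving comes from. A minimal sketch of a stateless update, using sign-SGD purely as a stand-in since the post's actual rule isn't reproduced here:

```python
import torch

@torch.no_grad()
def zero_state_step(params, lr=1e-3):
    # Stateless update: no momentum or second-moment buffers, so the
    # optimizer adds zero VRAM beyond the gradients themselves.
    # Sign-SGD is shown only as an illustration, not the post's rule.
    for p in params:
        if p.grad is not None:
            p.add_(p.grad.sign(), alpha=-lr)
```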
#training
Why it matters: model launches live or die on serving and training support, not just weights. LMSYS says its Day-0 stack reached 199 tok/s on B200 and 266 tok/s on H200, while staying strong out to 900K context.
GitHub said that starting April 24, 2026, interaction data from Copilot Free, Pro, and Pro+ users will be used to train and improve AI models unless users opt out. Business and Enterprise plans are excluded, but the change materially expands how individual-tier Copilot usage can feed back into model development.
A March 17, 2026 r/MachineLearning post about Clip to Grok reached 56 points and 20 comments at crawl time. The authors report that per-row L2 clipping after each optimizer step cut grokking delay by 18x to 66x on modular arithmetic benchmarks.
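A minimal sketch of the reported trick, applied right after `optimizer.step()`; the norm cap `max_row_norm` and the restriction to 2-D weights are assumptions, since the post doesn't pin down those details:

```python
import torch

@torch.no_grad()
def clip_rows_(model, max_row_norm=1.0):
    # Rescale each row of every 2-D weight matrix so its L2 norm
    # is at most max_row_norm (a hypothetical default value).
    for p in model.parameters():
        if p.ndim == 2:
            scale = max_row_norm / (p.norm(dim=1, keepdim=True) + 1e-12)
            p.mul_(scale.clamp(max=1.0))

# Per the post, this runs once per iteration, after optimizer.step().
```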
A March 19, 2026 Hacker News post about NanoGPT Slowrun reached 162 points and 43 comments at crawl time. Q Labs says an ensemble of 1.8B-parameter models trained on 100M tokens matched a baseline that would normally require 1B tokens.
Q Labs says 100M tokens and an 18B-parameter ensemble of those 1.8B models can match a 1B-token baseline, and Hacker News immediately focused on whether that gain survives serving, where every extra member adds a forward pass.
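Q Labs doesn't publish its combination rule; plain logit averaging, the standard ensemble baseline, is sketched below to make the serving concern concrete: k members cost roughly k forward passes per token.

```python
import torch

@torch.no_grad()
def ensemble_logits(models, input_ids):
    # Average member logits at inference. Each member is a full
    # forward pass, which is why HN zeroed in on deployment cost.
    return torch.stack([m(input_ids) for m in models]).mean(dim=0)
```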
SkyPilot says Claude Code ran about 910 autoresearch experiments in 8 hours, and Hacker News focused on whether the real breakthrough was agent strategy, infrastructure, or both.
Google introduced AI Works for Europe, adding $30 million to the Google.org European AI Opportunity Fund and expanding AI training resources. The initiative combines worker training, university partnerships, and a new certificate rollout in ten European languages.
A March 15, 2026 r/MachineLearning post introduced preflight, a lightweight PyTorch-oriented CLI that reached 56 points and 13 comments by promising a fast pre-training gate: 10 checks covering label leakage, NaN detection, channel order, dead gradients, class imbalance, and VRAM estimation, all run before a job starts.
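preflight's actual API isn't shown in the post; the sketch below assumes a classifier and hand-rolls three of the listed gates (non-finite inputs, class imbalance, dead gradients) just to show the shape of such a tool:

```python
import torch

def preflight_checks(model, inputs, labels, imbalance_ratio=50):
    # Three illustrative gates; thresholds are assumptions, not preflight's.
    issues = []
    if not torch.isfinite(inputs).all():
        issues.append("non-finite values in inputs")
    counts = torch.bincount(labels)
    if counts.max() > imbalance_ratio * counts.clamp(min=1).min():
        issues.append("severe class imbalance")
    model(inputs).float().sum().backward()  # one smoke-test backward pass
    if all(g is None or not g.abs().max()
           for g in (p.grad for p in model.parameters())):
        issues.append("dead gradients everywhere")
    return issues
```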
A Hacker News submission highlighted Andrej Karpathy's Autoresearch repo, a minimal setup where an AI agent edits one training file, runs fixed 5-minute experiments, and keeps only changes that improve `val_bpb`.
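The repo's policy is simple enough to state as code. The sketch below assumes the training run prints a `val_bpb=<float>` line (a made-up convention; the repo's real file names and output format aren't reproduced here):

```python
import shutil
import subprocess

def parse_val_bpb(stdout):
    # Hypothetical convention: the run prints "val_bpb=<float>".
    for line in reversed(stdout.splitlines()):
        if line.startswith("val_bpb="):
            return float(line.split("=", 1)[1])
    return float("inf")

def keep_if_better(train_file, propose_edit, best_bpb):
    # Back up, let the agent edit one file, run a fixed-budget
    # experiment (the repo fixes ~5 minutes), revert unless improved.
    shutil.copy(train_file, train_file + ".bak")
    propose_edit(train_file)  # the agent's edit, supplied by the caller
    out = subprocess.run(["python", train_file], capture_output=True,
                         text=True, timeout=300)
    val_bpb = parse_val_bpb(out.stdout)
    if val_bpb < best_bpb:
        return val_bpb                                # keep the change
    shutil.copy(train_file + ".bak", train_file)      # revert
    return best_bpb
```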
OpenAI CEO Sam Altman responded to criticism over AI training energy costs by drawing a parallel to human education: becoming intelligent also requires 20 years and all the food energy consumed in that time.
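The analogy is easy to put numbers on. Assuming 2,000 kcal/day (an assumption, not Altman's figure), 20 years of food energy comes to roughly 17 MWh; commonly cited estimates put GPT-3's training run near 1.3 GWh, or about 75 such childhoods.

```python
# Food energy for one person over 20 years, assuming 2,000 kcal/day.
KCAL_TO_J = 4184
joules = 2000 * KCAL_TO_J * 365 * 20   # ~6.1e10 J
kwh = joules / 3.6e6                   # J -> kWh
print(f"{kwh:,.0f} kWh")               # ~16,969 kWh (~17 MWh)
```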