The post promised a zero-state optimizer with low VRAM overhead, and r/MachineLearning answered the way that community usually does: show the update rule, run more seeds, and bring harder tasks.
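For readers new to the term: a zero-state optimizer keeps no per-parameter buffers (Adam keeps two), which is where the VRAM saving comes from. A minimal sketch of a stateless update, using sign-SGD purely as a stand-in since the post's actual rule isn't reproduced here:

```python
import torch

@torch.no_grad()
def zero_state_step(params, lr=1e-3):
    # Stateless update: no momentum or second-moment buffers, so the
    # optimizer adds zero VRAM beyond the gradients themselves.
    # Sign-SGD is shown only as an illustration, not the post's rule.
    for p in params:
        if p.grad is not None:
            p.add_(p.grad.sign(), alpha=-lr)
```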
#training
Why it matters: model launches live or die on serving and training support, not just weights. LMSYS says its Day-0 stack reached 199 tok/s on B200 and 266 tok/s on H200, while staying strong out to 900K context.
GitHub said that starting April 24, 2026, interaction data from Copilot Free, Pro, and Pro+ users will be used to train and improve AI models unless users opt out. Business and Enterprise plans are excluded, but the change materially expands how individual-tier Copilot usage can feed back into model development.
A March 17, 2026 r/MachineLearning post about Clip to Grok reached 56 points and 20 comments at crawl time. The authors report that per-row L2 clipping after each optimizer step cut grokking delay by 18x to 66x on modular arithmetic benchmarks.
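A minimal sketch of the reported trick, applied right after `optimizer.step()`; the norm cap `max_row_norm` and the restriction to 2-D weights are assumptions, since the post doesn't pin down those details:

```python
import torch

@torch.no_grad()
def clip_rows_(model, max_row_norm=1.0):
    # Rescale each row of every 2-D weight matrix so its L2 norm
    # is at most max_row_norm (a hypothetical default value).
    for p in model.parameters():
        if p.ndim == 2:
            scale = max_row_norm / (p.norm(dim=1, keepdim=True) + 1e-12)
            p.mul_(scale.clamp(max=1.0))

# Per the post, this runs once per iteration, after optimizer.step().
```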
A March 19, 2026 Hacker News post about NanoGPT Slowrun reached 162 points and 43 comments at crawl time. Q Labs says an ensemble of 1.8B-parameter models trained on 100M tokens matched a baseline that would normally require 1B tokens.
Q Labs says 100M tokens and an 18B-parameter ensemble of those 1.8B models can match a 1B-token baseline, and Hacker News immediately focused on whether that gain survives serving, where every extra member adds a forward pass.
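Q Labs doesn't publish its combination rule; plain logit averaging, the standard ensemble baseline, is sketched below to make the serving concern concrete: k members cost roughly k forward passes per token.

```python
import torch

@torch.no_grad()
def ensemble_logits(models, input_ids):
    # Average member logits at inference. Each member is a full
    # forward pass, which is why HN zeroed in on deployment cost.
    return torch.stack([m(input_ids) for m in models]).mean(dim=0)
```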
SkyPilot says Claude Code ran about 910 autoresearch experiments in 8 hours, and Hacker News focused on whether the real breakthrough was agent strategy, infrastructure, or both.
Google introduced AI Works for Europe, adding $30 million to the Google.org European AI Opportunity Fund and expanding AI training resources. The initiative combines worker training, university partnerships, and a new certificate rollout in ten European languages.
A March 15, 2026 r/MachineLearning post introduced preflight, a lightweight PyTorch-oriented CLI that reached 56 points and 13 comments by promising a fast pre-training gate: 10 checks covering label leakage, NaN detection, channel order, dead gradients, class imbalance, and VRAM estimation, all run before a job starts.
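preflight's actual API isn't shown in the post; the sketch below assumes a classifier and hand-rolls three of the listed gates (non-finite inputs, class imbalance, dead gradients) just to show the shape of such a tool:

```python
import torch

def preflight_checks(model, inputs, labels, imbalance_ratio=50):
    # Three illustrative gates; thresholds are assumptions, not preflight's.
    issues = []
    if not torch.isfinite(inputs).all():
        issues.append("non-finite values in inputs")
    counts = torch.bincount(labels)
    if counts.max() > imbalance_ratio * counts.clamp(min=1).min():
        issues.append("severe class imbalance")
    model(inputs).float().sum().backward()  # one smoke-test backward pass
    if all(g is None or not g.abs().max()
           for g in (p.grad for p in model.parameters())):
        issues.append("dead gradients everywhere")
    return issues
```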
A Hacker News submission highlighted Andrej Karpathy's Autoresearch repo, a minimal setup where an AI agent edits one training file, runs fixed 5-minute experiments, and keeps only changes that improve `val_bpb`.
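The repo's policy is simple enough to state as code. The sketch below assumes the training run prints a `val_bpb=<float>` line (a made-up convention; the repo's real file names and output format aren't reproduced here):

```python
import shutil
import subprocess

def parse_val_bpb(stdout):
    # Hypothetical convention: the run prints "val_bpb=<float>".
    for line in reversed(stdout.splitlines()):
        if line.startswith("val_bpb="):
            return float(line.split("=", 1)[1])
    return float("inf")

def keep_if_better(train_file, propose_edit, best_bpb):
    # Back up, let the agent edit one file, run a fixed-budget
    # experiment (the repo fixes ~5 minutes), revert unless improved.
    shutil.copy(train_file, train_file + ".bak")
    propose_edit(train_file)  # the agent's edit, supplied by the caller
    out = subprocess.run(["python", train_file], capture_output=True,
                         text=True, timeout=300)
    val_bpb = parse_val_bpb(out.stdout)
    if val_bpb < best_bpb:
        return val_bpb                                # keep the change
    shutil.copy(train_file + ".bak", train_file)      # revert
    return best_bpb
```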
OpenAI CEO Sam Altman responded to criticism over AI training energy costs by drawing a parallel to human education: becoming intelligent also requires 20 years and all the food energy consumed in that time.
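The analogy is easy to put numbers on. Assuming 2,000 kcal/day (an assumption, not Altman's figure), 20 years of food energy comes to roughly 17 MWh; commonly cited estimates put GPT-3's training run near 1.3 GWh, or about 75 such childhoods.

```python
# Food energy for one person over 20 years, assuming 2,000 kcal/day.
KCAL_TO_J = 4184
joules = 2000 * KCAL_TO_J * 365 * 20   # ~6.1e10 J
kwh = joules / 3.6e6                   # J -> kWh
print(f"{kwh:,.0f} kWh")               # ~16,969 kWh (~17 MWh)
```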