#language-models

LLM Hacker News Jun 2, 2026 2 min read

Stanford CS336 Turns LLM Hype Back Into Systems Homework

The thread’s energy came from a practical question: how much of modern language modeling can still be learned by building it yourself?

#stanford #language-models #education

LLM Reddit Apr 28, 2026 3 min read

r/singularity Is Hooked on Talkie, a 13B Model Frozen in 1930

r/singularity loved the premise immediately: a 13B model trapped at a 1930 knowledge cutoff. The upvotes came from the mix of novelty and real research value, because Talkie is not just a gimmick chat partner but a clean lab for studying what models learn without the modern web.

#talkie #language-models #historical-data

LLM Reddit Apr 24, 2026 2 min read

r/MachineLearning Likes This Diffusion LM for One Reason: It Makes the Idea Feel Reachable

r/MachineLearning did not reward this post for frontier performance. It took off because a 7.5M-parameter diffusion LM trained on Tiny Shakespeare on an M2 Air made a usually intimidating idea feel buildable.

#diffusion #language-models #open-source

LLM Reddit Apr 14, 2026 2 min read

Reddit Debates a 1.088B Spiking Language Model Trained From Scratch

r/MachineLearning treated this less as a finished breakthrough and more as a serious challenge to current assumptions about large-scale spike-domain training. The April 13, 2026 post reported a 1.088B-parameter pure SNN language model reaching a loss of 4.4 at 27K steps with 93% sparsity, while commenters pushed for more comparable metrics and longer training before drawing big conclusions.

#spiking-neural-networks #language-models #snn

LLM Reddit Apr 14, 2026 2 min read

r/MachineLearning Debates a 1.088B-Parameter Pure SNN Language Model

A research-oriented post on r/MachineLearning claimed that a pure spiking neural network language model could be scaled to 1.088B parameters and trained from random initialization, with the run ending only when the budget ran out.

#spiking-neural-networks #language-models #research

AI Hacker News Mar 20, 2026 2 min read

Hacker News Tracks NanoGPT Slowrun’s 10x Data-Efficiency Claim Under Fixed Data

A March 19, 2026 Hacker News post about NanoGPT Slowrun reached 162 points and 43 comments at crawl time. Q Labs says an ensemble of 1.8B-parameter models trained on 100M tokens matched a baseline that would normally require 1B tokens.

#language-models #data-efficiency #ensembles
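
For readers who want the arithmetic behind the 10x figure spelled out, here is a minimal Python sketch. The token counts come from the summary above; the function names `data_efficiency` and `ensemble_average` and the toy probability values are hypothetical, and the averaging step only illustrates the general idea of combining ensemble members. It is not a description of how Q Labs actually built or evaluated its ensemble.

```python
def data_efficiency(baseline_tokens: int, actual_tokens: int) -> float:
    """Ratio of tokens the baseline would need to tokens actually used."""
    if actual_tokens <= 0:
        raise ValueError("actual_tokens must be positive")
    return baseline_tokens / actual_tokens


def ensemble_average(member_probs: list[list[float]]) -> list[float]:
    """Average next-token probability vectors across ensemble members.

    Assumption: every member outputs a distribution over the same vocabulary.
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    vocab_size = len(member_probs[0])
    if any(len(p) != vocab_size for p in member_probs):
        raise ValueError("all members must share one vocabulary size")
    n = len(member_probs)
    return [sum(p[i] for p in member_probs) / n for i in range(vocab_size)]


# Figures from the summary: 1B baseline tokens vs. 100M tokens actually used.
ratio = data_efficiency(1_000_000_000, 100_000_000)
print(f"{ratio:.0f}x")  # 10x

# Toy two-member ensemble over a two-token vocabulary (made-up numbers).
print(ensemble_average([[0.5, 0.5], [0.25, 0.75]]))  # [0.375, 0.625]
```

The ratio is simply baseline tokens divided by tokens used, so 1B against 100M gives 10.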