Reddit Debates a 1.088B Spiking Language Model Trained From Scratch
Original: I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R]
Why Reddit paid attention
This r/MachineLearning post did not read like a polished paper launch. It read more like a rough lab notebook from someone pushing an expensive experiment until the money ran out, which is exactly why it drew interest. The author describes themselves as an 18-year-old indie developer trying to see whether a pure spiking neural network language model could be trained directly in the spike domain at billion-parameter scale, rather than relying on distillation or ANN-to-SNN conversion. At crawl time, the thread had 102 points and 51 comments. The response was mixed in a productive way: readers were intrigued by the scale and the willingness to publish code and checkpoints, but several immediately asked for more comparable evaluation metrics and warned that text quality still appears well below modern transformer baselines.
What the project claims
In the Reddit post and the linked Project Nord repository, the author claims to have trained a 1.088B-parameter pure SNN language model from random initialization using FineWeb-Edu and OpenHermes, with no pretrained teacher and no ANN-to-SNN conversion step. The reported headline numbers are 93% sparsity and a loss of 4.4 at 27K training steps. The repo argues that this contradicts the prevailing assumption that direct large-scale spike-domain training for language modeling is effectively intractable. It also describes a larger “Genesis Memory” design, spike-driven routing, and a shift toward heavier persistent-memory usage as the architecture scales.
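The reported loss is easier to reason about as perplexity. Assuming the 4.4 figure is a per-token cross-entropy measured in nats (the post does not state the units, so this is an assumption), the conversion is a one-liner:

```python
import math

def perplexity(cross_entropy_nats: float) -> float:
    """Convert a per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)

# Assumption: Nord's reported loss of 4.4 is cross-entropy in nats.
# exp(4.4) is roughly 81.5, i.e. the model is about as uncertain as a
# uniform choice over ~81 tokens at each step.
print(round(perplexity(4.4), 1))
```

A perplexity in this range would sit well above modern transformer baselines, which is one concrete way to read the commenters' request for "more comparable terms."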
Why the result is interesting even with caveats
The reason people took this seriously is not that the post proves a new state of the art. It is that the experiment lands directly on a known fault line in SNN research. The repo explicitly positions Nord against work such as SpikeBERT, SpikingBERT, and SpikeLLM, where distillation, conversion, or hybrid methods often appear because pure from-scratch training is difficult to stabilize. If Nord’s self-reported results hold up, they suggest that spike-domain training may scale further than many researchers currently assume. The sparsity angle also matters. A system where only about 7% of neurons fire per token changes the conversation from “can this match transformer fluency today?” to “what other compute and memory tradeoffs become possible if this line improves?”
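The sparsity figure is simple to illustrate. The toy sketch below is not Project Nord's code; it only shows how "93% sparsity" maps to a firing fraction: neurons emit a binary spike when a membrane potential crosses a threshold, and about 7% of them fire on a given step.

```python
import random

def spike_step(potentials, threshold=1.0):
    """Binary spike output: a neuron emits 1 only at or above threshold."""
    return [1 if v >= threshold else 0 for v in potentials]

def firing_fraction(spikes):
    """Fraction of neurons that fired; 1 minus this is the sparsity."""
    return sum(spikes) / len(spikes)

# Toy illustration (not Nord's architecture): draw membrane potentials
# uniformly in [0, 1) so that roughly 7% of neurons cross a 0.93
# threshold, mirroring the claimed 93% per-token sparsity.
random.seed(0)
potentials = [random.random() for _ in range(100_000)]
spikes = spike_step(potentials, threshold=0.93)
print(f"active fraction: {firing_fraction(spikes):.3f}")
```

In a real SNN the potentials come from accumulated, decaying inputs rather than a random draw, but the accounting is the same: sparsity is measured as the fraction of neurons that stay silent per token.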
Why the community stayed cautious
The caution in the thread is just as important as the excitement. The author openly says generation quality is still “janky,” and commenters immediately asked what loss 4.4 means in more comparable terms. The project is repo-backed and unusually detailed for a community post, but it is still self-reported work, not a peer-reviewed result or a benchmark leader. That is the right frame for this story. Reddit was not celebrating a finished product. It was paying attention to an unusually concrete attempt to push pure SNN language modeling into a scale range where the literature often expects failure, then publishing enough artifacts for others to inspect the claim seriously.
Sources: Project Nord GitHub · Reddit discussion