LocalLLaMA Spotlight: 144M Spiking Neural Network LM trained from scratch
Original: Training a 144M Spiking Neural Network for text generation from scratch - no transformer teacher, no distillation
What the Reddit post reported
A post in r/LocalLLaMA (score 154, 32 comments at collection time) described Nord, a 144M-parameter Spiking Neural Network language model trained from scratch on FineWeb-Edu. The author explicitly stated that the architecture is not based on Transformers, RWKV, or existing SNN templates, and estimated the early training cost at around $10 on a rented A5000 setup.
The central claim is that sparsity emerges naturally: during inference, only about 2-3% of neurons fire per token, which the author summarizes as 97-98% sparsity achieved without a dedicated sparsity loss term. The post frames this as a potential path to more efficient or more interpretable language modeling.
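The sparsity figure is simply the complement of the mean firing rate. As a minimal sketch of that arithmetic, assuming spikes are logged as a binary token-by-neuron table (the layout and the names `firing_rate` and `sparsity` are illustrative assumptions, not taken from the Nord code):

```python
from typing import Sequence


def firing_rate(spikes: Sequence[Sequence[int]]) -> float:
    """Fraction of (token, neuron) slots in which a neuron fired.

    `spikes` is a hypothetical log: one row per token, one 0/1 entry per neuron.
    """
    total = sum(len(row) for row in spikes)
    if total == 0:
        raise ValueError("empty spike record")
    fired = sum(sum(row) for row in spikes)
    return fired / total


def sparsity(spikes: Sequence[Sequence[int]]) -> float:
    """Sparsity is the share of slots with no spike."""
    return 1.0 - firing_rate(spikes)


# Toy record: 4 tokens x 50 neurons, exactly one neuron firing per token.
toy = [[1 if n == t else 0 for n in range(50)] for t in range(4)]
rate = firing_rate(toy)   # 4 spikes out of 200 slots
print(f"firing rate: {rate:.1%}, sparsity: {sparsity(toy):.1%}")
```

Applying the same function to one block's log at a time would give per-block activity figures of the kind the post reports.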
Technical details highlighted
- Topic coherence observation: The author compared Nord with GPT-2 Small (124M) on selected prompts and said Nord stayed more on-topic in those examples.
- Visibility into processing: Spike-rate analysis in the post reports 9.8% activity in Block 4 versus 0.6% in Block 0, which the author interprets as a rough separation of stages between filtering (Block 0) and heavier processing (Block 4).
- Online learning mechanism: The system includes STDP (Spike-Timing Dependent Plasticity) updates during conversation, presented as a biologically inspired adaptation path.
- Architecture components: The author lists LeakyClamp, Associative Cascade, Multi-scale temporal encoding, Temporal Co-firing Resonance, and Reward-modulated STDP.
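For readers unfamiliar with STDP, the textbook pair-based rule strengthens a synapse when the presynaptic spike precedes the postsynaptic one and weakens it when the order is reversed; a reward factor can scale the update. The sketch below illustrates that generic rule only. The constants, the clipping range, and the function names are assumptions chosen for illustration and do not describe Nord's actual update.

```python
import math


def stdp_delta(dt: float, a_plus: float = 0.01, a_minus: float = 0.012,
               tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    """Pair-based STDP weight change for dt = t_post - t_pre (same time unit as tau).

    All constants are illustrative defaults, not values from the post.
    """
    if dt > 0:   # pre fired before post: potentiate
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:   # post fired before pre: depress
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0   # simultaneous spikes: no change in this sketch


def apply_stdp(weight: float, dt: float, reward: float = 1.0,
               w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Reward-modulated update: scale the STDP change by a scalar reward, then clip."""
    new_weight = weight + reward * stdp_delta(dt)
    return min(w_max, max(w_min, new_weight))


w = 0.5
w_up = apply_stdp(w, dt=5.0)                  # causal pairing strengthens the synapse
w_down = apply_stdp(w, dt=-5.0)               # anti-causal pairing weakens it
w_same = apply_stdp(w, dt=5.0, reward=0.0)    # zero reward leaves the weight unchanged
```

The update shrinks exponentially as the two spikes move further apart in time, so only near-coincident activity changes a weight noticeably.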
Limitations the author openly disclosed
The post is candid about its current constraints. The reported loss is still 4.5, with a stated target of 3.8-4.0 after training on a larger data volume (40GB). It also says that text fluency remains below GPT-2's and that the GPT-2 comparison rests on a limited set of prompts, not a formal benchmark suite. In other words, this is a promising exploratory build, not a validated replacement for mainstream open LMs.
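To put the loss figures on a more familiar scale, they can be converted to perplexity. This assumes the reported loss is mean per-token cross-entropy measured in nats, which the post does not state explicitly; `perplexity` is a hypothetical helper name.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats (assumed unit)."""
    return math.exp(loss_nats)


current = perplexity(4.5)       # roughly 90
target_high = perplexity(4.0)   # roughly 55
target_low = perplexity(3.8)    # roughly 45

print(f"current: {current:.1f}, target range: {target_low:.1f} to {target_high:.1f}")
```

Under that assumption, reaching the stated target would cut perplexity roughly in half. Perplexity depends on the tokenizer, so such values are only comparable across models that are scored with the same tokenization.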
Community feedback pattern
Top comments mostly converged on methodology questions: hardware demands, the exact training cost math, benchmarking rigor, and implementation details in the released code. A few commenters called the experiment interesting but also asked for stronger, reproducible evaluation before drawing broad conclusions.
Why this matters for practitioners
Even with caveats, the post is notable because it ships code and model artifacts openly, allowing replication attempts instead of pure speculation. For teams tracking alternatives to dense Transformer inference, SNN-style sparsity and temporal learning rules remain an active frontier. The next meaningful step is standardized measurement: perplexity, long-context behavior, throughput, and energy-per-token under comparable hardware constraints.
Links included by the author: GitHub code at https://github.com/gtausa197-svg/-Project-Nord-Spiking-Neural-Network-Language-Model and model weights at https://huggingface.co/zerdovzad/Nord-AI.