growingSWE has created an interactive walkthrough of Andrej Karpathy's 200-line pure Python GPT implementation, letting you tokenize names, watch softmax convert scores to probabilities, step through backpropagation, and explore attention heatmaps.
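The softmax step that the walkthrough visualizes can be sketched in a few lines of pure Python (no NumPy); the logit values here are illustrative, not taken from the walkthrough:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate next characters.
probs = softmax([2.0, 1.0, 0.1])
print(probs)  # nonnegative values that sum to 1; the largest score gets the largest share
```

The max-subtraction trick changes nothing mathematically but prevents overflow when logits are large, which is why most minimal GPT implementations include it.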
The r/LocalLLaMA community is buzzing over Qwen 3.5-35B-A3B, which users report outperforms GPT-OSS-120B despite being roughly one-third the size, making it a strong local daily driver for development tasks.
The Financial Times reports that DeepSeek V4 is set to launch next week, featuring image and video generation capabilities that position it as a direct competitor to multimodal AI models from OpenAI and Google.
Andrej Karpathy highlights the fundamental memory+compute trade-off in LLMs: fast but small on-chip SRAM versus large but slow off-chip DRAM. He calls optimizing this the most intellectually rewarding puzzle in AI infrastructure today, citing NVIDIA's $4.6T market cap as evidence of how much value hinges on it.
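The trade-off can be made concrete with a back-of-the-envelope roofline estimate; all hardware figures below are illustrative assumptions, not specs of any particular chip:

```python
# Roofline-style check of whether token decoding is memory-bound or
# compute-bound. All figures are illustrative assumptions.
peak_flops = 1.0e15          # hypothetical accelerator: 1 PFLOP/s peak compute
dram_bandwidth = 3.0e12      # hypothetical: 3 TB/s off-chip DRAM bandwidth

# Ridge point: FLOPs per byte needed to keep the compute units saturated.
ridge_point = peak_flops / dram_bandwidth

# Decoding one token streams every weight from DRAM once: roughly 2 FLOPs
# per weight (multiply + add) against 2 bytes per weight (fp16), so the
# arithmetic intensity is about 1 FLOP/byte -- far below the ridge point,
# which is why single-stream decode is memory-bandwidth-bound.
decode_intensity = 2 / 2
print(ridge_point, decode_intensity)
```

Under these assumptions the ridge point sits in the hundreds of FLOPs/byte, so decode throughput is set by how fast weights move, not by raw compute, which is exactly the SRAM-vs-DRAM tension described above.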
A r/MachineLearning project post (score 71, 12 comments) introduced <code>Micro Diffusion</code>, a minimal implementation inspired by <code>Microgpt</code>. The author released three versions (143-line NumPy, 292-line NumPy, 413-line PyTorch) that share the same diffusion loop while swapping denoisers.
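The "same loop, swappable denoiser" structure the post describes can be sketched as a toy DDPM-style reverse sampler with a pluggable denoiser; this is an illustrative skeleton, not the project's actual code:

```python
import numpy as np

def sample(denoise, steps=50, shape=(8,), seed=0):
    # Shared DDPM-style reverse loop: only `denoise` varies between versions.
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)          # start from pure noise
    for t in reversed(range(steps)):
        eps = denoise(x, t)                 # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                           # inject noise except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Placeholder denoiser that predicts zero noise (stands in for a real network).
out = sample(lambda x, t: np.zeros_like(x))
print(out.shape)
```

Keeping the loop fixed and swapping only `denoise` is what lets the three versions differ mainly in line count: the NumPy and PyTorch variants can plug different denoiser implementations into the same sampler.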
r/LocalLLaMA Benchmarks: <code>Krasis</code> reports 3,324 tok/s prefill for 80B MoE on one RTX 5080
A r/LocalLLaMA post (score 180, 53 comments) shared benchmark data for <code>Krasis</code>, a hybrid CPU/GPU runtime aimed at large MoE models. The key claim is that GPU-heavy prefill plus CPU decode can reduce long-context waiting time even when full models do not fit in consumer VRAM.
A Hacker News thread with score 732 and 120 comments highlighted <code>microgpt</code>, Andrej Karpathy’s single-file educational implementation of a GPT-style model. The project packages dataset handling, tokenization, autograd, Transformer layers, Adam optimization, and sampling into one compact Python script.
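To give a flavor of the pieces such a single-file script bundles, character-level tokenization over a toy list of names can be sketched like this (an illustrative sketch, not code from the project itself):

```python
# Build a character-level tokenizer of the sort a minimal GPT script uses.
# Illustrative sketch; the dataset and variable names are assumptions.
names = ["emma", "olivia", "ava"]          # toy dataset of names
chars = sorted(set("".join(names)))        # unique characters form the vocab
stoi = {c: i for i, c in enumerate(chars)} # char -> integer id
itos = {i: c for c, i in stoi.items()}     # integer id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("emma")
print(ids, decode(ids))  # round-trips back to "emma"
```

Everything else in the pipeline (autograd, Transformer layers, Adam, sampling) operates on these integer ids, which is what makes a character-level vocabulary a natural fit for a compact educational implementation.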
A r/MachineLearning post surfaced AdderBoard, where community submissions report 100% 10-digit addition with extremely small transformer designs, including hand-coded models under 100 parameters.
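What makes addition tractable for tiny models is that, viewed digit by digit from the least-significant end, the task only needs to track a single carry bit. The sketch below shows that digit-wise view in plain Python; it is illustrative only and not any AdderBoard submission's actual format:

```python
# Digit-wise addition, least-significant digit first, carrying one bit of
# state between steps. Illustrative only; not an AdderBoard submission.
def add_digits(a, b):
    xs = [int(d) for d in str(a)[::-1]]
    ys = [int(d) for d in str(b)[::-1]]
    out, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        s = (xs[i] if i < len(xs) else 0) + (ys[i] if i < len(ys) else 0) + carry
        out.append(s % 10)      # emit the current digit
        carry = s // 10         # a single bit of state survives to the next step
    if carry:
        out.append(carry)
    return int("".join(str(d) for d in out[::-1]))

print(add_digits(9999999999, 1))  # 10000000000
```

Because each output digit depends only on two input digits plus one carry bit, the per-step function is small enough that very few parameters can represent it exactly, which is consistent with the sub-100-parameter hand-coded entries the post mentions.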
A Hacker News thread highlighted Context Mode, an MCP server that reports reducing Claude Code tool-output context usage from 315 KB to 5.4 KB in tested workflows.
NVIDIA’s January 5, 2026 update expands its open AI stack across Nemotron, Cosmos, Alpamayo, Isaac GR00T, and Clara. The company paired the model releases with large-scale datasets and deployment pathways to accelerate production AI adoption across industries.