r/LocalLLaMA Spots tinyforge for Local Self-Improvement in a 0.8B Model
Original: Ran an experiment: 0.8B model teaching itself on a MacBook Air with 6GB RAM. Some findings that surprised me.
Why LocalLLaMA paid attention
The r/LocalLLaMA thread highlights a compact experiment rather than a giant model release. The setup is straightforward: run a 4-bit Qwen 3.5 0.8B model on a MacBook Air, let it solve coding tasks, execute the tests, and feed back the exact failure information (the input, the expected answer, and the actual output). The author also wraps this in a small evolutionary-search-style loop that samples multiple attempts, keeps the stronger candidates, and turns broken-to-fixed pairs into LoRA training data. The appeal is obvious for this community: no teacher model, no cloud API, and no expensive lab infrastructure.
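The sample-test-keep loop described above can be sketched in miniature. Everything here is a toy stand-in under stated assumptions: the task ("double the input"), the candidate list, and the `generate`/`run_tests` helpers are hypothetical and not taken from tinyforge's actual code.

```python
import itertools

# Toy stand-ins: `generate` plays the role of sampling from the 0.8B model.
CANDIDATES = [
    "lambda x: x + 1",      # wrong
    "lambda x: x * 2",      # correct
    "lambda x: x * 2 + 1",  # wrong
]
_stream = itertools.cycle(CANDIDATES)  # deterministic stand-in for LLM sampling

def generate(task):
    """Stand-in for an LLM call that produces one attempt at the task."""
    return next(_stream)

def run_tests(src, cases):
    """Score a candidate and capture the first failure as exact feedback."""
    fn = eval(src)  # fine for a toy; real code would sandbox execution
    score, failure = 0, None
    for inp, expected in cases:
        actual = fn(inp)
        if actual == expected:
            score += 1
        elif failure is None:
            failure = {"input": inp, "expected": expected, "actual": actual}
    return score, failure

def repair_loop(task, cases, samples=6):
    """Sample several attempts, keep the strongest, and harvest
    broken->fixed pairs as candidate LoRA training data."""
    pool = []
    for _ in range(samples):
        src = generate(task)
        score, failure = run_tests(src, cases)
        pool.append((score, src, failure))
    pool.sort(key=lambda t: -t[0])  # keep the stronger candidates first
    best_score, best_src, _ = pool[0]
    pairs = []
    if best_score == len(cases):  # only harvest pairs when a full fix exists
        for score, src, failure in pool[1:]:
            if score < len(cases):
                pairs.append({"broken": src, "feedback": failure, "fixed": best_src})
    return best_src, pairs

cases = [(1, 2), (3, 6), (5, 10)]  # toy task: double the input
best, pairs = repair_loop("double the input", cases)
print(best, len(pairs))  # -> lambda x: x * 2 4
```

The key design point is that the failure feedback is exact (input, expected, actual), which is what gives the later repair step something concrete to condition on.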
The numbers in the post and tinyforge README are what pushed the experiment beyond novelty. On a fresh holdout slice, the author reports single-pass performance improving from 16/50 to 28/50 using only 13 self-generated repair pairs and about three minutes of training. Another holdout slice shows a feedback-loop gain from 42/58 to 47/58. The README frames the project around a 6GB RAM target, while the Reddit post says training peaked around 10GB. Even with that caveat, the hardware story is still unusually accessible by local-LLM standards.
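Converted to pass rates, those holdout numbers look like this (the raw counts are copied from the post and README; the percentage arithmetic is mine):

```python
# Holdout results as reported: (before, after, total).
runs = {
    "single-pass (13 repair pairs)": (16, 28, 50),
    "feedback loop": (42, 47, 58),
}
for name, (before, after, total) in runs.items():
    print(f"{name}: {before}/{total} = {before/total:.0%} -> "
          f"{after}/{total} = {after/total:.0%}")
# single-pass (13 repair pairs): 16/50 = 32% -> 28/50 = 56%
# feedback loop: 42/58 = 72% -> 47/58 = 81%
```

So the single-pass slice gains roughly 24 percentage points while the feedback-loop slice, starting from a higher baseline, gains about 9.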
The more interesting claim
The author says the model did not become dramatically better at cold coding after training. Instead, it became better at using failure feedback inside the repair loop. That distinction matters. It suggests tiny models may benefit less from memorizing solutions and more from learning a procedure for fixing mistakes when verification is available. If that generalizes, the same pattern could matter for code, SQL, mathematical proofs, or data transformations where an automatic checker exists.
- The training signal comes from self-generated repair pairs, not human labels.
- The strongest observed gain is in feedback-aware repair, not just one-shot generation.
- The experiment is still small and needs broader replication before strong conclusions.
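One plausible way a broken-to-fixed pair could become a self-generated training example is a chat-style SFT record, where the prompt carries the failure feedback and the target is the fixed solution. The field names, prompt wording, and helper below are hypothetical sketches, not tinyforge's actual data format.

```python
import json

def repair_pair_to_example(pair):
    """Format one broken->fixed pair as a chat-style SFT example.
    (Hypothetical format; tinyforge's real schema may differ.)"""
    fb = pair["feedback"]
    prompt = (
        "Your previous attempt failed a test.\n"
        f"Attempt:\n{pair['broken']}\n"
        f"Input: {fb['input']}\n"
        f"Expected: {fb['expected']}\n"
        f"Actual: {fb['actual']}\n"
        "Return a corrected solution."
    )
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": pair["fixed"]},
    ]}

pair = {
    "broken": "lambda x: x + 1",
    "feedback": {"input": 3, "expected": 6, "actual": 4},
    "fixed": "lambda x: x * 2",
}
line = json.dumps(repair_pair_to_example(pair))  # one JSONL line of training data
```

Training on examples shaped like this would reward exactly the behavior the author reports: producing a fix conditioned on concrete failure feedback, rather than a better cold first attempt.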
That is why the post landed well on LocalLLaMA. It makes the case that small local models may still learn useful behavior if the loop around them is designed carefully enough.
Related Articles
A March 28, 2026 r/LocalLLaMA post turned TurboQuant from a paper topic into an MLX implementation story with custom Metal kernels, code, and an upstream PR. The author reports 4.6x KV cache compression at 0.98x FP16 speed on Qwen2.5-32B, but the repository's 7B README numbers are more conservative, underscoring how model choice and integration details shape the real payoff.
Ollama used a March 30, 2026 preview to move its Apple Silicon path onto MLX. The release pairs higher prefill and decode throughput with NVFP4 support and cache changes aimed at coding and agent workflows.
A March 31, 2026 Hacker News hit brought attention to Ollama’s new MLX-based Apple Silicon runtime. The announcement combines MLX, NVFP4, and upgraded cache behavior to make local coding-agent workloads on macOS more practical.