r/LocalLLaMA Spots tinyforge for Local Self-Improvement in a 0.8B Model
Original: Ran an experiment: 0.8B model teaching itself on a MacBook Air with 6GB RAM. Some findings that surprised me.
Why LocalLLaMA paid attention
The r/LocalLLaMA thread highlights a compact experiment rather than a giant model release. The setup is straightforward: run a 4-bit Qwen 3.5 0.8B model on a MacBook Air, let it solve coding tasks, execute the tests, then feed back the exact failure information: the input, the expected answer, and the actual output. The author also uses a small evolutionary-search-style loop that samples multiple attempts, keeps the stronger candidates, and turns broken-to-fixed pairs into LoRA training data. The appeal is obvious for this community: no teacher model, no cloud API, and no expensive lab infrastructure.
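The loop described above can be sketched in a few lines. This is a toy illustration, not tinyforge's actual code: `toy_model`, `run_tests`, and the `Task` shape are all hypothetical stand-ins, with a stub "model" that only produces correct code once it sees failure feedback.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the real pieces; tinyforge's actual API
# is not shown in the post, so every name here is illustrative.

@dataclass
class Task:
    prompt: str
    test_input: int
    expected: int

def run_tests(task, code):
    """Execute a candidate solution against the task's test case."""
    try:
        ns = {}
        exec(code, ns)
        actual = ns["solve"](task.test_input)
        return actual == task.expected, actual
    except Exception as e:
        return False, repr(e)

def toy_model(prompt):
    """Stub 'model': emits buggy code unless the prompt contains feedback."""
    if "Got:" in prompt:                      # repair call sees the failure info
        return "def solve(x):\n    return x * 2"
    return "def solve(x):\n    return x + 2"  # cold attempt: wrong operation

def self_repair_round(tasks, generate, n_samples=3):
    """One round: sample attempts, run tests, feed back exact failures,
    and keep (broken, fixed) pairs as future LoRA training data."""
    repair_pairs = []
    for task in tasks:
        for _ in range(n_samples):
            code = generate(task.prompt)
            ok, actual = run_tests(task, code)
            if ok:
                break
        if not ok:
            # Feed back the exact failure: input, expected answer, actual output.
            feedback = (f"{task.prompt}\nInput: {task.test_input} "
                        f"Expected: {task.expected} Got: {actual}\nFix the code.")
            fixed = generate(feedback)
            if run_tests(task, fixed)[0]:
                repair_pairs.append((code, fixed))  # broken -> fixed pair
    return repair_pairs

pairs = self_repair_round([Task("Double x.", 5, 10)], toy_model)
print(len(pairs))  # one verified repair pair
```

The key design point is that the verifier (the test run) gates what enters the training set, so only repairs that actually pass become LoRA data.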
The numbers in the post and tinyforge README are what pushed the experiment beyond novelty. On a fresh holdout slice, the author reports single-pass performance improving from 16/50 to 28/50 using only 13 self-generated repair pairs and about three minutes of training. Another holdout slice shows a feedback-loop gain from 42/58 to 47/58. The README frames the project around a 6GB RAM target, while the Reddit post says training peaked around 10GB. Even with that caveat, the hardware story is still unusually accessible by local-LLM standards.
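In percentage terms, the reported holdout numbers work out as follows (a quick check of the figures quoted above, nothing more):

```python
# Reported holdout results from the post: (before, after, total attempts).
results = {
    "single-pass holdout": (16, 28, 50),
    "feedback-loop holdout": (42, 47, 58),
}

for name, (before, after, total) in results.items():
    delta = (after - before) / total * 100
    print(f"{name}: {before / total:.0%} -> {after / total:.0%} (+{delta:.0f} pts)")
```

That is roughly a 24-point single-pass gain (32% to 56%) from only 13 repair pairs, versus a smaller 9-point gain in the already-stronger feedback-loop setting.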
The more interesting claim
The author says the model did not become dramatically better at cold coding after training. Instead, it became better at using failure feedback inside the repair loop. That distinction matters. It suggests tiny models may benefit less from memorizing solutions and more from learning a procedure for fixing mistakes when verification is available. If that generalizes, the same pattern could matter for code, SQL, mathematical proofs, or data transformations where an automatic checker exists.
- The training signal comes from self-generated repair pairs, not human labels.
- The strongest observed gain is in feedback-aware repair, not just one-shot generation.
- The experiment is still small and needs broader replication before strong conclusions.
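The distinction in the bullets above is measurable: score the same model once cold (one shot, no feedback) and once with a verifier-driven repair turn, and compare the two numbers. A minimal sketch, with hypothetical `generate` and `verifier` callables standing in for the real model and test harness:

```python
def evaluate(tasks, generate, verifier, repair_turns=1):
    """Score cold one-shot accuracy and feedback-aware repair accuracy
    separately. All names are illustrative, not tinyforge's API."""
    cold, repaired = 0, 0
    for task in tasks:
        attempt = generate(task["prompt"])
        ok, feedback = verifier(task, attempt)
        if ok:
            cold += 1
            repaired += 1
            continue
        # Cold attempt failed; give the model its failure info and retry.
        for _ in range(repair_turns):
            attempt = generate(task["prompt"] + "\n" + feedback + "\nFix it.")
            ok, feedback = verifier(task, attempt)
            if ok:
                repaired += 1
                break
    return cold, repaired

# Toy check: a 'model' that only succeeds after seeing feedback.
def toy_generate(prompt):
    return "right" if "Fix it." in prompt else "wrong"

def toy_verifier(task, attempt):
    return attempt == task["answer"], f"Got: {attempt}"

print(evaluate([{"prompt": "q", "answer": "right"}], toy_generate, toy_verifier))
```

A gap between the two scores that widens after training is exactly the pattern the author describes: the model learned to use feedback, not to solve cold.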
That is why the post landed well on LocalLLaMA. It makes the case that small local models may still learn useful behavior if the loop around them is designed carefully enough.