Reverse Engineered Apple Neural Engine to Train Microgpt
Why the Apple Neural Engine?
Apple's M4 Neural Engine (ANE) is rated at a claimed 38 TOPS of INT8 compute; since the ANE is natively an FP16 processor, effective FP16 throughput is roughly half that. Despite this capability, Apple provides no public API for direct ANE access. CoreML is the officially recommended path, but it abstracts away direct control of the hardware.
The developer, wanting to get the most compute out of their Mac mini M4, used Claude to reverse engineer the ANE's private APIs, bypassing CoreML to access the hardware directly. The post earned 457 upvotes on r/LocalLLaMA.
The Reverse Engineering Process
Using Claude as an engineering partner, the developer analyzed Apple's private ANE APIs, benchmarked the hardware directly rather than through CoreML, and built a bespoke training pipeline. The result: a 110M-parameter Microgpt model trained entirely on the ANE.
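The post itself does not include code, but the first step of this kind of probing can be sketched. The snippet below is a minimal, hypothetical illustration rather than the author's pipeline: it loads the private AppleNeuralEngine framework from Python and lists the Objective-C methods it exposes. The framework path and the _ANEClient/_ANEModel class names are known from earlier public ANE reverse-engineering write-ups; everything else is plain Objective-C runtime introspection, and no private method is actually invoked.

```python
# Minimal, hypothetical sketch (not the author's code): load Apple's private
# AppleNeuralEngine framework and list the Objective-C methods it exposes.
import ctypes

# The Objective-C runtime itself is a public, stable C API.
libobjc = ctypes.CDLL("/usr/lib/libobjc.dylib")
libobjc.objc_getClass.restype = ctypes.c_void_p
libobjc.objc_getClass.argtypes = [ctypes.c_char_p]
libobjc.class_copyMethodList.restype = ctypes.POINTER(ctypes.c_void_p)
libobjc.class_copyMethodList.argtypes = [ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint)]
libobjc.method_getName.restype = ctypes.c_void_p
libobjc.method_getName.argtypes = [ctypes.c_void_p]
libobjc.sel_getName.restype = ctypes.c_char_p
libobjc.sel_getName.argtypes = [ctypes.c_void_p]

# dlopen the private framework directly; this is what CoreML uses underneath.
ANE_PATH = ("/System/Library/PrivateFrameworks/"
            "AppleNeuralEngine.framework/AppleNeuralEngine")
ctypes.CDLL(ANE_PATH)

for name in (b"_ANEClient", b"_ANEModel"):
    cls = libobjc.objc_getClass(name)
    if not cls:
        print(f"{name.decode()}: not present on this macOS build")
        continue
    count = ctypes.c_uint(0)
    # Copy the class's instance-method list (leaked here; fine for a one-off probe).
    methods = libobjc.class_copyMethodList(cls, ctypes.byref(count))
    selectors = sorted(
        libobjc.sel_getName(libobjc.method_getName(methods[i])).decode()
        for i in range(count.value)
    )
    print(f"{name.decode()}: {count.value} instance methods, e.g. {selectors[:5]}")
```

Mapping those selectors is only the starting point; compiling a model to the ANE's on-device format and driving execution goes through these private interfaces, which is the part that required the real reverse-engineering effort.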
Results and Limitations
- Success: Completed training a 110M Microgpt model on a single M4 ANE
- Limitation: A single chip is not practical for training larger models
- Future potential: A cluster of ANE-equipped Apple Silicon devices could theoretically train larger models; even a single device should handle LoRA fine-tuning for 3B/7B models (see the back-of-envelope sketch after this list)
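A rough sanity check makes the list above concrete. The sketch below combines the claimed 38 INT8 TOPS halved to FP16 with the standard approximation of roughly 6 FLOPs per parameter per training token; the utilization factor, token counts, and 7B-class model shape are illustrative assumptions, not figures from the original post.

```python
# Back-of-envelope sketch of why one ANE is fine for a 110M model or LoRA,
# but impractical for full pretraining of larger models. All inputs below are
# illustrative assumptions, not measurements from the original post.
EFFECTIVE_FP16_TFLOPS = 38 / 2   # claimed 38 INT8 TOPS; ANE is natively FP16
UTILIZATION = 0.3                # assumed fraction of peak actually sustained

def train_days(params, tokens, util=UTILIZATION):
    """Rough wall-clock estimate using the ~6 FLOPs/param/token rule of thumb."""
    flops = 6 * params * tokens
    seconds = flops / (EFFECTIVE_FP16_TFLOPS * 1e12 * util)
    return seconds / 86400

# 110M params on ~2B tokens vs. 7B params on a Chinchilla-style ~140B tokens
print(f"110M model: {train_days(110e6, 2e9):7.1f} days")
print(f"7B model:   {train_days(7e9, 140e9):7.0f} days")

# LoRA on a 7B-class model touches only small adapter matrices, so the
# trainable parameter count (and optimizer state) is tiny by comparison.
def lora_params(layers=32, d_model=4096, rank=16, matrices_per_layer=4):
    # Each adapted weight gets two low-rank factors: d_model x r and r x d_model.
    return layers * matrices_per_layer * 2 * d_model * rank

print(f"LoRA trainable params (7B-class model, r=16): {lora_params()/1e6:.0f}M")
```

Under these assumptions the 110M run fits comfortably on one chip, full pretraining of a 7B model runs into decades of wall-clock time, and LoRA's few tens of millions of trainable parameters look like a much better match for a single device.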
Why NPU Training Matters
NPUs offer dramatically better power efficiency than GPUs for matrix multiplication workloads. Apple Silicon ANEs process vastly more operations per watt than discrete GPUs. This project demonstrates a potential path toward democratizing AI training — using the NPU in MacBooks and Mac Minis rather than expensive NVIDIA hardware. It also highlights Claude's utility as a reverse engineering assistant for systems-level work.
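To put the efficiency claim in rough numbers, here is a tiny ops-per-watt comparison. The ANE's claimed 38 TOPS comes from the figures above; the assumed power draws (a few watts for the ANE, 450 W board power for a high-end discrete GPU around 660 dense INT8 TOPS) are ballpark illustrative assumptions, not measurements.

```python
# Illustrative perf-per-watt comparison; the power and GPU throughput figures
# below are rough assumptions for the sake of the example, not measurements.
def tops_per_watt(tops, watts):
    return tops / watts

ane_tops, ane_watts = 38, 5      # claimed M4 ANE TOPS; ~5 W draw is an assumption
gpu_tops, gpu_watts = 660, 450   # ballpark high-end discrete GPU, dense INT8

print(f"ANE: ~{tops_per_watt(ane_tops, ane_watts):.1f} TOPS/W")
print(f"GPU: ~{tops_per_watt(gpu_tops, gpu_watts):.1f} TOPS/W")
```

Under these assumed numbers the NPU comes out several times more efficient per watt, although the discrete GPU keeps a large absolute-throughput advantage, which is why the cluster idea above matters for larger models.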
Related Articles
As of April 4, 2026, a Show HN post about Apfel had reached 513 points and 117 comments, highlighting a Swift tool that turns Apple's on-device foundation model into a CLI, chat interface, and OpenAI-compatible local server on Apple Silicon.
Lemonade packages local AI inference behind an OpenAI-compatible server that targets GPUs and NPUs, aiming to make open models easier to deploy on everyday PCs.
Another r/LocalLLaMA post passed 900 points, and not on the strength of yet another benchmark table: the hook was a local coding agent noticing and fixing its own canvas and wave-completion bugs.