Reverse Engineered Apple Neural Engine to Train Microgpt

Why the Apple Neural Engine?

Apple advertises the M4's Neural Engine (ANE) at 38 TOPS, a figure usually read as INT8 throughput. Because the ANE actually computes in FP16, the developer estimates real compute at roughly half that. Despite this capability, Apple provides no public API for direct ANE access. CoreML is the officially recommended path, but it abstracts the hardware away and offers no direct control over how the ANE is used.
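The "roughly half" estimate is simple arithmetic, sketched below. The 0.5 ratio is the rule of thumb stated above, not a measured value, and the function name is purely illustrative.

```python
# Back-of-the-envelope estimate only; nothing here is a benchmark result.
CLAIMED_INT8_TOPS = 38.0   # Apple's advertised figure for the M4 Neural Engine
FP16_TO_INT8_RATIO = 0.5   # assumption: FP16 runs at about half the INT8 rate


def estimated_fp16_tflops(claimed_int8_tops: float,
                          ratio: float = FP16_TO_INT8_RATIO) -> float:
    """Scale an advertised INT8 figure down to an approximate FP16 figure."""
    return claimed_int8_tops * ratio


fp16_tflops = estimated_fp16_tflops(CLAIMED_INT8_TOPS)
print(f"Estimated FP16 throughput: ~{fp16_tflops:.0f} TFLOPS")
```

This is a peak figure. Sustained throughput during training also depends on memory bandwidth and how well the workload maps onto the hardware.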

A developer who wanted to get the most compute out of their Mac mini M4 used Claude to reverse engineer the ANE's private APIs, bypassing CoreML to access the hardware directly. Their post about the project earned 457 upvotes on r/LocalLLaMA.

The Reverse Engineering Process

Using Claude as an engineering partner, the developer analyzed Apple's private ANE APIs, benchmarked the hardware with CoreML bypassed, and built a bespoke training pipeline. The result: a 110M-parameter Microgpt model trained entirely on the ANE.

Results and Limitations

Success: Completed training a 110M Microgpt model on a single M4 ANE
Limitation: A single chip is not practical for training larger models
Future potential: A cluster of ANE-equipped Apple Silicon devices could theoretically train larger models; even a single device should handle LoRA fine-tuning for 3B/7B models
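To see why LoRA fine-tuning is a much lighter target than full training, here is a minimal sketch that counts the trainable parameters. The model shape (hidden size 4096, 32 layers, adapters on two projections per layer, rank 8) is a hypothetical 7B-class configuration chosen for illustration, not a detail from the project.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical 7B-class shape (assumed for illustration, not from the project).
HIDDEN = 4096
LAYERS = 32
ADAPTED_PER_LAYER = 2   # e.g. the query and value projections
RANK = 8

per_adapter = lora_trainable_params(HIDDEN, HIDDEN, RANK)
total = per_adapter * ADAPTED_PER_LAYER * LAYERS
fp16_mib = total * 2 / 2**20   # 2 bytes per FP16 weight

print(f"Trainable LoRA parameters: {total:,}")
print(f"Adapter weights in FP16: {fp16_mib:.1f} MiB")
```

Under these assumptions only about 4.2M parameters receive gradient updates, compared with billions in the base model. The frozen base weights still have to be held in memory and run through the forward and backward passes, so this shows the reduction in trainable state, not the total cost.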

Why NPU Training Matters

NPUs offer dramatically better power efficiency than GPUs for matrix multiplication workloads, and Apple Silicon ANEs in particular process far more operations per watt than discrete GPUs. This project demonstrates a potential path toward democratizing AI training: using the NPU already built into MacBooks and Mac minis rather than expensive NVIDIA hardware. It also highlights Claude's utility as a reverse engineering assistant for systems-level work.

