Reverse Engineered Apple Neural Engine to Train Microgpt

Why the Apple Neural Engine?

Apple rates the Neural Engine (ANE) in its M4 chip at 38 TOPS. That figure is an INT8 rating, and because the ANE computes in FP16, real throughput is roughly half that, or about 19 TFLOPS. Despite this capability, Apple provides no public API for direct ANE access. CoreML is the officially recommended path, but it abstracts the hardware away rather than exposing it directly.
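The halving described above can be written out as a small calculation. This is only a sketch, assuming the simple 2:1 ratio between the INT8 rating and the FP16 rate that the text implies; the function and constant names are illustrative, and the result is a peak estimate, not a measurement.

```python
def fp16_peak_from_int8(int8_rating: float, int8_per_fp16: float = 2.0) -> float:
    """Convert a claimed INT8 rating (trillions of ops/s) into an FP16 peak.

    Assumes the INT8 rating counts `int8_per_fp16` operations for every
    FP16 operation the hardware can actually issue.
    """
    if int8_per_fp16 <= 0:
        raise ValueError("int8_per_fp16 must be positive")
    return int8_rating / int8_per_fp16


# Figure quoted for the M4 Neural Engine, in trillions of operations per second.
CLAIMED_INT8_RATING = 38.0

fp16_peak = fp16_peak_from_int8(CLAIMED_INT8_RATING)
print(f"Estimated FP16 peak: {fp16_peak:.1f} TFLOPS")
```

Sustained throughput in a real training loop would sit below this peak, since the estimate ignores memory bandwidth and scheduling overhead.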

The developer behind the project wanted to use all of the compute in their Mac Mini M4, so they used Claude to reverse engineer the ANE's private APIs and bypass CoreML to reach the hardware directly. Their post earned 457 upvotes on r/LocalLLaMA.

The Reverse Engineering Process

With Claude as an engineering partner, the developer analyzed Apple's private ANE APIs, benchmarked the hardware with CoreML out of the loop, and built a bespoke training pipeline. The result: a 110M-parameter Microgpt model trained entirely on the ANE.
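To get a feel for what training a 110M-parameter model demands of a single chip, a back-of-the-envelope estimate helps. The sketch below uses the common rule of thumb of about 6 FLOPs per parameter per training token. The token count and the utilization figure are hypothetical placeholders, not numbers from the project, and the 19 TFLOPS peak is the halved estimate discussed earlier.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens


def training_hours(n_params: float, n_tokens: float,
                   peak_tflops: float, utilization: float) -> float:
    """Wall-clock hours at a given peak rate and sustained utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    sustained_flops_per_s = peak_tflops * 1e12 * utilization
    return training_flops(n_params, n_tokens) / sustained_flops_per_s / 3600.0


N_PARAMS = 110e6      # model size reported in the post
N_TOKENS = 1e9        # hypothetical training-set size
PEAK_TFLOPS = 19.0    # estimated FP16 peak (half of the 38 claimed)
UTILIZATION = 0.10    # hypothetical fraction of peak actually sustained

hours = training_hours(N_PARAMS, N_TOKENS, PEAK_TFLOPS, UTILIZATION)
print(f"Estimated training time: {hours:.1f} hours")

# At a fixed token count the cost scales linearly with parameter count,
# so a 7B model would take 7e9 / 110e6 times as long under these assumptions.
scale_7b = training_flops(7e9, N_TOKENS) / training_flops(N_PARAMS, N_TOKENS)
print(f"7B vs 110M cost ratio: {scale_7b:.1f}x")
```

Changing `N_TOKENS` or `UTILIZATION` moves the estimate proportionally, which is why the inputs are kept as named constants.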

Results and Limitations

Success: Completed training a 110M Microgpt model on a single M4 ANE
Limitation: A single chip is not practical for training larger models
Future potential: A cluster of ANE-equipped Apple Silicon devices could theoretically train larger models; even a single device should handle LoRA fine-tuning for 3B/7B models
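The LoRA point rests on how few weights LoRA actually trains. A rank-r adapter on a weight matrix of shape d_out x d_in adds r * (d_in + d_out) trainable parameters. The sketch below counts them for a hypothetical 7B-class layout (hidden size 4096, 32 layers, four attention projections adapted per layer, rank 8). These shapes are illustrative assumptions, not the configuration of any model mentioned in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraTarget:
    """Shape of one weight matrix that receives a LoRA adapter."""
    d_in: int
    d_out: int


def lora_params(target: LoraTarget, rank: int) -> int:
    """Trainable parameters for one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (target.d_in + target.d_out)


def total_lora_params(targets_per_layer: list[LoraTarget],
                      n_layers: int, rank: int) -> int:
    """Sum adapter parameters over every adapted matrix in every layer."""
    return n_layers * sum(lora_params(t, rank) for t in targets_per_layer)


# Hypothetical 7B-class shapes (assumptions for illustration only).
HIDDEN = 4096
N_LAYERS = 32
RANK = 8
attention_projections = [LoraTarget(HIDDEN, HIDDEN)] * 4  # q, k, v, o

trainable = total_lora_params(attention_projections, N_LAYERS, RANK)
adapted_full = N_LAYERS * len(attention_projections) * HIDDEN * HIDDEN

print(f"LoRA trainable parameters: {trainable:,}")
print(f"Share of the adapted matrices: {trainable / adapted_full:.4%}")
```

Under these assumptions the adapters hold a few million trainable weights, which is why optimizer state and gradients for LoRA are far smaller than for full fine-tuning. The frozen base weights still have to fit in memory and still take part in the forward and backward passes.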

Why NPU Training Matters

NPUs target matrix multiplication workloads and offer much better power efficiency on them than GPUs: Apple Silicon ANEs process far more operations per watt than discrete GPUs. This project points to a possible path toward democratizing AI training by using the NPU already present in MacBooks and Mac Minis rather than expensive NVIDIA hardware. It also highlights Claude's usefulness as a reverse engineering assistant for systems-level work.
