Inside Apple's M4 Neural Engine: Reverse Engineering Reveals Graph Execution Architecture
Original: Inside the M4 Apple Neural Engine, Part 1: Reverse Engineering
Reverse Engineering the M4 Neural Engine
A detailed reverse engineering investigation of Apple's M4 Neural Engine (codename H16G) has uncovered fundamental architectural insights that challenge common assumptions about Apple's AI hardware. The research garnered significant attention on Hacker News, reflecting the AI community's deep interest in understanding these increasingly important chips.
A Graph Execution Engine, Not a Traditional Processor
The most significant finding: the M4 ANE is neither a CPU nor a GPU in the traditional sense. It is a graph execution engine: rather than fetching and executing individual instructions, it accepts a pre-compiled neural network graph and executes it as a single atomic unit. The hardware has 16 cores, a queue depth of 127 in-flight evaluation requests, independent dynamic voltage and frequency scaling (DVFS), and power gating that reduces consumption to zero when idle.
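The difference between instruction-level execution and whole-graph execution can be illustrated with a toy model. The sketch below is not Apple's implementation: CompiledGraph and ToyEngine are hypothetical names, the graph is a three-node example, and only the queue depth of 127 comes from the article. It shows a graph being ordered once, ahead of time, and then run start to finish per request.

```python
from collections import deque
from graphlib import TopologicalSorter  # Python 3.9+

QUEUE_DEPTH = 127  # in-flight request limit reported in the article


class CompiledGraph:
    """Hypothetical stand-in for a pre-compiled network."""

    def __init__(self, ops, deps, output):
        self.ops = ops        # node name -> function of its input values
        self.deps = deps      # node name -> list of producer node names
        self.output = output  # name of the node whose value is returned
        # "Compilation": fix the execution order once, ahead of time.
        self.order = list(TopologicalSorter(deps).static_order())

    def run(self, inputs):
        """Execute every node in order; the caller never sees single steps."""
        values = dict(inputs)
        for name in self.order:
            if name in values:  # graph input supplied by the caller
                continue
            args = [values[d] for d in self.deps[name]]
            values[name] = self.ops[name](*args)
        return values[self.output]


class ToyEngine:
    """Accepts whole-graph evaluation requests up to a fixed queue depth."""

    def __init__(self, depth=QUEUE_DEPTH):
        self.depth = depth
        self.pending = deque()

    def submit(self, graph, inputs):
        if len(self.pending) >= self.depth:
            raise RuntimeError("evaluation queue is full")
        self.pending.append((graph, inputs))

    def drain(self):
        results = []
        while self.pending:
            graph, inputs = self.pending.popleft()
            results.append(graph.run(inputs))
        return results


# Example graph: relu(x * w + b)
graph = CompiledGraph(
    ops={
        "mul": lambda x, w: x * w,
        "add": lambda m, b: m + b,
        "relu": lambda a: max(a, 0.0),
    },
    deps={"mul": ["x", "w"], "add": ["mul", "b"], "relu": ["add"]},
    output="relu",
)

engine = ToyEngine()
engine.submit(graph, {"x": 2.0, "w": 3.0, "b": 1.0})
engine.submit(graph, {"x": 2.0, "w": -3.0, "b": 1.0})
results = engine.drain()
```

The caller's only handles are submit and drain; there is no way to step through individual nodes, which is the property the article attributes to the ANE.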
Hidden APIs Bypassing CoreML
A major breakthrough was the discovery that CoreML is not the only access path to the ANE. The private _ANEClient class in AppleNeuralEngine.framework provides direct compilation, loading, and evaluation capabilities. The researchers identified over 40 undocumented private classes and implemented in-memory compilation using _ANEInMemoryModelDescriptor, which accepts MIL (Model Intermediate Language) text directly, without filesystem round-trips. Avoiding those round-trips is critical for training applications.
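One way to check whether a private class such as _ANEClient is present on a given machine is to ask the Objective-C runtime for it. The sketch below is a presence probe only. It assumes macOS and assumes the framework lives under /System/Library/PrivateFrameworks (the article names the framework but not its path). It deliberately calls no methods on the class, because their signatures are undocumented and not given in the article.

```python
import ctypes
import ctypes.util
import sys

# Assumed location; the article names only AppleNeuralEngine.framework.
ASSUMED_FRAMEWORK = (
    "/System/Library/PrivateFrameworks/"
    "AppleNeuralEngine.framework/AppleNeuralEngine"
)


def find_objc_class(name, framework_path=ASSUMED_FRAMEWORK):
    """Return the Objective-C class pointer as an int, or None if unavailable."""
    if sys.platform != "darwin":
        return None  # the Objective-C runtime probe only makes sense on macOS
    try:
        ctypes.CDLL(framework_path)  # load the framework so its classes register
        objc = ctypes.CDLL(ctypes.util.find_library("objc"))
    except (OSError, TypeError):
        return None
    objc.objc_getClass.restype = ctypes.c_void_p
    objc.objc_getClass.argtypes = [ctypes.c_char_p]
    # objc_getClass returns NULL (None in ctypes) when the class is unknown.
    return objc.objc_getClass(name.encode("ascii"))


for cls in ("_ANEClient", "_ANEInMemoryModelDescriptor"):
    print(cls, "found" if find_objc_class(cls) else "not found")
```

On non-Apple systems, or if the assumed path is wrong, the probe simply reports "not found".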
Apple's '38 TOPS' Claim Is Misleading
Testing indicated that Apple's published 38 TOPS specification is misleading. Expressing matrix multiplication as a 1x1 convolution achieves significantly higher throughput than the native matmul operation, which suggests that convolution is the ANE's primary compute primitive. The E5 binary format held a further surprise: the compiled output describes parameterized configurations of compute primitives rather than traditional machine code.
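The matmul-to-convolution rewrite works because a 1x1 convolution is a matrix multiply applied independently at every spatial position. The pure-Python sketch below demonstrates the equivalence only; it says nothing about ANE performance. The function names are illustrative, and the layout (K input channels, M rows as spatial positions, N output channels) is one possible mapping, not necessarily the one the researchers used. The achieved_tops helper uses the common convention of counting one multiply-accumulate as two operations.

```python
def matmul(a, b):
    """Reference product of a (M x K) and b (K x N) as nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def conv1x1(x, w):
    """1x1 convolution: x is [C_in][H][W], w is [C_out][C_in]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(wd)]
             for i in range(h)]
            for o in range(len(w))]


def matmul_as_conv1x1(a, b):
    """Compute a @ b by reshaping the operands into a 1x1 convolution."""
    m, k, n = len(a), len(b), len(b[0])
    # Rows of a become spatial positions; columns of a become input channels.
    x = [[[a[i][c]] for i in range(m)] for c in range(k)]  # [K][M][1]
    # Each column of b becomes the weights for one output channel.
    w = [[b[c][o] for c in range(k)] for o in range(n)]    # [N][K]
    y = conv1x1(x, w)                                      # [N][M][1]
    return [[y[o][i][0] for o in range(n)] for i in range(m)]


def achieved_tops(m, k, n, seconds):
    """Throughput in tera-ops/s, counting each multiply-accumulate as 2 ops."""
    return 2 * m * k * n / seconds / 1e12


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
assert matmul_as_conv1x1(a, b) == matmul(a, b)
```

Since both forms compute the same result, any throughput gap between them on real hardware reflects how the operation is scheduled, not the arithmetic being done.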
Unexplored Territory
Several of the discovered classes hint at untapped capabilities, including model chaining support, GPU-ANE synchronization primitives, and potentially accessible hardware performance counters. All are promising areas for future investigation.