A LocalLLaMA Benchmark Suggests MoE Models Fit 32 GB Apple Laptops Well

Original post: "I benchmarked 37 LLMs on MacBook Air M5 32GB — full results + open-source tool to benchmark your own Mac" (r/LocalLLaMA)

LLM · Apr 7, 2026 · By Insights AI (Reddit) · 2 min read

A recent LocalLLaMA discussion shared results from Mac LLM Bench, an open repository that tries to make Apple Silicon local-LLM performance easier to compare. The author benchmarked 37 models across 10 families on a 32 GB MacBook Air M5 using llama-bench with Q4_K_M quantization and published both the numbers and the scripts behind them.
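For readers who want to reproduce this kind of run, the setup described above can be sketched as a llama-bench invocation. This is an illustrative sketch, not the repository's exact script: the model path is a placeholder, and the flags follow llama.cpp's llama-bench conventions (`-p` for fixed prompt-processing lengths, `-n` for fixed generation lengths, which produce the pp/tg metrics mentioned below).

```python
# Sketch: assemble a llama-bench command line matching the post's
# setup (Q4_K_M GGUF, fixed-token pp/tg runs). The model path is a
# hypothetical placeholder, not a file from the repository.
model = "models/qwen3-0.6b-q4_k_m.gguf"  # placeholder path

cmd = [
    "llama-bench",
    "-m", model,
    "-p", "128,256,512",  # prompt-processing runs: pp128 / pp256 / pp512
    "-n", "128,256",      # token-generation runs: tg128 / tg256
]
print(" ".join(cmd))
```

Running the printed command requires a local llama.cpp build and a downloaded GGUF model.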

The headline finding is not that one model wins universally, but that mixture-of-experts models appear to be a particularly strong fit for 32 GB laptops. In the posted results, Qwen 3.5 35B-A3B MoE reached 31.3 tokens per second on tg128 while using about 20.7 GB of RAM, whereas dense 32B-class models clustered near 2.5 tokens per second with roughly 18.6 to 18.7 GB of memory use. Smaller models naturally ran much faster, with Qwen 3 0.6B at 91.9 tok/s and Llama 3.2 1B at 59.4 tok/s, but the interesting comparison is the balance between interactivity and capability in the mid-to-large range.
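The tradeoff becomes clearer when the quoted numbers are normalized by memory footprint. A minimal calculation using only the figures cited above (no new measurements):

```python
# Throughput-per-GB comparison using the tg128 numbers quoted above.
# These are the post's reported values, not independent measurements.
results = {
    "Qwen 3.5 35B-A3B (MoE)": (31.3, 20.7),  # (tok/s, GB RAM)
    "dense 32B-class":        (2.5, 18.6),
}
for name, (tps, gb) in results.items():
    print(f"{name}: {tps / gb:.2f} tok/s per GB")
```

At similar memory footprints, the MoE model delivers roughly an order of magnitude more generation throughput per gigabyte in this result set.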

The repository is built to be reproducible rather than anecdotal. It supports both GGUF benchmarks through llama.cpp and optional MLX benchmarks through mlx_lm.benchmark, stores fixed-token metrics such as pp128, pp256, pp512, tg128, and tg256, and organizes results by chip generation and hardware configuration. At the time of the post, the M5 section included 41 benchmarks when GGUF and MLX runs were combined.
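A results database like the one described could be modeled as a mapping from hardware and model to the fixed-token metrics. The sketch below is hypothetical; the field names and layout are illustrative assumptions, not the repository's actual schema:

```python
# Hypothetical schema: results keyed by (chip config, model), with one
# tok/s value per fixed-token metric. Names are illustrative only.
from collections import defaultdict

db = defaultdict(dict)  # (chip, model) -> {metric: tok/s}
db[("M5 32GB", "Qwen 3 0.6B Q4_K_M")]["tg128"] = 91.9   # from the post
db[("M5 32GB", "Llama 3.2 1B Q4_K_M")]["tg128"] = 59.4  # from the post

def best(db, metric):
    """Return the (chip, model) entry with the highest tok/s for a metric."""
    rows = [(key, vals[metric]) for key, vals in db.items() if metric in vals]
    return max(rows, key=lambda row: row[1])

print(best(db, "tg128"))
```

Keying on chip configuration is what lets the project aggregate M1-through-M5 runs into one comparable table.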

What developers should take from it

The most useful point in the LocalLLaMA post is practical: a 32 GB Apple laptop has a clear wall for dense 32B models, and MoE designs can sometimes deliver a better latency-to-capability tradeoff. That does not make the published numbers universal, because runtime choice, quantization, thermal conditions, and prompt shape all matter. But it does provide a community-maintained starting point for hardware planning.

  • Focus machine in this result set: MacBook Air M5 with 32 GB RAM.
  • Primary benchmark tool: llama-bench, with separate support for MLX runs.
  • Project goal: a cross-generation benchmark database for M1 through M5 systems.
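The 32 GB "wall" for dense 32B models can be sanity-checked with a back-of-envelope estimate. Assuming Q4_K_M averages roughly 4.8 to 4.9 bits per weight (an approximation, since the mixed quantization varies by tensor), weight storage alone lands near the memory figures quoted above, before KV cache and runtime overhead:

```python
# Rough weight-memory estimate for Q4_K_M models. The bits-per-weight
# value is an assumed average for this mixed quantization scheme.
def q4_k_m_weight_gb(params_billions, bits_per_weight=4.85):
    """Approximate GB of weight storage for a Q4_K_M-quantized model."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(f"dense 32B:          ~{q4_k_m_weight_gb(32):.1f} GB of weights")
print(f"35B-total-param MoE: ~{q4_k_m_weight_gb(35):.1f} GB of weights")
```

Both estimates sit close to the roughly 18.6 and 20.7 GB figures in the posted results, which is why 32 GB machines can hold these models but leave little headroom for long contexts.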

For local-LLM users, the value is not just one leaderboard screenshot. It is the emergence of a repeatable, open benchmark workflow that others can extend with their own machines and model choices.



© 2026 Insights. All rights reserved.