A LocalLLaMA Experiment Puts a Tiny LLM on a 1998 iMac G3 with 32 MB of RAM
Original post: "I technically got an LLM running locally on a 1998 iMac G3 with 32 MB of RAM"
A high-signal LocalLLaMA post this week described an experiment that sounds absurd until you read the implementation notes: a stock 1998 iMac G3 with 32 MB of RAM running a local language model. The project ports Karpathy's llama2.c approach to classic Mac OS and targets the original Bondi Blue iMac without hardware upgrades.
The model choice is what makes the trick possible. Instead of forcing a contemporary checkpoint onto vintage hardware, the author uses the 260K-parameter TinyStories model with a roughly 1 MB checkpoint and runs it entirely in local memory. According to the README, the app reads a prompt from prompt.txt, tokenizes it with a 512-token BPE vocabulary, executes the transformer forward pass, and writes the continuation to output.txt. On the 233 MHz PowerPC G3, 32 generated tokens take less than a second.
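Stripped to its essentials, that pipeline is just file-in, file-out. Here is a minimal C sketch of the same loop; the function names and the `generate_stub` placeholder are illustrative, not the author's code, and the stub stands in for the real tokenize-and-forward-pass step:

```c
/* Sketch of a prompt.txt -> generate -> output.txt loop, in the spirit
 * of the port described above.  All names here are hypothetical. */
#include <stdio.h>

#define MAX_PROMPT 256   /* fixed static buffers, as the port uses */
#define MAX_STEPS  32    /* the port caps generation at 32 tokens */

static char prompt_buf[MAX_PROMPT];

/* Read the whole prompt file into a fixed static buffer. */
static long read_prompt(const char *path, char *buf, long cap) {
    FILE *f = fopen(path, "rb");
    long n;
    if (!f) return -1;
    n = (long)fread(buf, 1, (size_t)(cap - 1), f);
    buf[n] = '\0';
    fclose(f);
    return n;
}

/* Placeholder for BPE tokenization plus up to MAX_STEPS transformer
 * forward-pass steps; here it just echoes the prompt. */
static void generate_stub(const char *prompt, char *out, long cap) {
    snprintf(out, (size_t)cap, "%s ...", prompt);
}

/* Write the generated continuation to the output file. */
static int write_output(const char *path, const char *text) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    fputs(text, f);
    fclose(f);
    return 0;
}
```

A caller would chain `read_prompt("prompt.txt", ...)`, `generate_stub`, and `write_output("output.txt", ...)`; the fixed-size static buffers mirror the port's strategy of avoiding dynamic allocation wherever possible.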
The technical details are more interesting than the novelty headline. Because the PowerPC CPU is big-endian, the model and tokenizer files must be byte-swapped before use. Mac OS 8.5 also gives applications a tiny default memory partition, so the project expands heap space with MaxApplZone(), allocates through NewPtr(), and relies on static buffers to avoid malloc failures. The author also had to reduce the maximum sequence length from 512 to 32 and fix a grouped-query attention weight-layout bug that caused later tensors to point at the wrong memory offsets.
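The endian fix is worth seeing concretely: llama2.c checkpoints are written on little-endian machines, so every 4-byte float must be byte-swapped before the big-endian PowerPC can interpret it. A sketch of that conversion (the function names are illustrative, not the author's):

```c
/* Byte-swap little-endian float32 weights for a big-endian CPU. */
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v) {
    return ((v >> 24) & 0x000000FFu) |
           ((v >>  8) & 0x0000FF00u) |
           ((v <<  8) & 0x00FF0000u) |
           ((v << 24) & 0xFF000000u);
}

/* Byte-swap an array of float32 weights in place. */
static void swap_weights(float *w, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &w[i], sizeof bits);  /* avoids strict-aliasing issues */
        bits = swap32(bits);
        memcpy(&w[i], &bits, sizeof bits);
    }
}
```

Whether the swap happens at load time on the Mac or as a preprocessing step on the host before FTP transfer is an implementation choice; the post's README describes converting the files before use.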
Why this experiment matters
This is not about useful throughput or modern reasoning quality. It is a sharp illustration of how small language models can travel when the software stack is stripped down to essentials. The repo documents cross-compilation with Retro68, endian conversion, file transfer over FTP, and even the lack of a usable console on Mac OS 8.5, which forced all debugging into text files.
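The no-console constraint is easy to picture: with nowhere to print, every trace has to be appended to a file and inspected after the run. A hypothetical minimal logger in that spirit (not the author's code):

```c
/* Append a formatted debug line to a text file, since Mac OS 8.5
 * offers no usable console.  Illustrative sketch only. */
#include <stdio.h>
#include <stdarg.h>

static void debug_log(const char *path, const char *fmt, ...) {
    va_list ap;
    FILE *f = fopen(path, "ab");  /* append, so earlier lines survive a crash */
    if (!f) return;
    va_start(ap, fmt);
    vfprintf(f, fmt, ap);
    va_end(ap);
    fputc('\n', f);
    fclose(f);
}
```

Opening and closing the file on every call is slow but crash-safe, which matters when the only failure signal is whatever made it to disk before a freeze.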
- Hardware: 233 MHz PowerPC 750, 32 MB RAM, Mac OS 8.5.
- Model: TinyStories 260K, Llama 2 architecture, about 1 MB checkpoint.
- Main lesson: tiny checkpoints and careful systems work can push local inference far beyond what modern expectations suggest.
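The arithmetic behind the "about 1 MB checkpoint" figure is simple: 260K float32 parameters at 4 bytes each is roughly 1 MB, leaving ample headroom in 32 MB of RAM. As a one-line check (assuming 4-byte floats):

```c
/* Back-of-envelope checkpoint size for an n-parameter float32 model. */
#include <stddef.h>

static size_t checkpoint_bytes(size_t n_params) {
    return n_params * sizeof(float);  /* sizeof(float) == 4 on this target */
}
/* checkpoint_bytes(260000) is 1,040,000 bytes, about 1 MB */
```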
In other words, the post is less a stunt than a compact history lesson: the minimum viable LLM is much smaller than the models dominating current product conversations.
Related Articles
Ollama used a March 30, 2026 preview to move its Apple Silicon path onto MLX. The release pairs higher prefill and decode throughput with NVFP4 support and cache changes aimed at coding and agent workflows.
A popular LocalLLaMA benchmark post argued that Qwen3.5 27B hits an attractive balance between model size and throughput, using an RTX A6000, llama.cpp with CUDA, and a 32k context window to show roughly 19.7 tokens per second.
A well-received LocalLLaMA post spotlighted a llama.cpp experiment that prefetches weights while layers are offloaded to CPU memory, aiming to recover prompt-processing speed for dense and smaller MoE models at longer contexts.