Karpathy on LLM Memory+Compute: SRAM vs DRAM Trade-offs and the Next Hardware Frontier
The Core Infrastructure Challenge of the LLM Era
AI researcher Andrej Karpathy posted on X in February 2026, noting that with the coming "tsunami of demand for tokens," there are significant opportunities to orchestrate the underlying memory+compute just right for LLMs.
The Fundamental Constraint: SRAM vs DRAM
Karpathy explains a fundamental and non-obvious constraint arising from the chip fabrication process: there are two completely distinct pools of memory with different physical implementations.
- On-chip SRAM: Immediately next to the compute units, incredibly fast, but of very low capacity
- Off-chip DRAM (HBM): Extremely high capacity, but data can only be accessed through what Karpathy calls "a long straw" — meaning bandwidth-limited access
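To see why the "long straw" dominates, consider a back-of-envelope roofline sketch of decode (all numbers below are illustrative assumptions, not figures from Karpathy's post): generating each token requires streaming the model's weights from memory, so single-stream decode speed is capped at bandwidth divided by bytes read per token.

```python
# Back-of-envelope "long straw" model. All numbers are illustrative
# assumptions, not measured figures. During decode, every generated token
# must stream the model weights from memory, so throughput is bounded by
# memory bandwidth, not compute.

def decode_tokens_per_sec(param_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on single-stream decode speed when memory-bandwidth-bound."""
    return bandwidth_bytes_per_sec / param_bytes

# Hypothetical 70B-parameter model at 8-bit weights: ~70e9 bytes read per token.
weights = 70e9

hbm = decode_tokens_per_sec(weights, 3.35e12)  # ~3.35 TB/s, HBM-class (assumed)
sram = decode_tokens_per_sec(weights, 1e15)    # ~PB/s-class on-chip SRAM (assumed)

print(f"HBM-bound decode:  ~{hbm:.0f} tok/s per stream")
print(f"SRAM-bound decode: ~{sram:.0f} tok/s per stream")
```

Under these assumed numbers the SRAM path is hundreds of times faster per stream, which is the upside of keeping data next to the compute units; the catch, as the bullets above note, is that SRAM capacity is tiny by comparison.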
The Design Challenge
Karpathy argues that designing the optimal physical substrate and orchestrating memory+compute across the top-volume LLM workflows — inference prefill/decode, training/fine-tuning — to achieve the best throughput/latency/dollar ratio is "probably today's most interesting intellectual puzzle with the highest rewards." He cites NVIDIA's $4.6 trillion market cap as evidence.
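The throughput/latency/dollar objective can be made concrete with a toy cost model (a minimal sketch with made-up numbers, not a real pricing analysis):

```python
# Toy throughput-per-dollar metric. Both inputs are assumptions chosen for
# illustration; real accounting would include utilization, batching, and
# power, which this sketch ignores.

def cost_per_million_tokens(tokens_per_sec: float, dollars_per_hour: float) -> float:
    """Serving cost in dollars per 1M generated tokens at a given throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return dollars_per_hour / tokens_per_hour * 1e6

# Hypothetical accelerator: 1,000 tok/s sustained at $3/hour rental.
print(f"${cost_per_million_tokens(1000, 3.0):.2f} per 1M tokens")
```

The point of the exercise is that the same hardware dollar can be spent on bandwidth, capacity, or compute, and the best split depends on which workload (prefill, decode, training) you are optimizing for.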
The Current Dilemma
The workflow that matters most, inference decode over long token contexts in tight agentic loops, is arguably the one that neither of today's two hardware camps serves well:
- HBM-first (NVIDIA-adjacent): High capacity but bandwidth-constrained
- SRAM-first (Cerebras-adjacent): Fast but capacity-limited
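The dilemma can be sketched as a capacity check (every size below is a hypothetical assumption, not a vendor spec): a long-context agentic workload needs both the weights and a large KV cache resident, which blows the SRAM budget, while fitting easily in HBM at the price of bandwidth-limited access.

```python
# Why neither camp wins outright. All sizes are hypothetical assumptions
# for illustration, not vendor specifications.

def fits_on_chip(param_bytes: float, kv_cache_bytes: float, memory_bytes: float) -> bool:
    """True if weights plus KV cache fit in the given memory pool."""
    return param_bytes + kv_cache_bytes <= memory_bytes

weights = 70e9        # assumed 70B model at 8-bit weights
kv_cache = 40e9       # assumed long-context KV cache for agentic loops
sram_capacity = 44e9  # assumed tens-of-GB-class on-chip SRAM pool
hbm_capacity = 640e9  # assumed 8 accelerators x 80 GB HBM

print(fits_on_chip(weights, kv_cache, sram_capacity))  # over the SRAM budget
print(fits_on_chip(weights, kv_cache, hbm_capacity))   # fits, but bandwidth-bound
```

Under these assumptions the SRAM-first design cannot hold the working set at all, and the HBM-first design holds it but must pull it through the "straw" on every decoded token, which is exactly the gap Karpathy describes.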
A Note on MatX
Karpathy closes by praising the MatX team as "A++ grade" and mentions having a small involvement, congratulating them on a recent fundraise. His analysis underscores how critical getting the hardware architecture right will be in the race to produce many tokens, fast and cheap.