Karpathy on LLM Memory+Compute: SRAM vs DRAM Trade-offs and the Next Hardware Frontier


LLM · Mar 1, 2026 · By Insights AI (Twitter) · 1 min read

The Core Infrastructure Challenge of the LLM Era

AI researcher Andrej Karpathy posted on X in February 2026, noting that with the coming "tsunami of demand for tokens," there are significant opportunities to orchestrate the underlying memory+compute just right for LLMs.

The Fundamental Constraint: SRAM vs DRAM

Karpathy explains a fundamental and non-obvious constraint arising from the chip fabrication process: there are two completely distinct pools of memory with different physical implementations.

  • On-chip SRAM: Immediately next to the compute units, incredibly fast, but of very low capacity
  • Off-chip DRAM (HBM): Extremely high capacity, but data can only be accessed through what Karpathy calls "a long straw" — meaning bandwidth-limited access
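The "long straw" can be made concrete with a back-of-the-envelope roofline calculation. The figures below (HBM bandwidth, model size, bytes per weight) are illustrative assumptions, not measured specs for any particular chip:

```python
# Back-of-the-envelope: why single-stream decode speed is capped by the
# DRAM "straw". All numbers are rough, illustrative assumptions.

HBM_BANDWIDTH_GBPS = 3000   # ~3 TB/s of off-chip HBM bandwidth (assumed)
MODEL_BYTES = 70e9 * 2      # hypothetical 70B-parameter model at 2 bytes/weight

def max_tokens_per_sec(bandwidth_gbps: float, model_bytes: float) -> float:
    # During decode, every weight must be streamed from DRAM once per token,
    # so memory bandwidth bounds tokens/second regardless of available FLOPs.
    return bandwidth_gbps * 1e9 / model_bytes

hbm_rate = max_tokens_per_sec(HBM_BANDWIDTH_GBPS, MODEL_BYTES)
print(f"HBM-bound decode: ~{hbm_rate:.0f} tokens/s per stream")  # ~21 tokens/s
```

Batching many concurrent requests amortizes the weight reads, which is exactly the kind of orchestration opportunity the post points at.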

The Design Challenge

Karpathy argues that designing the optimal physical substrate and orchestrating memory+compute across the top-volume LLM workflows — inference prefill/decode, training/fine-tuning — to achieve the best throughput/latency/dollar ratio is "probably today's most interesting intellectual puzzle with the highest rewards." He cites NVIDIA's $4.6 trillion market cap as evidence.
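Why prefill and decode stress the hardware so differently comes down to arithmetic intensity: prefill processes a whole prompt per weight read, while decode reads every weight for a single token. A minimal sketch, with a hypothetical model size and the standard ~2 FLOPs/parameter/token approximation:

```python
# Arithmetic intensity (FLOPs per byte moved from DRAM) separates the two
# inference phases. Illustrative numbers only; real kernels vary.

PARAMS = 70e9        # hypothetical 70B-parameter model
BYTES_PER_PARAM = 2  # fp16/bf16 weights

def arithmetic_intensity(tokens_per_pass: int) -> float:
    # ~2 FLOPs per parameter per token; weights are streamed once per pass.
    flops = 2 * PARAMS * tokens_per_pass
    bytes_moved = PARAMS * BYTES_PER_PARAM
    return flops / bytes_moved

prefill_ai = arithmetic_intensity(4096)  # whole prompt in one pass: compute-bound
decode_ai = arithmetic_intensity(1)      # one token at a time: bandwidth-bound
print(f"prefill: {prefill_ai:.0f} FLOPs/byte, decode: {decode_ai:.0f} FLOPs/byte")
```

With hardware that can sustain hundreds of FLOPs per byte of DRAM traffic, prefill saturates the compute units while decode leaves them mostly idle, which is the gap the "optimal physical substrate" has to close.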

The Current Dilemma

The workflow that matters most, inference decode over long token contexts in tight agentic loops, is arguably the hardest for either of today's two hardware camps to serve well:

  • HBM-first (NVIDIA-adjacent): High capacity but bandwidth-constrained
  • SRAM-first (Cerebras-adjacent): Fast but capacity-limited
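The capacity side of the squeeze is easy to quantify: for long agentic contexts, the KV cache alone can dwarf any plausible on-chip SRAM pool. A sketch with a hypothetical model shape (layer count, KV heads, head dimension are assumptions, not a real configuration):

```python
# Why long contexts strain the SRAM-first camp: the KV cache for a single
# long context can rival an entire wafer's worth of on-chip memory.
# Hypothetical model shape; real configurations differ.

LAYERS = 80
KV_HEADS = 8
HEAD_DIM = 128
BYTES = 2          # fp16/bf16 activations
CONTEXT = 128_000  # tokens in one agentic session

# K and V tensors per layer, per head, per token:
kv_bytes = CONTEXT * LAYERS * KV_HEADS * HEAD_DIM * 2 * BYTES
print(f"KV cache: {kv_bytes / 1e9:.0f} GB for one {CONTEXT}-token context")
```

At roughly 42 GB per context under these assumptions, a wafer-scale SRAM pool on the order of tens of GB would be consumed by one or two sessions before storing a single weight, while the HBM-first camp fits the cache easily but pays the bandwidth toll on every decode step.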

A Note on MatX

Karpathy closes by praising the MatX team as "A++ grade" and mentions having a small involvement, congratulating them on a recent fundraise. His analysis underscores how critical getting the hardware architecture right will be in the race to produce many tokens, fast and cheap.
