Hacker News examines Percepta's claim that transformers can execute programs internally
Original: Executing programs inside transformers with exponentially faster inference
One of the more provocative AI links on Hacker News was Percepta's March 11, 2026 post, "Can LLMs Be Computers?" The public page makes a bold claim in just a few lines: the team says it built a computer inside a transformer that can execute arbitrary C programs for millions of steps, with exponentially faster inference via 2D attention heads. Even in teaser form, that is enough to trigger a familiar HN reaction: intense curiosity followed immediately by demands for harder evidence.
The claim matters because it targets a step that current LLM systems still hand off to external software: execution. Many modern agent systems generate code or tool calls and then wait for a separate runtime to execute them. Percepta is framing its work differently. According to the post description, execution itself is carried out inside the transformer rather than delegated outside it. That is a much stronger statement than ordinary tool use, because it suggests a model architecture can become a computational substrate instead of only a planner wrapped around other software.
HN readers quickly connected the idea to two long-running research questions. The first is interpretability: if some behavior can be represented in a more program-like or pseudo-symbolic form, it may become easier to inspect than opaque end-to-end heuristics. The second is reasoning efficiency: several commenters read the post as evidence that next-token systems may be able to perform structured computation much more directly than today's tool-augmented stacks suggest. A few even speculated about combining this sort of mechanism with reinforcement learning or stronger planning loops.
But the enthusiasm came with obvious skepticism. Multiple readers said the write-up felt more like a teaser than a full explanation and asked for concrete benchmarks, practical examples, and a cleaner explanation of what the speedup actually measures. Others said the idea sounded brilliant but hard to evaluate from the public material alone. That criticism is fair. When a research claim is this ambitious, clarity and measurement matter as much as novelty.
So the HN thread is less a verdict than a marker. Percepta has put a high-upside research direction on the table: maybe transformers are not only sequence predictors, but can also serve as efficient internal executors for certain classes of computation. Whether that becomes a serious architectural shift will depend on the next step, which is not a sharper slogan but reproducible tasks, clearer exposition, and benchmarks the wider research community can test.
Original source: Percepta. Community discussion: Hacker News.