GPT-5.5 clears AISI’s cyber bar, and Reddit fixates on the $1.73 detail

Original: GPT-5.5 slightly outperformed Mythos on a multi-step cyber-attack simulation. One challenge that took a human expert 12 hrs took GPT-5.5 only 11 min at a $1.73 cost.

LLM · May 1, 2026 · By Insights AI (Reddit) · 2 min read

The viral framing in Reddit’s title collapsed two different numbers into one neat line, but AISI’s April 30, 2026 official evaluation is more specific. One result is that GPT-5.5 became the second model to complete AISI’s multi-step TLO enterprise attack simulation end-to-end, doing so in 2 of 10 attempts on a task the institute estimates would take a human expert around 20 hours. The separate “10 minutes and $1.73” figure belongs to a difficult reverse-engineering challenge called rust_vm, which GPT-5.5 solved in 10 minutes and 22 seconds with no human assistance.

AISI’s setup is much closer to operational cyber work than to a single benchmark score. The institute says it uses 95 narrow tasks across four difficulty tiers covering areas such as reverse engineering, web exploitation, and cryptography. The TLO scenario starts from an unprivileged box and forces the model through reconnaissance, credential theft, lateral movement, a CI/CD supply-chain pivot, and database exfiltration. GPT-5.5’s end-to-end completions came at a 100M-token budget per attempt, and AISI says performance still scales with more inference compute.

That is why the Reddit reaction split into two directions. One lane treated the post as evidence that cyber capability is no longer concentrated in a single standout model. The other focused on cost, reproducibility, and hype. The top comment framed the result as a hit to Anthropic’s “too dangerous to release” narrative around Mythos, while another commenter doubted that a real 11-minute compute run could cost only $1.73. The thread was less about whether frontier models are useful in cyber work and more about how quickly the economics are moving.
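The $1.73 skepticism is easy to sanity-check with back-of-envelope token-price arithmetic. A minimal sketch, using hypothetical per-million-token rates (the article does not state GPT-5.5’s actual pricing, so both rates and token counts below are assumptions):

```python
# Hypothetical API rates -- placeholders, not GPT-5.5's real pricing.
ASSUMED_USD_PER_M_INPUT = 1.25   # USD per 1M input tokens (assumption)
ASSUMED_USD_PER_M_OUTPUT = 10.0  # USD per 1M output tokens (assumption)

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one run at the assumed per-million-token rates."""
    return (input_tokens / 1e6) * ASSUMED_USD_PER_M_INPUT \
         + (output_tokens / 1e6) * ASSUMED_USD_PER_M_OUTPUT

# At these rates, ~1M input tokens plus ~48K output tokens lands near $1.73,
# so a short autonomous run at low single-digit dollars is not implausible.
print(round(run_cost(1_000_000, 48_000), 2))
```

The point is not the exact numbers but the order of magnitude: a ten-minute run only looks impossibly cheap if one assumes the cost scales with wall-clock time rather than with tokens consumed.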

NCSC’s broader framing reinforces that reading. In its March 30, 2026 blog post on frontier AI and cyber defense, the agency argues defenders should assume at least some attackers already have access to capable AI tools, and should adopt the same capabilities defensively while tightening baseline security. The real signal here is not “AI replaces hackers today.” It is that expensive, specialist tasks are becoming faster and cheaper often enough that both defenders and policymakers have to treat this as a present-tense shift.

Source: AISI evaluation · NCSC context · Reddit discussion

