LLM Feb 27, 2026 2 min read
OpenAI and Paradigm launched EVMbench, a benchmark for AI agent performance on smart contract detection, patching, and exploitation tasks. OpenAI reports GPT-5.3-Codex scored 72.2% in exploit mode versus 31.9% for GPT-5.