OpenAI introduced EVMbench, a new benchmark measuring how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities in EVM-based blockchains.
#ai-agents
RSS FeedSecurityScorecard's STRIKE team found 40,214 OpenClaw AI agent instances exposed to the public internet with no authentication. Over 12,000 are vulnerable to Remote Code Execution, and attackers who compromise them inherit full system access including SSH keys, browser sessions, and filesystem control.
ByteDance released Doubao 2.0 ahead of Lunar New Year, claiming GPT-5.2 and Gemini 3 Pro parity with 98.3 on AIME 2025, a 3020 Codeforces rating, and pricing 10x cheaper than Western rivals.
Claude Opus 4.6 achieved a 50%-time-horizon of approximately 14.5 hours on METR's software task benchmark — beating all predictions and suggesting a doubling time of under 3 months for AI task capabilities.
Andrej Karpathy coined a new term for OpenClaw-like AI agent systems: "Claws." Just as LLM agents were a new layer on top of LLMs, Claws provide orchestration, scheduling, persistent context, and tool calls on top of LLM agents.
A high-signal Hacker News thread highlighted Anthropic's February 18, 2026 analysis of millions of agent interactions. The report tracks growing practical autonomy, evolving human oversight behavior, and early but rising higher-risk usage patterns.
A Reddit r/singularity post surfaced Anthropic's February 18, 2026 research on real-world agent autonomy, including findings on longer autonomous runs, rising auto-approve behavior among experienced users, and risk distribution across domains.
A Docker guide on running NanoClaw inside a Shell Sandbox reached 102 points on Hacker News, highlighting a practical pattern for isolating agent runtime, limiting filesystem exposure, and keeping API keys out of the guest environment.
Anthropic announced on February 2, 2026 that it is partnering with the Allen Institute and Howard Hughes Medical Institute (HHMI) on AI-enabled life-science workflows. The stated goal is to reduce analysis bottlenecks and improve transparent, interpretable scientific reasoning.
Anthropic announced on January 28, 2026 that ServiceNow selected Claude as its default model for AI agent development. ServiceNow cited up to 95% productivity gains in some workflows and reported large-scale AI request volumes.
Google DeepMind published new results on February 11, 2026 showing Gemini Deep Think workflows for mathematics, physics, and computer science research. The post outlines two new papers, evaluation benchmarks, and agent-assisted verification methods.