OpenAI Developers has updated its GPT-5.4 API prompting guide. The new guidance focuses on tool use, structured outputs, verification loops, and long-running workflows for production-grade agents.
#agents
RSS FeedAnthropic introduced Claude Sonnet 4.6 on February 17, 2026, adding a beta 1M token context window while keeping API pricing at $3/$15 per million tokens. The company says the new default model improves coding, computer use, and long-context reasoning enough to cover more work that previously pushed users toward Opus-class models.
A Hacker News submission highlighted Andrej Karpathy's Autoresearch repo, a minimal setup where an AI agent edits one training file, runs fixed 5-minute experiments, and keeps only changes that improve `val_bpb`.
Amazon and OpenAI have announced a multi-year strategic partnership that combines cloud infrastructure, managed GPT access in Bedrock, and a new stateful runtime for agent systems. The deal also significantly expands the companies’ long-term infrastructure commitment.
OpenAI and Amazon said AWS customers will get a Stateful Runtime Environment in Amazon Bedrock for production-grade agent workflows. The announcement moves agent execution closer to managed AWS infrastructure with persistent state, governance, and long-running workflow support.
Microsoft Research introduced CORPGEN on February 26, 2026 to evaluate and improve agent performance in realistic multi-task office scenarios. The framework reports up to 3.5x higher task completion than baseline systems under heavy concurrent load.
Microsoft Research introduced CORPGEN on February 26, 2026 to evaluate and improve agent performance in realistic multi-task office scenarios. The framework reports up to 3.5x higher task completion than baseline systems under heavy concurrent load.
OpenAI announced an Operator upgrade adding Google Drive slides creation/editing and Jupyter-mode code execution in Browser. It also said Operator availability expanded to 20 additional regions in recent weeks, with new country additions including Korea and several European markets.
Anthropic launched Claude Cowork plugins that embed Claude natively into Microsoft Excel, PowerPoint, Slack, Gmail, and Google Drive—enabling autonomous cross-app workflows for enterprise users.
Google DeepMind announced SIMA 2 on November 13, 2025 as a generalist foundation model for virtual 3D environments. The system is designed to play and reason alongside humans, with in-context learning that can improve behavior from examples.
A high-scoring Hacker News post spotlights FDM-1, a video-native computer action model trained on an 11-million-hour dataset. The release emphasizes automatic action labeling with IDM and large-scale forking-VM evaluation for long-horizon interaction tasks.
Google announced on 2026-02-25 that Gemini in Android will begin handling multi-step tasks in beta. The rollout starts on Pixel 10 devices and Samsung Galaxy S26 series, initially in the U.S. and Korea.