Azure says GPT-5.4 is now available in Microsoft Foundry for production-grade agent workloads. Microsoft’s supporting post adds GPT-5.4 Pro, pricing, and initial deployment options, with governance controls positioned as part of the pitch.
#agents
OpenAI Developers has updated its GPT-5.4 API prompting guide. The new guidance focuses on tool use, structured outputs, verification loops, and long-running workflows for production-grade agents.
OpenAI and Amazon said AWS customers will get a Stateful Runtime Environment in Amazon Bedrock for production-grade agent workflows. The announcement moves agent execution closer to managed AWS infrastructure with persistent state, governance, and long-running workflow support.
Microsoft Research introduced CORPGEN on February 26, 2026 to evaluate and improve agent performance in realistic multi-task office scenarios. The framework reports up to 3.5x higher task completion than baseline systems under heavy concurrent load.
OpenAI announced an Operator upgrade that adds Google Drive slide creation and editing, plus Jupyter-mode code execution in Browser. It also said Operator availability had expanded to 20 additional regions in recent weeks, including Korea and several European markets.
Anthropic launched Claude Cowork plugins that embed Claude natively into Microsoft Excel, PowerPoint, Slack, Gmail, and Google Drive, enabling autonomous cross-app workflows for enterprise users.
Google DeepMind announced SIMA 2 on November 13, 2025 as a generalist foundation model for virtual 3D environments. The system is designed to play and reason alongside humans, with in-context learning that can improve behavior from examples.
A high-scoring Hacker News post spotlights FDM-1, a video-native computer action model trained on an 11-million-hour dataset. The release emphasizes automatic action labeling with IDM and large-scale forking-VM evaluation for long-horizon interaction tasks.
Google announced on February 25, 2026 that Gemini in Android will begin handling multi-step tasks in beta. The rollout starts on Pixel 10 devices and the Samsung Galaxy S26 series, initially in the U.S. and Korea.
Andrej Karpathy shared how he vibe-coded a custom health tracking dashboard in one hour, then argued that the traditional app store model is becoming obsolete now that LLM agents can generate bespoke apps on demand for individual users.
A high-engagement r/LocalLLaMA thread tracked the MiniMax-M2.5 release on Hugging Face. The model card emphasizes agentic coding/search benchmarks, runtime speedups, and aggressive cost positioning.
A Hacker News thread highlighted arXiv 2602.10177, where DeepMind researchers introduce Aletheia, an agent workflow for mathematics research. The paper claims progress from Olympiad-style reasoning toward PhD-level tasks and semi-autonomous open-problem exploration.