Cloudflare brings Kimi K2.5 to Workers AI and pushes deeper into agent infrastructure

Original: Powering the agents: Workers AI now runs large models, starting with Kimi K2.5

LLM · Apr 11, 2026 · By Insights AI · 2 min read

Cloudflare said on March 19, 2026 that Workers AI is entering the large-model tier by adding Moonshot AI’s Kimi K2.5 to the platform. The model comes with a 256k context window plus support for multi-turn tool calling, vision inputs, and structured outputs. In Cloudflare’s framing, that closes a gap in its agent stack: Durable Objects, Workflows, Dynamic Workers, Sandbox containers, and the Agents SDK already handled execution and orchestration, but the platform still lacked a frontier-scale open model inside the same environment.

The company is using its own workloads to make the case. Cloudflare says engineers already use Kimi in OpenCode for agentic coding tasks and in the public code review agent Bonk. It also says a security review agent processing more than 7B tokens per day caught more than 15 confirmed issues in a single codebase. The most aggressive claim is economic: Cloudflare says that one security-review use case would have cost about $2.4M per year on a mid-tier proprietary model, and switching to Kimi on Workers AI cut that cost by 77%.
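The quoted figures can be sanity-checked with simple arithmetic. The annual cost, savings rate, and daily token volume below come straight from the article; the blended per-million-token rate is a derived estimate (it ignores any input/output pricing split, which the article does not break down):

```typescript
// Back-of-envelope check of Cloudflare's cost claim.
const proprietaryAnnualUsd = 2_400_000; // quoted annual cost on a mid-tier proprietary model
const savingsRate = 0.77;               // quoted reduction after switching to Kimi on Workers AI

const kimiAnnualUsd = proprietaryAnnualUsd * (1 - savingsRate);
const annualSavingsUsd = proprietaryAnnualUsd - kimiAnnualUsd;

// 7B tokens/day is quoted; the implied blended rate is a rough derived figure.
const tokensPerYear = 7e9 * 365;
const impliedUsdPerMillionTokens = proprietaryAnnualUsd / (tokensPerYear / 1e6);

console.log(Math.round(kimiAnnualUsd));                 // remaining annual spend
console.log(Math.round(annualSavingsUsd));              // annual savings
console.log(impliedUsdPerMillionTokens.toFixed(2));     // implied $/1M tokens on the old model
```

At a 77% reduction, the workload would drop from $2.4M to roughly $552k per year, and the implied blended rate on the proprietary model works out to just under a dollar per million tokens.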

The launch is not only about model access. Cloudflare paired it with several platform changes designed for long-running agents. Workers AI now exposes cached tokens as a usage metric and offers discounted pricing for them, reflecting how much repeated context matters in agent loops. The company also added an x-session-affinity header to keep related requests on the same model instance and improve prefix-cache hit rates, which should reduce cost and improve time to first token. A revamped asynchronous API is aimed at durable jobs such as code scanning or research agents, with Cloudflare saying internal async requests usually complete within 5 minutes.
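The session-affinity mechanic can be sketched as a small request builder. The `x-session-affinity` header name is taken from the post; the REST path follows Workers AI's existing `/ai/run` pattern, but the model slug, session-id scheme, and request body shape here are illustrative assumptions, not documented values:

```typescript
// Sketch: pinning an agent's turns to one model instance via x-session-affinity.
// The header name comes from Cloudflare's announcement; the model slug and body
// shape are assumptions for illustration.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

type RequestDraft = {
  method: string;
  headers: Record<string, string>;
  body: string;
};

function buildAgentTurnRequest(
  accountId: string,
  apiToken: string,
  sessionId: string, // stable per agent session, so related turns share a prefix cache
  messages: ChatMessage[],
): { url: string; init: RequestDraft } {
  const model = "@cf/moonshotai/kimi-k2.5"; // hypothetical slug
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: "POST",
      headers: {
        authorization: `Bearer ${apiToken}`,
        "content-type": "application/json",
        // Keeps every turn of this session on the same instance, raising
        // prefix-cache hit rates (cheaper cached tokens, faster first token).
        "x-session-affinity": sessionId,
      },
      body: JSON.stringify({ messages }),
    },
  };
}

// Each loop iteration reuses the same sessionId so the growing conversation
// prefix stays warm in that instance's cache.
const { url, init } = buildAgentTurnRequest("acct-id", "api-token", "review-1234", [
  { role: "user", content: "Scan this diff for injection risks." },
]);
```

The design point is that an agent loop resends an ever-growing prefix on every turn; routing those turns to the same instance is what makes the discounted cached-token pricing pay off.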

The bigger point is that Cloudflare is pushing a one-platform story for agent infrastructure. Instead of mixing a serverless execution platform, a separate model provider, and custom queueing or state systems, it wants developers to keep the whole lifecycle on one stack. That proposition becomes more credible once the stack includes a model with a large context window and tool-use support.

For some teams there is still a gap between a well-run managed service and the economics of self-hosting, and Kimi's real-world quality will matter as much as its price. But the March 19 release is an important sign that Cloudflare sees frontier open models as central to agent infrastructure, not just an optional add-on for simple inference endpoints.




© 2026 Insights. All rights reserved.