Cloudflare brings Kimi K2.5 to Workers AI and shows how it cut internal agent costs
Original: Kimi K2.5 is now available on #WorkersAI. You can now build and run agents end-to-end on the Cloudflare Developer Platform. Read about how we tuned our inference stack to drive down costs for internal agent workflows. https://cfl.re/4bmpZgb
What Cloudflare posted on X
On March 20, 2026, Cloudflare said Kimi K2.5 is now available on Workers AI, framing the release as a way to build and run agents end to end on the Cloudflare Developer Platform. The tweet also pointed to new details on how Cloudflare tuned its inference stack to reduce costs for internal agent workloads.
That framing is important. Cloudflare is not just adding another model endpoint. It is positioning Workers AI as the model layer inside a broader agent runtime that already includes Durable Objects for state, Workflows for long-running jobs, and container-based execution through Dynamic Workers or Sandbox.
What the Cloudflare blog adds
The March 19 Cloudflare blog says Workers AI is moving into the large-model tier, starting with Moonshot AI's Kimi K2.5. Cloudflare describes the model as having a 256K context window with support for multi-turn tool calling, vision inputs, and structured outputs, which makes it a better fit for agent workflows than smaller open models.
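The capabilities listed above map onto a familiar request shape. The sketch below shows what a multi-turn tool-calling request with structured output might look like; the model identifier, endpoint fields, and `lookup_cve` tool are illustrative assumptions in the style of common OpenAI-compatible chat APIs, not the confirmed Workers AI contract for Kimi K2.5.

```python
# Hypothetical chat payload for an agent turn: tool definitions plus a
# structured-output request. Field names and the model ID are assumptions.

def build_agent_request(messages, tools):
    """Assemble an OpenAI-style chat payload with tool definitions."""
    return {
        "model": "@cf/moonshotai/kimi-k2.5",  # assumed identifier
        "messages": messages,
        "tools": tools,
        "response_format": {"type": "json_object"},  # structured output
    }

# A hypothetical tool an agent might expose during a security review.
lookup_tool = {
    "type": "function",
    "function": {
        "name": "lookup_cve",
        "description": "Fetch details for a CVE identifier.",
        "parameters": {
            "type": "object",
            "properties": {"cve_id": {"type": "string"}},
            "required": ["cve_id"],
        },
    },
}

payload = build_agent_request(
    messages=[{"role": "user", "content": "Review this diff for injection bugs."}],
    tools=[lookup_tool],
)
```

The large context window matters here because each agent turn replays the accumulated conversation and tool results, so the payload grows with every round trip.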
The most concrete part of the post is Cloudflare's internal usage data. The company says engineers already use Kimi inside OpenCode for agentic coding tasks and inside an automated code-review workflow exposed publicly through the Bonk agent on Cloudflare repositories. In one security-review use case, Cloudflare says the system processes more than 7B tokens per day, has found more than 15 confirmed issues in a single codebase, and would have cost about $2.4M per year on a mid-tier proprietary model. After the switch to Workers AI, Cloudflare says, the same workload costs 77% less.
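Those figures can be sanity-checked with quick arithmetic. The $2.4M baseline, the 77% reduction, and the 7B tokens/day volume come from Cloudflare's post; everything else below follows from them, and the implied per-token rate assumes the $2.4M estimate covers that same 7B-tokens/day workload.

```python
# Back-of-the-envelope check on Cloudflare's reported savings.
baseline_annual_cost = 2_400_000   # ~$2.4M/yr on a mid-tier proprietary model
savings_fraction = 0.77            # reported reduction after moving to Workers AI

workers_ai_cost = baseline_annual_cost * (1 - savings_fraction)
annual_savings = baseline_annual_cost - workers_ai_cost

# Implied blended rate, assuming the baseline prices the 7B-tokens/day workload.
annual_tokens_millions = 7_000 * 365                      # 7B/day in millions
implied_rate = baseline_annual_cost / annual_tokens_millions

print(f"Workers AI cost:  ~${workers_ai_cost:,.0f}/yr")   # ~$552,000/yr
print(f"Annual savings:   ~${annual_savings:,.0f}/yr")    # ~$1,848,000/yr
print(f"Implied baseline: ~${implied_rate:.2f} per 1M tokens")
```

The implied rate of roughly $0.94 per million tokens is a blended figure across input, cached, and output tokens, which is why it sits below typical mid-tier list prices for fresh input tokens.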
Cloudflare also paired the launch with platform changes for agent traffic. It now surfaces cached tokens as a usage metric, discounts cached tokens relative to fresh input tokens, adds an x-session-affinity header to improve prefix-cache hit rates, and revamps its asynchronous API for durable high-volume jobs such as research or code-scanning agents.
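The session-affinity mechanism only pays off if a client reuses the same value across an agent's turns, so repeated calls land on a cache-warm backend. A minimal sketch of the client side, assuming the standard bearer-token REST pattern; the `x-session-affinity` header name is from the announcement, while the value format (one UUID per agent session) is an assumption:

```python
import uuid

def build_headers(api_token: str, session_id: str) -> dict:
    """Headers for a Workers AI request that opts into session affinity.

    Keeping session_id stable across an agent's turns should improve
    prefix-cache hit rates; rotating it per request forfeits the
    cached-token discount on the shared conversation prefix.
    """
    return {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
        "x-session-affinity": session_id,  # stable for the whole session
    }

# Mint one ID per long-running agent session and reuse it on every call.
session_id = str(uuid.uuid4())
headers = build_headers("YOUR_API_TOKEN", session_id)
```

The design choice mirrors sticky sessions in load balancing: the header gives the router a cheap, client-controlled key for steering a session's requests toward the replica that already holds its prefix cache.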
Why this matters
The bigger signal is economic, not just technical. As teams move from occasional prompt calls to always-on coding, search, and security agents, inference cost becomes a scaling constraint long before model availability does. Cloudflare is arguing that open large models plus platform-level serving optimizations can close enough of the capability gap to make agents financially viable at higher volume.
If that claim holds, the competitive battleground shifts toward infrastructure: cache behavior, async execution, throughput tuning, and integration with the rest of the runtime. In other words, model hosting is becoming inseparable from agent-platform design.
Sources: Cloudflare X post · Cloudflare blog