Cloudflare brings Kimi K2.5 to Workers AI and says an internal code-review agent's costs fell 77%
Original post: "Kimi K2.5 is now available on #WorkersAI. You can now build and run agents end-to-end on the Cloudflare Developer Platform. Read about how we tuned our inference stack to drive down costs for internal agent workflows. cfl.re/4bmpZgb"
What Cloudflare announced on X
On March 20, 2026, Cloudflare said Kimi K2.5 was available on Workers AI and pitched the launch as a way to build and run agents end to end on the Cloudflare Developer Platform. The important part was not just that a new model appeared in the catalog, but that Cloudflare framed model inference, workflow orchestration, state, and secure execution as pieces of one agent platform.
What the launch post says about the model and the platform
Cloudflare’s blog says Workers AI is moving into the “big models” tier with frontier open-source systems, starting with Moonshot AI’s Kimi K2.5. It describes the model as supporting a 256K-token context window, multi-turn tool calling, vision inputs, and structured outputs, which positions it for more complex agentic workloads than a simple text-chat deployment.
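As a rough illustration of what those capabilities look like in practice, the sketch below sends an OpenAI-style chat payload with one tool schema to the Workers AI REST endpoint. The model ID `@cf/moonshotai/kimi-k2.5` is a guess at the catalog name, and the assumption that Kimi K2.5 accepts a `tools` array in this shape mirrors how Workers AI exposes other tool-calling models; neither detail is confirmed by the launch post.

```ts
// Hypothetical sketch of a tool-calling request to Workers AI (Node 18+, ESM).
// Assumptions not confirmed by the launch post: the model catalog ID, and
// that Kimi K2.5 accepts OpenAI-style `messages` plus a `tools` array.
const ACCOUNT_ID = process.env.CF_ACCOUNT_ID!; // your Cloudflare account ID
const API_TOKEN = process.env.CF_API_TOKEN!;   // token with Workers AI access
const MODEL = "@cf/moonshotai/kimi-k2.5";      // assumed catalog ID

const body = {
  messages: [
    { role: "system", content: "You are a code-review assistant." },
    { role: "user", content: "Check src/auth.ts for injection risks." },
  ],
  // One tool the model may call. Schemas like this ride along on every turn,
  // which is part of why prefix caching matters so much for agent workloads.
  tools: [
    {
      type: "function",
      function: {
        name: "read_file",
        description: "Read a file from the repository under review",
        parameters: {
          type: "object",
          properties: { path: { type: "string" } },
          required: ["path"],
        },
      },
    },
  ],
};

const res = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  }
);
console.log(await res.json()); // tool calls, if any, come back in the response
```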
- Cloudflare says engineers have been using Kimi K2.5 inside its internal OpenCode environment as a daily driver for agentic coding tasks.
- The company also integrated the model into its public code review agent, Bonk, across Cloudflare GitHub repositories.
- One internal security-review agent reportedly processes more than 7 billion tokens per day and has caught more than 15 confirmed issues in a single codebase.
- Cloudflare says that moving that use case to Workers AI cut costs by 77% compared with a mid-tier proprietary model, which it estimates would have cost about $2.4 million per year for that one workload (see the rough arithmetic below).
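As a rough check on what those figures imply, assuming the 77% reduction is measured against the $2.4 million baseline for the same workload: a 77% cut on $2.4 million is about $1.85 million in avoided annual spend, leaving roughly $550,000 per year on Workers AI. Cloudflare's post gives the percentage and the baseline; the derived savings are back-of-the-envelope, not figures from the announcement.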
Why this matters
The announcement stands out because Cloudflare is making an infrastructure and economics argument as much as a model argument. In agent-heavy environments, cost is often driven not only by output tokens but by large context windows, repeated tool schemas, codebase-sized prompts, and frequent multi-turn requests. Cloudflare pairs the model launch with platform features such as surfaced prefix-caching metrics and a new x-session-affinity header designed to improve cache hit rates, time to first token, and overall throughput.
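To make the caching mechanics concrete, here is a minimal sketch of how an agent client might use that header, assuming its value is simply a caller-chosen session ID reused across turns. The launch post names the x-session-affinity header; the value semantics and the model ID below are assumptions.

```ts
// Hypothetical sketch: pinning an agent's multi-turn requests together with
// the x-session-affinity header from the launch post, so repeated prefixes
// (system prompt, tool schemas, codebase context) can hit the prefix cache.
// That the header takes a caller-chosen session ID is an assumption.
async function runTurn(sessionId: string, messages: object[]) {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${process.env.CF_ACCOUNT_ID}/ai/run/@cf/moonshotai/kimi-k2.5`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
        "Content-Type": "application/json",
        "x-session-affinity": sessionId, // same ID on every turn of a session
      },
      body: JSON.stringify({ messages }),
    }
  );
  return res.json();
}

// Usage: reuse one ID across a conversation so each turn can share the
// cached prefix built up by earlier turns.
const session = crypto.randomUUID();
const first = await runTurn(session, [
  { role: "user", content: "Review this diff for correctness." },
]);
```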
That matters because many companies are now less concerned with whether a frontier model exists and more concerned with whether they can run agent workloads continuously without proprietary-model pricing breaking the budget. Cloudflare’s pitch is that teams should not have to self-host a large open model, hand-tune kernels, and manage inference topology just to get a viable cost profile. If the company’s internal results generalize, Workers AI could become a serious option for teams that want open-model reasoning and tool use without taking on the full operational burden themselves.
Sources: Cloudflare X post · Cloudflare Workers AI launch post
Related Articles
Cloudflare said on March 19, 2026 that Workers AI now supports Moonshot AI's Kimi K2.5. The company is using the model to argue that a unified agent platform can offer both strong tool use and much lower production cost.
Perplexity said on March 13, 2026 that Perplexity Computer is now available on mobile, starting with iOS inside the Perplexity app. Coming one day after the company opened Computer to Pro subscribers, the update turns the product into a more explicit cross-device agent workflow rather than a desktop-only experience.
A March 14, 2026 LocalLLaMA post outlined a CUTLASS and FlashInfer patch for SM120 Blackwell workstations, claiming major gains for Qwen3.5-397B NVFP4 inference and linking the work to FlashInfer PR #2786.