Cloudflare brings Kimi K2.5 to Workers AI and says an internal code-review agent's costs fell 77%
Original post: "Kimi K2.5 is now available on #WorkersAI. You can now build and run agents end-to-end on the Cloudflare Developer Platform. Read about how we tuned our inference stack to drive down costs for internal agent workflows. cfl.re/4bmpZgb"
What Cloudflare announced on X
On March 20, 2026, Cloudflare said Kimi K2.5 was available on Workers AI and pitched the launch as a way to build and run agents end to end on the Cloudflare Developer Platform. The important part was not just that a new model appeared in the catalog, but that Cloudflare framed model inference, workflow orchestration, state, and secure execution as pieces of one agent platform.
What the launch post says about the model and the platform
Cloudflare’s blog says Workers AI is moving into the “big models” tier with frontier open-source systems, starting with Moonshot AI’s Kimi K2.5. It describes the model as supporting a 256K-token context window, multi-turn tool calling, vision inputs, and structured outputs, which positions it for more complex agentic workloads than a simple text-chat deployment.
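As a rough illustration of what those capabilities look like in practice, the sketch below sends an OpenAI-style chat payload with one tool schema to the Workers AI REST endpoint. The model ID `@cf/moonshotai/kimi-k2.5` is a guess at the catalog name, and the assumption that Kimi K2.5 accepts a `tools` array in this shape mirrors how Workers AI exposes other tool-calling models; neither detail is confirmed by the launch post.

```ts
// Hypothetical sketch of a tool-calling request to Workers AI (Node 18+, ESM).
// Assumptions not confirmed by the launch post: the model catalog ID, and
// that Kimi K2.5 accepts OpenAI-style `messages` plus a `tools` array.
const ACCOUNT_ID = process.env.CF_ACCOUNT_ID!; // your Cloudflare account ID
const API_TOKEN = process.env.CF_API_TOKEN!;   // token with Workers AI access
const MODEL = "@cf/moonshotai/kimi-k2.5";      // assumed catalog ID

const body = {
  messages: [
    { role: "system", content: "You are a code-review assistant." },
    { role: "user", content: "Check src/auth.ts for injection risks." },
  ],
  // One tool the model may call. Schemas like this ride along on every turn,
  // which is part of why prefix caching matters so much for agent workloads.
  tools: [
    {
      type: "function",
      function: {
        name: "read_file",
        description: "Read a file from the repository under review",
        parameters: {
          type: "object",
          properties: { path: { type: "string" } },
          required: ["path"],
        },
      },
    },
  ],
};

const res = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  }
);
console.log(await res.json()); // tool calls, if any, come back in the response
```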
- Cloudflare says engineers have been using Kimi K2.5 inside its internal OpenCode environment as a daily driver for agentic coding tasks.
- The company also integrated the model into its public code review agent, Bonk, across Cloudflare GitHub repositories.
- One internal security-review agent reportedly processes more than 7 billion tokens per day and has caught more than 15 confirmed issues in a single codebase.
- Cloudflare says that moving that use case to Workers AI cut costs by 77% compared with a mid-tier proprietary model, which it estimates would have cost about $2.4 million per year for that one workload (see the rough arithmetic below).
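As a rough check on what those figures imply, assuming the 77% reduction is measured against the $2.4 million baseline for the same workload: a 77% cut on $2.4 million is about $1.85 million in avoided annual spend, leaving roughly $550,000 per year on Workers AI. Cloudflare's post gives the percentage and the baseline; the derived savings are back-of-the-envelope, not figures from the announcement.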
Why this matters
The announcement stands out because Cloudflare is making an infrastructure and economics argument as much as a model argument. In agent-heavy environments, cost is often driven not only by output tokens but by large context windows, repeated tool schemas, codebase-sized prompts, and frequent multi-turn requests. Cloudflare pairs the model launch with platform features such as surfaced prefix-caching metrics and a new x-session-affinity header designed to improve cache hit rates, time to first token, and overall throughput.
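To make the caching mechanics concrete, here is a minimal sketch of how an agent client might use that header, assuming its value is simply a caller-chosen session ID reused across turns. The launch post names the x-session-affinity header; the value semantics and the model ID below are assumptions.

```ts
// Hypothetical sketch: pinning an agent's multi-turn requests together with
// the x-session-affinity header from the launch post, so repeated prefixes
// (system prompt, tool schemas, codebase context) can hit the prefix cache.
// That the header takes a caller-chosen session ID is an assumption.
async function runTurn(sessionId: string, messages: object[]) {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${process.env.CF_ACCOUNT_ID}/ai/run/@cf/moonshotai/kimi-k2.5`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
        "Content-Type": "application/json",
        "x-session-affinity": sessionId, // same ID on every turn of a session
      },
      body: JSON.stringify({ messages }),
    }
  );
  return res.json();
}

// Usage: reuse one ID across a conversation so each turn can share the
// cached prefix built up by earlier turns.
const session = crypto.randomUUID();
const first = await runTurn(session, [
  { role: "user", content: "Review this diff for correctness." },
]);
```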
That matters because many companies are now less concerned with whether a frontier model exists and more concerned with whether they can run agent workloads continuously without proprietary-model pricing breaking the budget. Cloudflare’s pitch is that teams should not have to self-host a large open model, hand-tune kernels, and manage inference topology just to get a viable cost profile. If the company’s internal results generalize, Workers AI could become a serious option for teams that want open-model reasoning and tool use without taking on the full operational burden themselves.
Sources: Cloudflare X post · Cloudflare Workers AI launch post
Related Articles
Cloudflare said on March 19, 2026 that Workers AI now supports Moonshot AI's Kimi K2.5. The company is using the model to argue that a unified agent platform can offer both strong tool use and much lower production cost.
Perplexity said on March 13, 2026 that Perplexity Computer is now available on mobile, starting with iOS inside the Perplexity app. Coming one day after the company opened Computer to Pro subscribers, the update turns the product into a more explicit cross-device agent workflow rather than a desktop-only experience.
A March 14, 2026 LocalLLaMA post outlined a CUTLASS and FlashInfer patch for SM120 Blackwell workstations, claiming major gains for Qwen3.5-397B NVFP4 inference and linking the work to FlashInfer PR #2786.