Cloudflare brings Kimi K2.5 to Workers AI, says moving an agent code-review workload cut costs by 77%

Original: Kimi K2.5 is now available on #WorkersAI. You can now build and run agents end-to-end on the Cloudflare Developer Platform. Read about how we tuned our inference stack to drive down costs for internal agent workflows. cfl.re/4bmpZgb

LLM · Mar 22, 2026 · By Insights AI · 2 min read

What Cloudflare announced on X

On March 20, 2026, Cloudflare said Kimi K2.5 was available on Workers AI and pitched the launch as a way to build and run agents end to end on the Cloudflare Developer Platform. The important part was not just that a new model appeared in the catalog, but that Cloudflare framed model inference, workflow orchestration, state, and secure execution as pieces of one agent platform.

What the launch post says about the model and the platform

Cloudflare’s blog says Workers AI is moving into the “big models” tier with frontier open-source systems, starting with Moonshot AI’s Kimi K2.5. It describes the model as supporting a 256k context window, multi-turn tool calling, vision inputs, and structured outputs, which positions it for more complex agentic workloads than a simple text chat deployment.

  • Cloudflare says engineers have been using Kimi K2.5 inside its internal OpenCode environment as a daily driver for agentic coding tasks.
  • The company also integrated the model into its public code review agent, Bonk, across Cloudflare GitHub repositories.
  • One internal security-review agent reportedly processes more than 7B tokens per day and caught more than 15 confirmed issues in a single codebase.
  • Cloudflare says that moving that use case to Workers AI cut costs by 77% compared with a mid-tier proprietary model, which it estimates would have cost about $2.4 million per year for that one workload.

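The figures above can be sanity-checked with some back-of-envelope arithmetic. This is a rough sketch, not Cloudflare's accounting: it assumes the 7B-tokens-per-day rate holds year-round and that the $2.4 million estimate reflects a blended (input plus output) per-token rate.

```python
# Back-of-envelope check of the cost figures quoted above.
# Assumptions (not from Cloudflare): 7B tokens/day sustained all year,
# and the $2.4M/year estimate is a blended input+output token rate.

TOKENS_PER_DAY = 7e9
DAYS_PER_YEAR = 365
PROPRIETARY_COST_PER_YEAR = 2_400_000  # Cloudflare's mid-tier proprietary estimate
SAVINGS = 0.77                         # claimed cost reduction on Workers AI

tokens_per_year = TOKENS_PER_DAY * DAYS_PER_YEAR  # ~2.56 trillion tokens
proprietary_rate = PROPRIETARY_COST_PER_YEAR / (tokens_per_year / 1e6)
workers_ai_cost = PROPRIETARY_COST_PER_YEAR * (1 - SAVINGS)
workers_ai_rate = proprietary_rate * (1 - SAVINGS)

print(f"implied proprietary rate: ${proprietary_rate:.2f} per 1M tokens")
print(f"implied Workers AI cost:  ${workers_ai_cost:,.0f} per year "
      f"(~${workers_ai_rate:.2f} per 1M tokens)")
```

Under those assumptions, the quoted numbers imply a blended rate of just under a dollar per million tokens on the proprietary model, and roughly $550k per year after the 77% reduction.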
Why this matters

The announcement stands out because Cloudflare is making an infrastructure and economics argument as much as a model argument. In agent-heavy environments, cost is often driven not only by output tokens but by large context windows, repeated tool schemas, codebase-sized prompts, and frequent multi-turn requests. Cloudflare pairs the model launch with platform features such as surfaced prefix-caching metrics and a new x-session-affinity header designed to improve cache hit rates, time to first token, and overall throughput.

That matters because many companies are now less concerned with whether a frontier model exists and more concerned with whether they can run agent workloads continuously without proprietary-model pricing breaking the budget. Cloudflare’s pitch is that teams should not have to self-host a large open model, hand-tune kernels, and manage inference topology just to get a viable cost profile. If the company’s internal results generalize, Workers AI could become a serious option for teams that want open-model reasoning and tool use without taking on the full operational burden themselves.

Sources: Cloudflare X post · Cloudflare Workers AI launch post




© 2026 Insights. All rights reserved.