Cloudflare brings Kimi K2.5 to Workers AI and tunes the stack for agents

Original post: "Kimi K2.5 is now on Workers AI, helping you power agents entirely on Cloudflare's Developer Platform. Learn how we optimized our inference stack and reduced inference costs for internal agent use cases." https://t.co/kEQ6HHpoJS

LLM · Mar 23, 2026 · By Insights AI · 1 min read

On March 19, 2026, Cloudflare said on X that Moonshot AI’s Kimi K2.5 is now available on Workers AI. In the linked blog post, Cloudflare says Workers AI is officially entering the “big models” tier by offering frontier open-source models directly on its inference platform, starting with Kimi K2.5.

Cloudflare highlights why it picked this model for agentic work. The company says Kimi K2.5 offers a 256k context window plus multi-turn tool calling, vision inputs, and structured outputs, which makes it suitable for long-running, stateful agent flows. The broader pitch is that developers can now keep the entire agent lifecycle on a single platform by combining the model with Cloudflare primitives such as Durable Objects for state, Workflows for long-running tasks, and sandboxed execution surfaces for tools.
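As a concrete illustration of that long-running, stateful flow: each agent turn replays the conversation history so the model sees the full context. The sketch below assumes an OpenAI-style chat payload; the field names (`messages`, `response_format`) and the binding call shown in the comment are illustrative assumptions, not taken from Cloudflare's documentation.

```typescript
// One turn of a stateful agent loop, as a minimal sketch. The large context
// window is what makes replaying the full history viable for long sessions.
type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
};

interface AgentRequest {
  messages: Message[];
  // Structured outputs: ask the model to reply with JSON (field name assumed).
  response_format?: { type: "json_object" };
}

export function buildTurn(history: Message[], userInput: string): AgentRequest {
  return {
    messages: [...history, { role: "user", content: userInput }],
    response_format: { type: "json_object" },
  };
}

// Inside a Worker, the payload would go to the AI binding, e.g. (model id
// is a hypothetical placeholder):
//   const result = await env.AI.run("@cf/moonshotai/kimi-k2.5", payload);
// with the accumulated `history` persisted in a Durable Object between turns.
```

In this pattern the Durable Object is the single source of truth for the conversation, so the Worker itself can stay stateless.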

  • Cloudflare says it built custom kernels for Kimi K2.5 on top of its Infire inference engine.
  • Workers AI now surfaces cached tokens as a usage metric and discounts cached tokens relative to fresh input tokens.
  • A new `x-session-affinity` header is meant to improve prefix-cache hit rates and reduce latency and cost in multi-turn agent sessions.
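The session-affinity idea can be sketched in a few lines. The `x-session-affinity` header name comes from the announcement, but the endpoint shape and the choice of a stable per-conversation id as its value are assumptions for illustration only.

```typescript
// Cache-aware routing sketch: send every turn of one conversation with the
// same affinity value, so requests land on a node whose prefix cache already
// holds the shared prompt tokens.
export function affinityHeaders(
  sessionId: string,
  apiToken: string
): Record<string, string> {
  return {
    Authorization: `Bearer ${apiToken}`,
    "Content-Type": "application/json",
    // Stable per-conversation value; reusing it across turns is what raises
    // the prefix-cache hit rate.
    "x-session-affinity": sessionId,
  };
}

// Usage (endpoint URL is illustrative, not Cloudflare's documented path):
// await fetch("https://example.invalid/ai/run", {
//   method: "POST",
//   headers: affinityHeaders(conversationId, token),
//   body: JSON.stringify(payload),
// });
```

Since cached tokens are billed at a discount relative to fresh input tokens, a higher hit rate lowers both latency and cost for multi-turn sessions.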

The important part is not just model availability. Many platforms can host a model card. Cloudflare is trying to differentiate by packaging serving optimizations, stateful primitives, and agent infrastructure into the same stack. That matters for teams that want an open-source frontier model without taking on the operational burden of self-hosting, tuning kernels, or building their own cache-aware routing layer for long-context workflows.

Cloudflare also says the Agents SDK starter now uses Kimi K2.5 as its default model, which signals that the launch is meant to feed directly into agent developer workflows rather than sit as a generic catalog entry. The original X post links to the detailed launch post on Cloudflare's blog.


Related Articles

LLM · sources.twitter · 2 min read

Cloudflare said on March 20, 2026 that Kimi K2.5 was available on Workers AI so developers could build end-to-end agents on Cloudflare’s platform. Its launch post says the model brings a 256k context window, multi-turn tool calling, vision inputs, and structured outputs, while an internal security-review agent processing 7B tokens per day cut costs by 77% after the switch.



© 2026 Insights. All rights reserved.