Cloudflare brings large open-source models to Workers AI, starting with Kimi K2.5
Original: Powering the agents: Workers AI now runs large models, starting with Kimi K2.5
What Cloudflare launched
Cloudflare said on March 19, 2026 that Workers AI now supports frontier-scale open-source models, starting with Moonshot AI's Kimi K2.5. Cloudflare highlighted a 256k-token context window together with multi-turn tool calling, vision inputs, and structured outputs, positioning the model as a fit for agentic workloads.
The company is using the launch to push a broader platform narrative. By running a larger model directly inside Workers AI, Cloudflare says developers can keep more of the agent lifecycle on one stack, from inference and tool use to state handling and workflow execution inside the wider Cloudflare Developer Platform.
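To make the capability list concrete, here is a minimal sketch of what an agentic request to a model on Workers AI could look like. This is an illustration, not Cloudflare's documented usage: the model slug, the system prompt, and the `read_file` tool are all hypothetical, and only the general `env.AI.run(model, inputs)` binding pattern is taken from the Workers AI platform.

```typescript
// Hypothetical model identifier -- the announcement does not give the exact slug.
const MODEL = "@cf/moonshotai/kimi-k2.5";

// Minimal tool-definition shape for illustration.
interface ToolDef {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Build a request payload exercising the advertised features:
// multi-turn messages, tool calling, and structured outputs.
function buildAgentRequest(userPrompt: string, tools: ToolDef[]) {
  return {
    messages: [
      { role: "system", content: "You are a code-review agent." },
      { role: "user", content: userPrompt },
    ],
    tools, // tools the model may call across turns
    response_format: { type: "json_schema" }, // ask for structured output
  };
}

// Inside a Worker with an AI binding, the call would look roughly like:
//   const result = await env.AI.run(MODEL, buildAgentRequest(prompt, tools));

const req = buildAgentRequest("Review this diff for injection bugs.", [
  {
    name: "read_file", // hypothetical tool
    description: "Fetch a file from the repo under review",
    parameters: { type: "object", properties: { path: { type: "string" } } },
  },
]);
console.log(req.messages.length, req.tools[0].name);
```

The point of the sketch is that inference, tool definitions, and output constraints all live in one Worker, which is the "one stack" claim the announcement makes.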
The economics are the bigger story
Cloudflare said Kimi K2.5 is already running inside its internal OpenCode environment and in Bonk, its public code review agent. One internal security review agent processes more than 7B tokens per day and, according to Cloudflare, has caught more than 15 confirmed issues in a single codebase. The company said running that use case on a mid-tier proprietary model would have cost about $2.4M per year, while switching to Kimi on Workers AI cut costs by 77%.
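Taking Cloudflare's stated figures at face value, the arithmetic works out as follows. The per-million-token rate at the end is derived from the $2.4M and 7B-tokens/day numbers, not something Cloudflare published.

```typescript
// Back-of-envelope check on the stated numbers.
const proprietaryAnnualUsd = 2_400_000; // stated ~$2.4M/yr on a mid-tier proprietary model
const savingsRate = 0.77;               // stated 77% cost reduction

const kimiAnnualUsd = proprietaryAnnualUsd * (1 - savingsRate); // ~$552,000/yr
const savedUsd = proprietaryAnnualUsd - kimiAnnualUsd;          // ~$1.85M/yr saved

// Implied proprietary rate at ~7B tokens/day (derived estimate):
const tokensPerYear = 7e9 * 365;                                          // ~2.56T tokens
const usdPerMillionTokens = proprietaryAnnualUsd / (tokensPerYear / 1e6); // ~$0.94

console.log(Math.round(kimiAnnualUsd), Math.round(savedUsd), usdPerMillionTokens.toFixed(2));
```

So the 77% reduction implies a run rate of roughly $550k per year for the same workload, and an effective proprietary price of under a dollar per million tokens at that volume.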
That matters because open-source models are increasingly being judged on production fit, not just benchmark visibility. If a model can handle large context, structured tool use, and high-volume inference while materially reducing cost, it becomes a realistic default for coding, review, and security automation workloads instead of a fallback option.
Why the move matters
The launch suggests the AI infrastructure race is moving from raw model access toward integrated agent stacks. Cloudflare is trying to show that developers do not need a separate orchestration layer to run useful agents at scale if the platform already combines inference, tools, workflows, and deployment primitives. That puts simultaneous pressure on proprietary model vendors and competing developer platforms.
Related Articles
Perplexity said on March 11, 2026 that its Sandbox API will become both an Agent API tool and a standalone service. Existing docs already frame Agent API as a multi-provider interface with explicit tool configuration, so the update pushes code execution closer to a first-class orchestration primitive.
Perplexity said on March 13, 2026 that Perplexity Computer is now available on mobile, starting with iOS inside the Perplexity app. Coming one day after the company opened Computer to Pro subscribers, the update turns the product into a more explicit cross-device agent workflow rather than a desktop-only experience.
OpenAI posted on March 5, 2026 that GPT-5.4 Thinking and GPT-5.4 Pro are rolling out across ChatGPT, the API, and Codex. The launch article positions GPT-5.4 as a professional-work model with 1M-token context, native computer use, stronger tool search, and better spreadsheet, document, and presentation performance.