Cloudflare turns AI Gateway into one API for 70+ models
Original: Cloudflare’s AI Platform: an inference layer designed for agents
Cloudflare is pushing AI Gateway from a proxy into a full inference layer for agents, giving developers one API path to 70+ models across 12+ providers. The change matters because agent workflows rarely use a single model anymore: a support agent might classify with a cheap model, plan with a stronger reasoning model, and run task-specific calls with smaller models. When that chain stretches to ten calls, latency, outages, and cost reporting become product problems rather than backend details.
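The tiered pattern above can be sketched as a simple per-step router. The step names and model identifiers here are illustrative assumptions, not Cloudflare's catalog:

```typescript
// Sketch of tiered model selection for an agent chain: a cheap model for
// classification, a stronger reasoning model for planning, and small
// models for task-specific calls. All model ids are placeholders.
type Step = "classify" | "plan" | "task";

const MODEL_FOR_STEP: Record<Step, string> = {
  classify: "cheap/classifier-model",  // low-cost, high-volume
  plan: "frontier/reasoning-model",    // expensive, used sparingly
  task: "small/task-model",            // task-specific calls
};

function pickModel(step: Step): string {
  return MODEL_FOR_STEP[step];
}
```

With ten or more such calls per request, the gateway's job is making the latency, failure, and cost of each hop visible in one place rather than scattered across providers.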
In the April 16 source post, Cloudflare says Workers developers can now call third-party models through the same AI.run() binding used for Workers AI. A switch from a Cloudflare-hosted model to models from providers such as OpenAI, Anthropic, Alibaba Cloud, Google, Runway, Vidu, Recraft, MiniMax, InWorld, AssemblyAI, Pixverse, and Bytedance can be a one-line code change. REST API access for non-Workers environments is planned for the coming weeks.
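The "one-line change" claim can be sketched as follows. The `AiBinding` interface and mock below stand in for the real Workers AI binding so the sketch runs outside a Worker, and the model identifiers are illustrative assumptions:

```typescript
// Minimal sketch of swapping providers through a single binding,
// modeled loosely on the env.AI.run() shape the post describes.
interface AiBinding {
  run(model: string, inputs: { prompt: string }): Promise<{ response: string }>;
}

// Mock binding for illustration; the real binding would route the
// request to a Cloudflare-hosted or third-party model by its id.
const AI: AiBinding = {
  async run(model, inputs) {
    return { response: `[${model}] echo: ${inputs.prompt}` };
  },
};

async function answer(prompt: string): Promise<string> {
  // Switching providers is meant to be just this one line:
  // const MODEL = "@cf/meta/llama-3.1-8b-instruct"; // Cloudflare-hosted
  const MODEL = "openai/gpt-4o-mini"; // hypothetical third-party id
  const { response } = await AI.run(MODEL, { prompt });
  return response;
}
```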
The platform angle is the stronger story. Cloudflare says AI Gateway gives teams one place to watch AI spend across providers, add request metadata for breakdowns by customer or workflow, and rely on automatic routing when a model is available through more than one provider. For streaming calls, the gateway can buffer responses so a long-running agent can reconnect without starting inference again or paying twice for the same output tokens.
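As a rough illustration of the per-customer breakdown that request metadata enables, the aggregation might look like the local sketch below. The real gateway does this server-side, and the field names (`customer`, `workflow`, `costUsd`) are assumptions:

```typescript
// Local sketch of spend attribution by request metadata. Each record
// represents one model call tagged with customer and workflow labels.
interface CallRecord {
  model: string;
  costUsd: number;
  metadata: { customer: string; workflow: string };
}

// Sum spend per customer across all tagged calls.
function spendByCustomer(calls: CallRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const call of calls) {
    const prev = totals.get(call.metadata.customer) ?? 0;
    totals.set(call.metadata.customer, prev + call.costUsd);
  }
  return totals;
}
```

The same grouping by `workflow` instead of `customer` would answer "which agent step is burning the budget" rather than "which tenant is".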
Cloudflare is also bringing Replicate closer to its AI Platform team. The company says Replicate models will move into AI Gateway, while Replicate-hosted workloads are being replatformed onto Cloudflare infrastructure. For teams already juggling custom models, managed open models, and commercial APIs, this is a direct bid to own the orchestration layer under agent apps.
The risk to watch is lock-in at a different level. A unified model catalog reduces provider friction, but it also makes Cloudflare the place where routing, observability, credits, and reliability policy live. If the service performs as advertised, it could become a practical control plane for production agents. If not, teams may decide that model independence is only useful when the gateway itself stays boring and dependable.
Related Articles
HN focused on the plumbing question: does a 14-plus-provider inference layer actually make agent apps easier to operate? Cloudflare framed AI Gateway, Workers AI bindings, and a broader multimodal catalog as one platform, while commenters compared it with OpenRouter and pressed on pricing accuracy, catalog overlap, and deployment trust.
Cloudflare said on X on March 19 that Kimi K2.5 was available on Workers AI. The launch pairs a frontier open-source model with platform features aimed at lowering latency and cost for agent workloads.
Cloudflare said on March 20, 2026 that Kimi K2.5 was available on Workers AI so developers could build end-to-end agents on Cloudflare’s platform. Its launch post says the model brings a 256k context window, multi-turn tool calling, vision inputs, and structured outputs, while an internal security-review agent processing 7B tokens per day cut costs by 77% after the switch.