Cloudflare gives agents BM25, vectors and per-customer search
Original: AI Search: the search primitive for your agents
Retrieval is no longer a side feature in agent stacks; it is one of the paths that decide whether an agent can work beyond a demo. In a post dated April 16, 2026, Cloudflare recasts AutoRAG as AI Search, a search primitive that agents can create and query from Workers, the Agents SDK or the Wrangler CLI.
The most useful change is that AI Search no longer treats vector search as the whole answer. Cloudflare says vector search can understand intent but miss exact terms, such as “ERR_CONNECTION_REFUSED timeout.” AI Search now supports BM25 keyword matching alongside vectors. When hybrid search is enabled, vector and BM25 retrieval run in parallel, then results can be fused with reciprocal rank fusion or max fusion. A reranker can be added when a cross-encoder pass is worth the extra work.
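To make the fusion step concrete, here is a minimal sketch of how two ranked result lists might be combined. It is not Cloudflare's implementation. The function names, the document IDs and the constant k=60 (a common default for reciprocal rank fusion) are assumptions for illustration. The post does not define max fusion in detail, so the second function reads it as "keep each document's best score" and assumes the scores are already normalized to a comparable scale.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs (best first) with RRF.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def max_fusion(scored_lists):
    """Keep each document's best score across retrievers.

    Assumes every retriever's scores are normalized to the same scale.
    """
    best = {}
    for scored in scored_lists:
        for doc_id, score in scored.items():
            best[doc_id] = max(score, best.get(doc_id, score))
    return sorted(best, key=lambda d: (-best[d], d))


# Hypothetical results: the vector side favors intent, BM25 favors exact terms.
vector_hits = ["doc_intent", "doc_timeout", "doc_misc"]
bm25_hits = ["doc_timeout", "doc_exact_error", "doc_intent"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

A document that ranks well in both lists (here doc_timeout) rises to the top, while a document found only by BM25 still makes the fused list. That is the case the exact-term example is meant to cover.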
The setup burden is also smaller. New AI Search instances include their own storage and vector index, so a developer can upload files through the API, wait for indexing and query immediately. The new ai_search_namespaces binding lets a Worker create and delete search instances at runtime. That matters for agents because many production designs need separate memory or knowledge stores per customer, per language, per tenant or per agent.
Cloudflare’s support-agent example shows the pattern. Shared product documentation can live in one instance, while each customer gets a separate instance for past resolutions. On a new ticket, the agent searches across both the product knowledge base and that customer’s history in one call. When the issue is resolved, it saves a summary that becomes searchable in future conversations. Metadata boosting can push recent or high-priority documents higher in the ranking.
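The pattern can be sketched with in-memory stand-ins. Everything below is hypothetical: the classes `Namespace` and `InMemoryInstance`, their methods and the instance names are invented for illustration and are not the AI Search API or the ai_search_namespaces binding. Scoring is a naive count of shared words, used only so the example runs.

```python
class InMemoryInstance:
    """Hypothetical stand-in for one search instance (not the real API)."""

    def __init__(self, name):
        self.name = name
        self.docs = []

    def add(self, text):
        self.docs.append(text)

    def search(self, query):
        # Naive scoring: number of query words that appear in the document.
        terms = set(query.lower().split())
        hits = []
        for text in self.docs:
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                hits.append((overlap, self.name, text))
        return hits


class Namespace:
    """Hypothetical container that creates and deletes instances at runtime."""

    def __init__(self):
        self.instances = {}

    def get_or_create(self, name):
        if name not in self.instances:
            self.instances[name] = InMemoryInstance(name)
        return self.instances[name]

    def delete(self, name):
        self.instances.pop(name, None)

    def search(self, query, names):
        # One call that fans out over several instances and merges the hits.
        hits = []
        for name in names:
            if name in self.instances:
                hits.extend(self.instances[name].search(query))
        return sorted(hits, key=lambda h: (-h[0], h[1], h[2]))


ns = Namespace()
ns.get_or_create("product-docs").add(
    "Reset the gateway when the connection is refused"
)
history = ns.get_or_create("customer-acme")  # one instance per customer

scope = ["product-docs", "customer-acme"]
results_before = ns.search("connection refused", scope)

# After the ticket is resolved, the agent stores a summary for next time.
history.add("Resolved: connection refused after firewall change on port 443")
results_after = ns.search("connection refused", scope)
```

Before the summary is saved, only the shared documentation matches. Afterwards the same query also returns the customer's own resolution, while other customers' instances never see it.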
The beta limits are concrete enough to make this more than a roadmap note. Workers Free accounts get 100 AI Search instances, 100,000 files per instance, 20,000 queries per month and 500 crawled pages per day. Workers Paid accounts get 5,000 instances, 1M files per instance (500K when hybrid search is enabled), unlimited queries and unlimited crawled pages per day. AI Search is free during the open beta, with at least 30 days' notice before billing starts, while Workers AI and AI Gateway remain billed separately.
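For a per-customer design, these limits translate into a quick capacity check. The sketch below encodes the figures quoted above. The helper `exceeded_limits` and its parameters are hypothetical, and because the post gives no separate hybrid file limit for the Free plan, the code reuses 100,000 there as an assumption.

```python
# Beta limits as quoted in the post; None means "unlimited".
LIMITS = {
    "free": {
        "instances": 100,
        "files": 100_000,
        "files_hybrid": 100_000,  # assumption: no separate figure is given
        "queries_per_month": 20_000,
    },
    "paid": {
        "instances": 5_000,
        "files": 1_000_000,
        "files_hybrid": 500_000,
        "queries_per_month": None,
    },
}


def exceeded_limits(plan, instances, files_per_instance, queries_per_month,
                    hybrid=False):
    """Return the names of the plan limits a design would exceed."""
    limits = LIMITS[plan]
    file_cap = limits["files_hybrid"] if hybrid else limits["files"]
    problems = []
    if instances > limits["instances"]:
        problems.append("instances")
    if files_per_instance > file_cap:
        problems.append("files")
    query_cap = limits["queries_per_month"]
    if query_cap is not None and queries_per_month > query_cap:
        problems.append("queries_per_month")
    return problems


# 150 customers plus one shared documentation instance.
needed = 150 + 1
print(exceeded_limits("free", needed, 2_000, 10_000))  # ['instances']
print(exceeded_limits("paid", needed, 2_000, 10_000))  # []
```

With one instance per customer plus a shared one, the Free plan covers at most 99 customers, so instance count is likely to be the first limit a multi-tenant agent reaches.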