Cloudflare gives agents BM25, vectors and per-customer search
Original: AI Search: the search primitive for your agents
Retrieval is no longer a side feature in agent stacks; it is one of the factors that decide whether an agent works beyond a demo. In a post dated April 16, 2026, Cloudflare recasts AutoRAG as AI Search, a search primitive that agents can create and query from Workers, the Agents SDK, or the Wrangler CLI.
The most useful change is that AI Search no longer treats vector search as the whole answer. Cloudflare says vector search can understand intent but miss exact terms, such as “ERR_CONNECTION_REFUSED timeout.” AI Search now supports BM25 keyword matching alongside vectors. When hybrid search is enabled, vector and BM25 retrieval run in parallel, then results can be fused with reciprocal rank fusion or max fusion. A reranker can be added when a cross-encoder pass is worth the extra work.
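Reciprocal rank fusion itself is easy to sketch. The snippet below is an illustrative implementation of the generic RRF algorithm, not Cloudflare's internals; the document IDs are made up, and the constant `k = 60` is the conventional default from the RRF literature, assumed here rather than taken from Cloudflare's documentation.

```typescript
// Reciprocal rank fusion: merge ranked lists by summing 1 / (k + rank)
// for each document across lists, then re-sorting by the fused score.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based: the top hit in a list contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: a vector ranking and a BM25 ranking that partially overlap.
const vectorHits = ["doc-a", "doc-b", "doc-c"];
const bm25Hits = ["doc-c", "doc-a", "doc-d"];
const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
```

Documents that appear high in both lists (here `doc-a`) float to the top even though neither retriever alone put them first in both rankings, which is why RRF is a common default for hybrid search.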
The setup burden is also smaller. New AI Search instances include their own storage and vector index, so a developer can upload files through the API, wait for indexing and query immediately. The new ai_search_namespaces binding lets a Worker create and delete search instances at runtime. That matters for agents because many production designs need separate memory or knowledge stores per customer, per language, per tenant or per agent.
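The per-tenant lifecycle can be mocked without the platform. The class below is a plain in-memory stand-in for runtime instance management; none of it is Cloudflare's actual `ai_search_namespaces` API, the class and method names are invented for illustration, and the "search" is a naive substring filter rather than real BM25 or vector retrieval.

```typescript
// In-memory stand-in for per-tenant search instances. Illustrative only:
// the real ai_search_namespaces binding creates actual AI Search instances
// with their own storage and vector index.
class TenantSearchRegistry {
  private instances = new Map<string, Set<string>>();

  // Create an instance for a tenant at runtime (idempotent).
  create(tenantId: string): void {
    if (!this.instances.has(tenantId)) {
      this.instances.set(tenantId, new Set());
    }
  }

  // Delete a tenant's instance and all its documents.
  delete(tenantId: string): void {
    this.instances.delete(tenantId);
  }

  addDocument(tenantId: string, doc: string): void {
    const store = this.instances.get(tenantId);
    if (!store) throw new Error(`no instance for tenant ${tenantId}`);
    store.add(doc);
  }

  // Naive substring "search" standing in for real retrieval.
  query(tenantId: string, term: string): string[] {
    const store = this.instances.get(tenantId) ?? new Set<string>();
    return [...store].filter((doc) => doc.includes(term));
  }
}

const registry = new TenantSearchRegistry();
registry.create("acme");
registry.addDocument(
  "acme",
  "ticket 42: ERR_CONNECTION_REFUSED resolved by firewall rule",
);
const hits = registry.query("acme", "ERR_CONNECTION_REFUSED");
```

The point of the sketch is the shape, not the search: create and delete are cheap runtime operations keyed by tenant, so isolation per customer, language, or agent falls out of the data model rather than requiring upfront provisioning.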
Cloudflare’s support-agent example shows the pattern. Shared product documentation can live in one instance, while each customer gets a separate instance for past resolutions. On a new ticket, the agent searches across both the product knowledge base and that customer’s history in one call. When the issue is resolved, it saves a summary that becomes searchable in future conversations. Metadata boosting can push recent or high-priority documents higher in the ranking.
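Metadata boosting of this kind can be sketched generically. The weighting scheme below, an exponential recency decay and a flat priority multiplier, is an assumption made for illustration, not Cloudflare's documented formula, and the field names are invented.

```typescript
interface ScoredDoc {
  id: string;
  baseScore: number; // relevance score from retrieval
  ageDays: number; // document age in days
  priority: "high" | "normal";
}

// Boost recent and high-priority documents. The 365-day decay scale and
// the 1.5x priority multiplier are illustrative assumptions.
function applyMetadataBoost(docs: ScoredDoc[]): ScoredDoc[] {
  return docs
    .map((d) => {
      const recency = Math.exp(-d.ageDays / 365); // ~1 for fresh docs
      const priority = d.priority === "high" ? 1.5 : 1.0;
      return { ...d, baseScore: d.baseScore * recency * priority };
    })
    .sort((a, b) => b.baseScore - a.baseScore);
}

// A stale knowledge-base page vs. a fresh high-priority resolution.
const ranked = applyMetadataBoost([
  { id: "old-kb-page", baseScore: 0.9, ageDays: 900, priority: "normal" },
  { id: "recent-resolution", baseScore: 0.8, ageDays: 10, priority: "high" },
]);
```

Here the fresh high-priority resolution overtakes the older page despite a lower raw relevance score, which is exactly the behavior the support-agent example relies on.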
The beta limits are concrete enough to make this more than a roadmap note. Workers Free accounts get 100 AI Search instances, 100,000 files per instance, 20,000 queries per month, and 500 crawled pages per day. Workers Paid accounts get 5,000 instances, 1,000,000 files per instance (500,000 with hybrid search), unlimited queries, and unlimited crawled pages per day. AI Search is free during open beta, with at least 30 days notice before billing starts, while Workers AI and AI Gateway remain billed separately.
Related Articles
Why it matters: long-running agents need memory that survives beyond one prompt without replaying every message. Cloudflare says Agent Memory is in private beta and keeps useful state available without filling the context window.
Cloudflare is trying to make model choice less sticky: AI Gateway now routes Workers AI calls to 70+ models across 12+ providers through one interface. For agent builders, the important part is not the catalog alone but spend controls, retry behavior, and failover in workflows that may chain ten inference calls for one task.
HN focused on the plumbing question: does a 14-plus-provider inference layer actually make agent apps easier to operate? Cloudflare framed AI Gateway, Workers AI bindings, and a broader multimodal catalog as one platform, while commenters compared it with OpenRouter and pressed on pricing accuracy, catalog overlap, and deployment trust.