Cloudflare gives agents BM25, vectors and per-customer search
Original: AI Search: the search primitive for your agents
Retrieval is no longer a side feature in agent stacks; it is one of the components that decides whether an agent works beyond a demo. In a post dated April 16, 2026, Cloudflare recasts AutoRAG as AI Search, a search primitive that agents can create and query from Workers, the Agents SDK or the Wrangler CLI.
The most useful change is that AI Search no longer treats vector search as the whole answer. Cloudflare says vector search can understand intent but miss exact terms, such as “ERR_CONNECTION_REFUSED timeout.” AI Search now supports BM25 keyword matching alongside vectors. When hybrid search is enabled, vector and BM25 retrieval run in parallel, then results can be fused with reciprocal rank fusion or max fusion. A reranker can be added when a cross-encoder pass is worth the extra work.
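To make the fusion step concrete, here is a minimal Python sketch of the two strategies the post names. It is an illustration under stated assumptions, not Cloudflare's implementation: the function names, the document IDs and the constant k=60 (a common default for reciprocal rank fusion) are not from the post, and "max fusion" is assumed to mean keeping each document's best score across retrievers, with scores already on a comparable scale.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs by summing 1 / (k + rank).

    k=60 is an assumed, commonly used constant, not a documented setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties break on document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def max_fusion(scored_lists):
    """Keep each document's best score from any retriever.

    Assumes the retrievers' scores are already normalized to one scale.
    """
    best = {}
    for scored in scored_lists:
        for doc_id, score in scored.items():
            best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    return sorted(best, key=lambda d: (-best[d], d))


# Hypothetical results for the query "ERR_CONNECTION_REFUSED timeout".
vector_ranking = ["net-overview", "timeouts-guide", "err-refused"]
bm25_ranking = ["err-refused", "timeouts-guide", "changelog"]
fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])

# Hypothetical normalized scores from two retrievers.
max_fused = max_fusion([{"a": 0.9, "b": 0.2}, {"b": 0.8, "c": 0.5}])
```

In this made-up example the exact-match page that BM25 ranks first and the vector retriever ranks third ends up on top of the fused list, which is the case the post's error-string example is about.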
The setup burden is also smaller. New AI Search instances include their own storage and vector index, so a developer can upload files through the API and query as soon as indexing finishes. The new ai_search_namespaces binding lets a Worker create and delete search instances at runtime. That matters for agents because many production designs need separate memory or knowledge stores per customer, per language, per tenant or per agent.
Cloudflare’s support-agent example shows the pattern. Shared product documentation can live in one instance, while each customer gets a separate instance for past resolutions. On a new ticket, the agent searches across both the product knowledge base and that customer’s history in one call. When the issue is resolved, it saves a summary that becomes searchable in future conversations. Metadata boosting can push recent or high-priority documents higher in the ranking.
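The shape of that pattern can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: InMemoryInstance, SupportAgent, the priority field and the priority_boost weight are invented for illustration, the scoring is a toy word-overlap count, and none of it is the AI Search API.

```python
class InMemoryInstance:
    """Toy stand-in for one search instance; scores by shared word count."""

    def __init__(self):
        self.docs = []

    def add(self, text, **metadata):
        self.docs.append({"text": text, **metadata})

    def search(self, query):
        terms = set(query.lower().split())
        hits = []
        for doc in self.docs:
            overlap = len(terms & set(doc["text"].lower().split()))
            if overlap:
                hits.append((overlap, doc))
        return hits


class SupportAgent:
    """One shared product instance plus one history instance per customer."""

    def __init__(self, product_docs):
        self.product_docs = product_docs
        self.customer_history = {}  # customer_id -> InMemoryInstance

    def _history(self, customer_id):
        # Create the customer's instance on first use.
        return self.customer_history.setdefault(customer_id, InMemoryInstance())

    def search(self, customer_id, query, priority_boost=0.5):
        # Query the shared docs and this customer's history together.
        hits = self.product_docs.search(query) + self._history(customer_id).search(query)
        # Metadata boost: a higher "priority" lifts a document in the ranking.
        ranked = sorted(
            hits,
            key=lambda h: -(h[0] + priority_boost * h[1].get("priority", 0)),
        )
        return [doc["text"] for _, doc in ranked]

    def resolve(self, customer_id, summary):
        # Save the resolution so future searches for this customer find it.
        self._history(customer_id).add(summary, priority=1)


docs = InMemoryInstance()
docs.add("Reset the gateway when the connection is refused")
agent = SupportAgent(docs)
agent.resolve("acme", "Connection refused fixed by opening port 8443")

results_acme = agent.search("acme", "connection refused")
results_other = agent.search("globex", "connection refused")
```

The point of the separation is visible in the last two lines: the saved resolution shows up, boosted, for the customer it belongs to, while a different customer sees only the shared documentation.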
The beta limits are concrete enough to make this more than a roadmap note. Workers Free accounts get 100 AI Search instances, 100,000 files per instance, 20,000 queries per month and 500 crawled pages per day. Workers Paid accounts get 5,000 instances, 1M files per instance (500K with hybrid search), unlimited queries and unlimited crawled pages per day. AI Search is free during open beta, with at least 30 days' notice before billing starts, while Workers AI and AI Gateway remain billed separately.
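Those numbers matter most for instance-per-customer designs, where the instance count grows with the customer base. As a rough planning aid, the quoted limits can be put in a table and checked against a planned deployment. The dictionary keys and the exceeded_limits helper below are hypothetical names, None stands for "unlimited", and the hybrid cap is applied only to the Paid plan because the article states no hybrid figure for Free.

```python
# Beta limits as quoted in the article; None means unlimited.
BETA_LIMITS = {
    "free": {
        "instances": 100,
        "files_per_instance": 100_000,
        "queries_per_month": 20_000,
        "crawled_pages_per_day": 500,
    },
    "paid": {
        "instances": 5_000,
        "files_per_instance": 1_000_000,
        "files_per_instance_hybrid": 500_000,
        "queries_per_month": None,
        "crawled_pages_per_day": None,
    },
}


def exceeded_limits(plan, usage, hybrid=False):
    """Return the names of the limits that the planned usage would exceed."""
    limits = dict(BETA_LIMITS[plan])
    hybrid_cap = limits.pop("files_per_instance_hybrid", None)
    # Use the lower hybrid file cap only where the article states one.
    if hybrid and hybrid_cap is not None:
        limits["files_per_instance"] = hybrid_cap
    return [
        name
        for name, cap in limits.items()
        if cap is not None and usage.get(name, 0) > cap
    ]


# Hypothetical deployment: one instance per customer, 250 customers.
plan_usage = {"instances": 250, "files_per_instance": 600_000}
free_problems = exceeded_limits("free", plan_usage)
paid_problems = exceeded_limits("paid", plan_usage)
paid_hybrid_problems = exceeded_limits("paid", plan_usage, hybrid=True)
```

In this example, 250 per-customer instances and 600,000 files per instance both overflow the Free plan, fit the Paid plan with vector-only search, and hit the lower file cap once hybrid search is turned on.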