Cloudflare gives agents BM25, vectors and per-customer search

Original: AI Search: the search primitive for your agents

Apr 17, 2026 By Insights AI

Retrieval is no longer a side feature in agent stacks; it is one of the factors that decide whether an agent can work beyond a demo. In a post dated April 16, 2026, Cloudflare recasts AutoRAG as AI Search, a search primitive that agents can create and query from Workers, the Agents SDK or the Wrangler CLI.

The most useful change is that AI Search no longer treats vector search as the whole answer. Cloudflare says vector search can understand intent but miss exact terms, such as “ERR_CONNECTION_REFUSED timeout.” AI Search now supports BM25 keyword matching alongside vectors. When hybrid search is enabled, vector and BM25 retrieval run in parallel, then results can be fused with reciprocal rank fusion or max fusion. A reranker can be added when a cross-encoder pass is worth the extra work.
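To make the fusion step concrete, here is a minimal Python sketch of the two strategies named above. It is not Cloudflare's implementation: the constant k=60 is a common default for reciprocal rank fusion in the literature rather than a value from the post, and the reading of "max fusion" as "keep each document's best normalized score" is an assumption. The function names and document IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse best-first lists of document IDs with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in.
    k=60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def max_fusion(scored_lists):
    """Assumed reading of max fusion: normalize each retriever's scores
    by its top score, then keep the best value per document."""
    best = {}
    for scored in scored_lists:
        if not scored:
            continue
        top = max(scored.values())
        for doc_id, score in scored.items():
            norm = score / top if top > 0 else 0.0
            best[doc_id] = max(best.get(doc_id, 0.0), norm)
    return sorted(best, key=lambda d: (-best[d], d))


# Hypothetical results from the two retrievers for the same query.
vector_hits = ["doc_timeouts", "doc_network", "doc_err_refused"]
bm25_hits = ["doc_err_refused", "doc_timeouts", "doc_changelog"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Rank-based fusion ignores the raw scores, which helps when BM25 and vector similarity live on different scales. A document that both retrievers rank reasonably well tends to beat one that only a single retriever found.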

The setup burden is also smaller. New AI Search instances include their own storage and vector index, so a developer can upload files through the API and query them as soon as indexing finishes. The new ai_search_namespaces binding lets a Worker create and delete search instances at runtime. That matters for agents because many production designs need separate memory or knowledge stores per customer, per language, per tenant or per agent.
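The one-instance-per-tenant design can be sketched without touching any real API. The Python below uses an in-memory stand-in; the class InMemoryNamespace, its create and delete methods, and the instance_name helper are all hypothetical and do not reflect the actual ai_search_namespaces binding, whose method names the post summary here does not give.

```python
import re


class InMemoryNamespace:
    """Hypothetical stand-in for a namespace of search instances.

    This is NOT the Cloudflare binding; it only models create/delete
    at runtime and an upper bound on the number of instances.
    """

    def __init__(self, max_instances):
        self.max_instances = max_instances
        self.instances = {}

    def create(self, name):
        # Creating an existing instance returns it unchanged (idempotent).
        if name in self.instances:
            return self.instances[name]
        if len(self.instances) >= self.max_instances:
            raise RuntimeError("instance limit reached")
        self.instances[name] = []
        return self.instances[name]

    def delete(self, name):
        # Deleting a missing instance is a no-op.
        self.instances.pop(name, None)


def instance_name(kind, tenant_id):
    """Build a predictable instance name from a tenant identifier."""
    slug = re.sub(r"[^a-z0-9]+", "-", tenant_id.lower()).strip("-")
    return f"{kind}-{slug}"


namespace = InMemoryNamespace(max_instances=100)
namespace.create(instance_name("history", "Acme Corp!"))
namespace.create(instance_name("history", "Globex"))
namespace.delete(instance_name("history", "Globex"))
```

Deriving the instance name from the tenant ID means the agent never needs a lookup table to find a customer's store, and idempotent creation makes it safe to call on every first contact.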

Cloudflare’s support-agent example shows the pattern. Shared product documentation can live in one instance, while each customer gets a separate instance for past resolutions. On a new ticket, the agent searches across both the product knowledge base and that customer’s history in one call. When the issue is resolved, it saves a summary that becomes searchable in future conversations. Metadata boosting can push recent or high-priority documents higher in the ranking.
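The support-agent flow can be illustrated with a toy Python model. Everything here is an assumption made for the example: the word-overlap scoring is a stand-in for real hybrid retrieval, the priority field and the additive boost formula are invented, and the instance names are hypothetical. It shows only the shape of the pattern: one query across a shared store and a customer store, a metadata boost, and a write-back after resolution.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    priority: int = 0  # hypothetical metadata field used for boosting


def keyword_score(query, doc):
    """Toy relevance: number of query words that appear in the document."""
    terms = set(query.lower().split())
    words = set(doc.text.lower().split())
    return len(terms & words)


def search_instances(query, instances, boost=0.5, top_k=3):
    """Search several instances in one call and merge the hits.

    The final score is base relevance plus boost * priority, an assumed
    formula standing in for metadata boosting.
    """
    hits = []
    for name, docs in instances.items():
        for doc in docs:
            base = keyword_score(query, doc)
            if base == 0:
                continue
            hits.append((base + boost * doc.priority, name, doc.doc_id))
    hits.sort(key=lambda h: (-h[0], h[2]))
    return hits[:top_k]


instances = {
    "product-docs": [Doc("kb-1", "reset the gateway after a timeout")],
    "history-acme": [
        Doc("res-7", "timeout fixed by raising gateway limit", priority=1)
    ],
}

results = search_instances("gateway timeout", instances)

# After the ticket is resolved, save a summary for future conversations.
instances["history-acme"].append(
    Doc("res-8", "gateway timeout resolved by restarting the proxy")
)
```

In this toy run both documents match the query equally well, so the boosted customer resolution ranks above the shared documentation. The summary saved at the end becomes a candidate the next time the same customer reports a similar issue.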

The beta limits are concrete enough to make this more than a roadmap note. Workers Free accounts get 100 AI Search instances, 100,000 files per instance, 20,000 queries per month and 500 crawled pages per day. Workers Paid accounts get 5,000 instances, 1 million files per instance (500,000 with hybrid search), unlimited queries and unlimited crawled pages per day. AI Search is free during the open beta, with at least 30 days' notice before billing starts, while Workers AI and AI Gateway remain billed separately.
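Those numbers make it easy to check whether a per-customer design fits a plan. The Python sketch below encodes only the limits quoted above. The fits_plan helper and its parameters are hypothetical, and it assumes one instance per customer plus a number of shared instances.

```python
# Beta limits as quoted in this article; None means unlimited.
BETA_LIMITS = {
    "free": {
        "instances": 100,
        "files_per_instance": 100_000,
        "queries_per_month": 20_000,
        "crawled_pages_per_day": 500,
    },
    "paid": {
        "instances": 5_000,
        "files_per_instance": 1_000_000,
        "files_per_instance_hybrid": 500_000,
        "queries_per_month": None,
        "crawled_pages_per_day": None,
    },
}


def fits_plan(plan, customers, shared_instances=1, queries_per_month=0):
    """Return a list of limit violations for a one-instance-per-customer design."""
    limits = BETA_LIMITS[plan]
    problems = []
    needed = customers + shared_instances
    if needed > limits["instances"]:
        problems.append(
            f"needs {needed} instances, plan allows {limits['instances']}"
        )
    cap = limits["queries_per_month"]
    if cap is not None and queries_per_month > cap:
        problems.append(
            f"needs {queries_per_month} queries per month, plan allows {cap}"
        )
    return problems


free_check = fits_plan("free", customers=150, queries_per_month=10_000)
paid_check = fits_plan("paid", customers=150, queries_per_month=1_000_000)
```

With 150 customers and one shared documentation instance, the design needs 151 instances, which exceeds the Free limit of 100 but fits comfortably within the Paid limit of 5,000.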
