Cloudflare scanned 200,000 high-traffic domains and found only 4% declaring AI usage preferences, 3.9% supporting Markdown content negotiation, and fewer than 15 exposing MCP Server Cards or API Catalogs. Its Agent Readiness score turns the agent-facing web into an audit checklist.
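One of the scanned signals, Markdown content negotiation, is simple to picture: a client whose Accept header asks for text/markdown gets a Markdown twin of the page instead of HTML. A minimal Worker sketch of the idea, where the `.md`-suffix convention and the crude Accept check (no q-value parsing) are my assumptions, not a prescribed layout:

```ts
export default {
  async fetch(request: Request): Promise<Response> {
    const accept = request.headers.get("Accept") ?? "";
    // Crude check; a full implementation would parse q-values per RFC 9110.
    if (accept.includes("text/markdown")) {
      const mdUrl = new URL(request.url);
      mdUrl.pathname += ".md"; // assumed convention: a Markdown twin per page
      const md = await fetch(mdUrl.toString());
      if (md.ok) {
        return new Response(md.body, {
          headers: { "Content-Type": "text/markdown", Vary: "Accept" },
        });
      }
    }
    return fetch(request); // everyone else gets the HTML page
  },
};
```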
Cloudflare is giving paid-plan sites a way to turn canonical tags into HTTP 301 redirects for verified AI training crawlers. The move matters because Cloudflare saw 4.8M AI crawler visits to its own developer docs in 30 days, with deprecated pages consumed at the same rate as current ones.
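The mechanic is easy to sketch at the edge: read the page's rel="canonical" tag and turn it into a 301 for training crawlers only. A hedged Worker sketch, where the user-agent regex is a stand-in for Cloudflare's actual crawler verification:

```ts
// Stand-in crawler check; Cloudflare's feature verifies crawlers itself.
const AI_CRAWLER_UA = /GPTBot|ClaudeBot|CCBot|Google-Extended/i;

export default {
  async fetch(request: Request): Promise<Response> {
    const response = await fetch(request);
    if (!AI_CRAWLER_UA.test(request.headers.get("User-Agent") ?? "")) {
      return response; // regular visitors get the page unchanged
    }
    let canonical: string | null = null;
    // HTMLRewriter streams the page; capture rel="canonical" if present.
    const probe = new HTMLRewriter()
      .on('link[rel="canonical"]', {
        element(el) {
          canonical = el.getAttribute("href");
        },
      })
      .transform(response.clone());
    await probe.text(); // drain the stream so the handler above fires
    if (canonical) {
      const target = new URL(canonical, request.url).href;
      if (target !== request.url) return Response.redirect(target, 301);
    }
    return response; // page is its own canonical: serve it
  },
};
```

So a deprecated doc page stops feeding stale content to training crawlers and instead points them at the current version, without changing what human visitors see.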
Why it matters: Cloudflare is attacking the memory-bandwidth bottleneck in LLM serving rather than only buying more GPUs. Its post reports 15-22% model-size reduction, about 3 GB VRAM saved on Llama 3.1 8B, and open-sourced GPU kernels.
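The numbers square with simple arithmetic, assuming an FP16 baseline:

```ts
// Back-of-envelope check (assumes an FP16 baseline at 2 bytes per weight).
const params = 8e9;               // Llama 3.1 8B
const weightBytes = params * 2;   // ~16 GB of weights in FP16
const savedBytes = 3 * 1024 ** 3; // the ~3 GB Cloudflare reports
console.log(`${((savedBytes / weightBytes) * 100).toFixed(0)}% reduction`);
// -> "20% reduction", inside the reported 15-22% range
```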
Why it matters: long-running agents need memory that survives beyond one prompt without replaying every message. Cloudflare says Agent Memory is in private beta and keeps useful state available without filling the context window.
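The pattern it targets is worth making concrete. This is a hypothetical interface, not Cloudflare's actual API (which is in private beta): durable state lives outside the prompt, and only a compact, relevant slice is injected per turn.

```ts
// Hypothetical shape of the agent-memory pattern; names are illustrative.
interface AgentMemory {
  remember(fact: string, tags?: string[]): Promise<void>;
  recall(query: string, limit: number): Promise<string[]>;
}

async function buildPrompt(memory: AgentMemory, userMessage: string) {
  // Pull a handful of relevant facts instead of replaying the whole history.
  const facts = await memory.recall(userMessage, 5);
  return [
    { role: "system", content: `Known context:\n${facts.join("\n")}` },
    { role: "user", content: userMessage },
  ];
}
```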
HN focused on the plumbing question: does a 14-plus-provider inference layer actually make agent apps easier to operate? Cloudflare framed AI Gateway, Workers AI bindings, and a broader multimodal catalog as one platform, while commenters compared it with OpenRouter and pressed on pricing accuracy, catalog overlap, and deployment trust.
Cloudflare says Workers AI now serves Kimi K2.5 3x faster for agent workloads: p90 time per token dropped from roughly 100 ms to 20-30 ms, and peak input-token cache hit ratios rose from 60% to 80% for heavy internal users.
Cloudflare is turning AutoRAG into AI Search, a retrieval primitive agents can create and query from Workers. The open beta adds BM25 plus vector search, built-in storage and index, metadata boosting, and cross-instance search with concrete free and paid limits.
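What querying it from a Worker could look like, assuming the AutoRAG-era binding shape carries over to AI Search; the instance name and option names here are illustrative, not confirmed API:

```ts
export default {
  async fetch(request: Request, env: { AI: any }): Promise<Response> {
    // "docs-index" is a placeholder AI Search instance name.
    const answer = await env.AI.autorag("docs-index").search({
      query: "how do I rotate an API token?",
    });
    // Hybrid retrieval under the hood: BM25 keyword scores fused with
    // vector similarity, plus any metadata boosts configured on the index.
    return Response.json(answer);
  },
};
```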
HN treated Cloudflare Email Service less as agent magic and more as a new email sender entering a hostile protocol world. The thread focused on Workers integration, SES alternatives, spam pressure, MTA-STS, and sending limits.
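The Workers-integration angle maps onto the existing send_email binding. A sketch assuming the new Email Service keeps that shape; the binding name NOTIFY and the addresses are placeholders:

```ts
import { EmailMessage } from "cloudflare:email";
import { createMimeMessage } from "mimetext";

export default {
  async fetch(request: Request, env: { NOTIFY: any }): Promise<Response> {
    // Build a MIME message, then hand it to the email binding.
    const msg = createMimeMessage();
    msg.setSender({ name: "Alerts", addr: "alerts@example.com" });
    msg.setRecipient("oncall@example.com");
    msg.setSubject("Agent run finished");
    msg.addMessage({ contentType: "text/plain", data: "Done." });
    await env.NOTIFY.send(
      new EmailMessage("alerts@example.com", "oncall@example.com", msg.asRaw())
    );
    return new Response("sent");
  },
};
```

Deliverability (SPF, DKIM, MTA-STS, spam reputation) stays the hard part regardless of how clean the sending API is, which is where most of the HN skepticism landed.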
Cloudflare is trying to make model choice less sticky: AI Gateway now routes Workers AI calls to 70+ models across 12+ providers through one interface. For agent builders, the important part is not the catalog alone but spend controls, retry behavior, and failover in workflows that may chain ten inference calls for one task.
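AI Gateway's universal endpoint expresses failover as an ordered list of provider calls tried in sequence. A sketch of that shape, where the account/gateway IDs, model names, and env var names are placeholders:

```ts
// POST an ordered list; the gateway falls through on failure.
const url = "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/my-gateway";

const res = await fetch(url, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify([
    {
      provider: "workers-ai",
      endpoint: "@cf/meta/llama-3.1-8b-instruct",
      headers: { Authorization: `Bearer ${process.env.CF_API_TOKEN}` },
      query: { prompt: "Summarize this ticket..." },
    },
    {
      // Tried only if the first call errors: same task, different provider.
      provider: "openai",
      endpoint: "chat/completions",
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
      query: {
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: "Summarize this ticket..." }],
      },
    },
  ]),
});
console.log(await res.json());
```

In a ten-call agent chain, that per-request fallback is what keeps one flaky provider from failing the whole task.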
Cloudflare is packaging an enterprise playbook for MCP at the moment companies are wiring agents into internal systems. The headline number is a 99.9% token reduction from its Code Mode design, alongside new Shadow MCP detection for unauthorized remote servers.
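The Code Mode intuition, in a conceptual sketch (names are illustrative, not Cloudflare's API): rather than injecting every MCP tool schema into the prompt and round-tripping one tool call at a time, the model writes a small script against a typed API, and only the script plus its final result transit the context window.

```ts
// Illustrative typed surface generated from MCP tool definitions.
interface CrmApi {
  findCustomer(email: string): Promise<{ id: string; name: string }>;
  openTickets(customerId: string): Promise<{ id: string; title: string }[]>;
}

// The model emits something like this; the platform runs it in a sandbox
// and returns only the final value to the conversation.
async function agentScript(crm: CrmApi) {
  const customer = await crm.findCustomer("jo@example.com");
  const tickets = await crm.openTickets(customer.id);
  return { customer: customer.name, openTickets: tickets.length };
}
```

Intermediate results never touch the context window, which is where a token reduction on the order of 99.9% becomes plausible for schema-heavy, multi-step tool use.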
Cloudflare is moving agent infrastructure out of demo mode: Sandboxes and Containers are now generally available, with 7 recent upgrades aimed at persistent coding workflows. The stack now bundles PTY terminals, credential injection, stateful interpreters, background processes, file watching, snapshots, and higher limits.
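A hedged sketch of driving a persistent sandbox from a Worker, assuming the @cloudflare/sandbox SDK's getSandbox/exec shape; the exact method names and result fields are illustrative, not confirmed API:

```ts
import { getSandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request: Request, env: any): Promise<Response> {
    // Same ID -> same sandbox, so files and background processes persist
    // across requests instead of resetting per invocation.
    const sandbox = getSandbox(env.Sandbox, "build-bot");
    const result = await sandbox.exec("python3 -c 'print(2 + 2)'");
    return new Response(result.stdout);
  },
};
```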
MCP is moving from developer convenience to enterprise control problem. Cloudflare's new architecture matters because it tackles both parts of that shift at once: bloated tool schemas and the security mess created by ungoverned local servers.