DeepSeek V4 Pro Matches GPT-5.2 on Agentic Benchmark — 17x Cheaper, 10 Weeks Later

LLM · May 5, 2026 · By Insights AI (Reddit) · 1 min read

FoodTruck Bench

FoodTruck Bench is a 30-day agentic benchmark in which models run a food truck through 34 tools covering locations, pricing, inventory, staff, weather, and events, with persistent memory and daily reflection. It measures sustained agentic capability rather than single-turn performance.
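The structure described above (a multi-day loop of tool calls, a persistent memory, and an end-of-day reflection step) can be sketched in miniature. This is an illustrative toy, not the benchmark's actual harness: every name here (`Memory`, `run_benchmark`, the two stand-in tools) is invented, and the "environment" is a trivial stub rather than the real 34-tool simulator.

```python
# Toy sketch of a FoodTruck-Bench-style daily agent loop.
# All names and numbers are illustrative, not the benchmark's API.

from dataclasses import dataclass, field

@dataclass
class Memory:
    """Persistent memory carried across simulated days."""
    notes: list = field(default_factory=list)

    def reflect(self, day: int, outcome: dict) -> None:
        # Daily reflection: record the day's result for future decisions.
        self.notes.append(f"day {day}: profit={outcome['profit']:.2f}")

def pick_location(day: int) -> str:
    # Stand-in for an agent decision made via tool calls.
    return "downtown" if day % 2 == 0 else "stadium"

def simulate_sales(location: str) -> dict:
    # Stub environment; the real benchmark models weather, events, etc.
    return {"profit": 120.0 if location == "downtown" else 90.0}

def run_benchmark(days: int = 30) -> tuple[float, Memory]:
    memory = Memory()
    total = 0.0
    for day in range(1, days + 1):
        location = pick_location(day)       # agent acts
        outcome = simulate_sales(location)  # environment responds
        memory.reflect(day, outcome)        # memory persists across days
        total += outcome["profit"]
    return total, memory

total, memory = run_benchmark()
```

The point of the 30-day horizon is visible even in the stub: the score depends on a long chain of decisions and accumulated memory, not on any single turn.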

Results

DeepSeek V4 Pro landed 4th overall, behind Claude Opus 4.6, GPT-5.2, and Grok 4.3. It tied Grok 4.3 on final outcome and came within 3% of GPT-5.2's median score, making it the first Chinese model to reach the frontier tier on this benchmark.

The Cost Gap

GPT-5.2 was tested in mid-February; DeepSeek V4 Pro reached equivalent performance 10 weeks later at roughly 17x lower cost. This fits a recurring pattern: frontier performance gaps close within weeks to months, while price gaps stay large.
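To make the 17x figure concrete, here is a back-of-the-envelope comparison. The prices and token counts below are hypothetical placeholders (the article gives only the ratio), chosen solely to show how the gap compounds over a token-hungry agentic run.

```python
# Illustrative cost arithmetic; only the ~17x ratio comes from the article.
frontier_price = 10.00                    # $/M tokens (hypothetical)
challenger_price = frontier_price / 17    # ~17x cheaper

tokens_per_run = 5_000_000                # assumed size of one 30-day agentic run

frontier_cost = frontier_price * tokens_per_run / 1_000_000
challenger_cost = challenger_price * tokens_per_run / 1_000_000
print(frontier_cost, round(challenger_cost, 2))
```

Whatever the absolute prices, a fixed 17x ratio means the same agentic workload costs pennies on the dollar once the performance gap closes.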

Community Impact

Several LocalLLaMA users ran their own 10-day workflow audits and found that a significant fraction of daily tasks could be handled by local models (Qwen3.6 27B on a 3090) at near-zero cost. The benchmark result puts a number on the pricing pressure facing expensive frontier API calls in production workloads.
