DeepSeek V4 Pro Matches GPT-5.2 on Agentic Benchmark — 17x Cheaper, 10 Weeks Later
Original: DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17x cheaper
FoodTruck Bench
FoodTruck Bench is a 30-day agentic benchmark in which models run a food truck through 34 tools covering locations, pricing, inventory, staff, weather, and events, with persistent memory and daily reflection. It measures sustained, multi-step agentic capability rather than single-turn performance.
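To make the setup concrete, here is a minimal sketch of what a multi-day agent loop with tools, persistent memory, and a daily reflection step might look like. It is not FoodTruck Bench's actual harness: `AgentState`, `run_simulation`, the `policy` callable, and the single `sell` tool are all hypothetical names chosen for illustration, and the real benchmark exposes 34 tools rather than one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a tool takes one numeric argument and returns a cash delta.
Tool = Callable[[float], float]
# Hypothetical policy: given the day and the memory so far, return tool calls.
Policy = Callable[[int, List[str]], List[Tuple[str, float]]]


@dataclass
class AgentState:
    cash: float
    # Memory persists across days, so later decisions can see earlier notes.
    memory: List[str] = field(default_factory=list)


def run_simulation(policy: Policy, tools: Dict[str, Tool],
                   days: int = 30, start_cash: float = 1000.0) -> AgentState:
    """Run the agent for `days` days, applying its tool calls each day."""
    state = AgentState(cash=start_cash)
    for day in range(1, days + 1):
        # The policy (a stand-in for the model) chooses tool calls for today.
        for tool_name, argument in policy(day, state.memory):
            state.cash += tools[tool_name](argument)
        # Daily reflection: append a note that future days can read.
        state.memory.append(f"day {day}: cash={state.cash:.2f}")
    return state


# Toy usage: one tool that earns 2.0 per unit sold, and a fixed policy.
toy_tools = {"sell": lambda units: units * 2.0}
toy_policy = lambda day, memory: [("sell", 10.0)]
final_state = run_simulation(toy_policy, toy_tools, days=3, start_cash=100.0)
```

The point of the sketch is the shape of the loop: the score depends on decisions compounding over many days, which is what separates this kind of benchmark from a single-turn evaluation.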
Results
DeepSeek V4 Pro landed 4th overall, behind Claude Opus 4.6, GPT-5.2, and Grok 4.3. It tied Grok 4.3 on outcome and came within 3% of GPT-5.2's median score. It is the first Chinese model to reach the frontier tier on this benchmark.
The Cost Gap
GPT-5.2 was tested in mid-February. DeepSeek V4 Pro reached comparable performance 10 weeks later at roughly 17x lower cost. This is consistent with a recurring pattern: frontier performance gaps close within weeks to months, while price differences remain large.
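The two headline figures, a score within 3% and a cost roughly 17x lower, are simple ratios. The sketch below shows how such figures could be computed from per-run numbers. The helper names `relative_gap` and `cost_ratio` are hypothetical, and the inputs are placeholders chosen only to reproduce the ratios quoted above; they are not the benchmark's actual scores or dollar costs.

```python
def relative_gap(score: float, reference_score: float) -> float:
    """Fraction by which `score` trails `reference_score` (0.03 means 3% lower)."""
    return (reference_score - score) / reference_score


def cost_ratio(reference_cost: float, cost: float) -> float:
    """How many times more expensive the reference run is."""
    return reference_cost / cost


# Placeholder inputs (NOT real benchmark data): normalised so that the
# reference model scores 100 and the cheaper model's run costs 1 unit.
gap = relative_gap(score=97.0, reference_score=100.0)
ratio = cost_ratio(reference_cost=17.0, cost=1.0)
print(f"score gap: {gap:.0%}, cost ratio: {ratio:.0f}x")
```

Reporting the gap and the ratio side by side keeps the comparison honest: a small score gap only matters for pricing if the cost ratio is measured on the same runs.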
Community Impact
Several LocalLLaMA users ran their own 10-day workflow audits and found that a significant fraction of their daily tasks could be handled by local models (Qwen3.6 27B on a 3090) at near-zero cost. The benchmark result puts a number on the pricing pressure facing expensive frontier API calls in production workloads.