DeepSeek V4 Pro Matches GPT-5.2 on Agentic Benchmark — 17x Cheaper, 10 Weeks Later
Original: DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark, 10 weeks later, ~17x cheaper
FoodTruck Bench
FoodTruck Bench is a 30-day agentic benchmark where models run a food truck via 34 tools covering locations, pricing, inventory, staff, weather, and events — with persistent memory and daily reflection. It measures real agentic capability rather than single-turn performance.
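The loop described above (a day-by-day simulation, tool calls, memory that persists, and a reflection at the end of each day) can be pictured with a toy sketch. This is not the benchmark's actual harness: the two tools, the demand curve, the starting cash, and the scripted policy standing in for the model are all hypothetical placeholders chosen only to make the structure visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TruckState:
    day: int = 0
    cash: float = 500.0                               # placeholder starting cash
    price: float = 8.0                                # placeholder menu price
    memory: List[str] = field(default_factory=list)   # persists across days


# Hypothetical tools; the real benchmark exposes 34 of them.
def set_price(state: TruckState, price: float) -> str:
    state.price = price
    return f"price set to {price:.2f}"


def open_for_business(state: TruckState) -> str:
    customers = max(0, 60 - int(4 * state.price))     # toy demand curve
    revenue = customers * state.price
    state.cash += revenue
    return f"served {customers} customers, revenue {revenue:.2f}"


TOOLS: Dict[str, Callable[..., str]] = {
    "set_price": set_price,
    "open_for_business": open_for_business,
}


def scripted_policy(state: TruckState) -> List[Tuple[str, tuple]]:
    """Stand-in for the model: returns the (tool_name, args) calls for one day."""
    price = 7.0 if state.memory else 8.0              # changes once a reflection exists
    return [("set_price", (price,)), ("open_for_business", ())]


def run_simulation(days: int = 30) -> TruckState:
    state = TruckState()
    for day in range(1, days + 1):
        state.day = day
        results = [TOOLS[name](state, *args) for name, args in scripted_policy(state)]
        # Daily reflection: a note written to memory that later days can read.
        state.memory.append(f"day {day}: {results[-1]}")
    return state
```

In a real run the scripted policy would be replaced by a model call that receives the tool descriptions and the accumulated memory, which is what makes the task multi-turn rather than single-turn.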
Results
DeepSeek V4 Pro landed 4th overall, behind Claude Opus 4.6, GPT-5.2, and Grok 4.3. It tied Grok 4.3 on outcome and came within 3% of GPT-5.2's median score. It is the first Chinese model to reach the frontier tier on this benchmark.
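For readers who want to reproduce a "within 3% of the median" comparison on their own runs, one plausible way to compute it is sketched below. The function name is hypothetical and the exact scoring method used by the benchmark is not stated in the article, so treat this as an assumption about how such a gap might be measured.

```python
from statistics import median
from typing import Sequence


def relative_median_gap(candidate_scores: Sequence[float],
                        reference_scores: Sequence[float]) -> float:
    """Fraction by which the candidate's median trails the reference median.

    A positive value means the candidate is behind; 0.03 would be a 3% gap.
    """
    reference = median(reference_scores)
    if reference == 0:
        raise ValueError("reference median must be non-zero")
    return (reference - median(candidate_scores)) / reference
```

Comparing medians rather than means keeps a single unusually good or bad run from dominating the result.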
The Cost Gap
GPT-5.2 was tested in mid-February. DeepSeek V4 Pro came within 3% of its median score 10 weeks later, at roughly 17x lower cost. This fits a recurring pattern: frontier performance gaps close within weeks to months, while price differences remain large.
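A cost multiple like this is normally derived from token usage and per-million-token prices. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, and the article does not give the token counts or the full price tables behind the 17x figure, so any inputs a reader supplies are their own.

```python
def run_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one benchmark run given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


def cost_multiple(expensive_run: float, cheap_run: float) -> float:
    """How many times more the expensive run costs than the cheap one."""
    if cheap_run <= 0:
        raise ValueError("cheap_run must be positive")
    return expensive_run / cheap_run
```

Because agentic runs tend to be dominated by input tokens (tool outputs and memory fed back on every turn), the input price usually has the larger effect on the final multiple.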
Community Impact
Several LocalLLaMA users ran their own 10-day workflow audits, finding that a significant fraction of daily tasks could be handled by local models (Qwen3.6 27B on a 3090) at near-zero cost. The benchmark result quantifies the value pressure on expensive frontier API calls for production workloads.
Related Articles
DeepSeek released DeepSeek-V4-Pro (1.6T total parameters, 49B active) and V4-Flash (284B total, 13B active), both Mixture-of-Experts models with MIT license and 1M token context. V4-Pro is the largest open-weights model released so far, and its pricing at $1.74/M input undercuts GPT-5.4 and Claude Sonnet 4.6 by more than half.
The latest ARC-AGI-3 scores show GPT-5.5 High at 0.43% and Claude Opus 4.7 at 0.18% — the most powerful models today remain effectively at zero on this AGI benchmark.