ERNIE 5.1 hits #13 globally while cutting pretraining cost to 6%
Original: Introducing ERNIE 5.1 Preview — now live! 🚀 Ranked #13 globally and #1 among Chinese labs on @arena's Text Arena. Top-10 worldwide across:…
What the leaderboard tweet actually says
Benchmark brag posts are easy to ignore until they pair rank with cost compression. ERNIE 5.1 Preview did both. In its April 29 X post, Baidu's developer-facing ERNIE account said the model is now No. 13 globally on LMArena Text and No. 1 among Chinese labs, while cutting total parameters to about one-third of ERNIE 5.0, active parameters to about one-half, and pretraining cost to roughly 6% of comparable models.
"Ranked #13 globally and #1 among Chinese labs on Text Arena."
The linked ERNIE blog adds the category-level detail: #9 in Math, #1 in Legal & Government, #4 in Business, Management & Financial Ops, and #7 in Software & IT Services. Baidu also attributes the result to decoupled fully-asynchronous reinforcement learning and scaled agentic post-training. Even if one treats vendor-written leaderboard posts cautiously, the combination of rank and compressed training cost is the signal worth tracking.
Why this matters beyond one Arena update
The Chinese model race is no longer only about absolute size or domestic ranking. Cost-efficient training and strong category performance matter more if labs want to refresh previews quickly and still hold their place against larger rivals. A model that reaches upper-tier Arena placement with a much smaller effective training bill changes how often a lab can iterate and how aggressively it can price API access later.
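To make the iteration argument concrete, here is a minimal arithmetic sketch using only the ratios quoted in Baidu's post. The baseline budget of 1 unit is a hypothetical placeholder, not a disclosed figure, and the helper name `runs_per_budget` is introduced purely for illustration. Baidu has not published absolute costs, so this shows relative scale only.

```python
from fractions import Fraction

# Ratios quoted in the post (all approximate, per Baidu's own wording).
COST_RATIO = Fraction(6, 100)         # pretraining cost vs. "comparable models"
TOTAL_PARAM_RATIO = Fraction(1, 3)    # total parameters vs. ERNIE 5.0
ACTIVE_PARAM_RATIO = Fraction(1, 2)   # active parameters vs. ERNIE 5.0

# Hypothetical assumption: one "comparable" pretraining run costs 1 budget unit.
BASELINE_BUDGET = Fraction(1)


def runs_per_budget(budget: Fraction, cost_per_run: Fraction) -> int:
    """Whole pretraining runs that fit inside a fixed budget."""
    return budget // cost_per_run


# How many runs at the reduced cost fit in the budget of one comparable run.
runs = runs_per_budget(BASELINE_BUDGET, COST_RATIO)

# Total parameters shrank more than active ones, so the share of parameters
# active per token grew relative to ERNIE 5.0 by this factor.
active_share_multiplier = ACTIVE_PARAM_RATIO / TOTAL_PARAM_RATIO

print(f"Runs per comparable-run budget: {runs}")
print(f"Active-share multiplier vs. ERNIE 5.0: {float(active_share_multiplier):.2f}x")
```

Under these assumptions, the budget for one comparable pretraining run would cover 16 full runs at the quoted 6% cost. That is the mechanism behind the faster preview refreshes described above, provided the vendor-reported ratio holds up.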
The ErnieforDevs account usually posts release and evaluation milestones for Baidu's developer stack, so this tweet fits a pattern: ship a preview, validate it in a public ranking, then point developers toward direct testing. What to watch next is whether ERNIE 5.1 Preview shows up in broader third-party benchmarks and products beyond Arena, and whether Baidu discloses enough API or deployment detail to prove the cost-performance story in real workloads.
Source: ERNIE source tweet · ERNIE blog post