Mistral OCR 4 adds boxes, block types, confidence for 170 languages
Document extraction moves beyond text
The important shift in document AI is no longer only whether a model can read a page. The harder question is whether it can return output that search, compliance, and automation systems can trust. In a June 23, 2026 post on X, Mistral AI wrote: “Introducing Mistral OCR 4. It creates structure with bounding boxes, block classification, and inline confidence scores in 170 languages.” That short post positions OCR 4 as a document-understanding model, not only a text-recognition upgrade.
Mistral’s linked product post gives the technical shape. OCR 4 handles common enterprise formats including PDF, DOC, PPT, and OpenDocument. Instead of returning only extracted text, it localizes each block with a bounding box, classifies block types such as titles, tables, equations, and signatures, and adds confidence scores at page and word level. That matters for retrieval-augmented generation because chunks can be built from clean structural units, and it matters for regulated workflows because low-confidence spans can be routed to human reviewers.
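A minimal sketch of how a downstream pipeline might consume that kind of output is shown here. It is not Mistral's response schema or client API: the `Word` and `Block` classes, the `low_confidence_words` and `chunk_by_title` helpers, and the 0.80 review threshold are all hypothetical, chosen only to illustrate chunking on structural units and routing uncertain words to a reviewer.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical word-level result: recognized text plus a confidence in [0, 1]."""
    text: str
    confidence: float


@dataclass
class Block:
    """Hypothetical layout block: a type label, a bounding box, and its words."""
    block_type: str                      # e.g. "title", "table", "equation", "signature", "text"
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates (assumed)
    words: list[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)


def low_confidence_words(blocks: list[Block], threshold: float = 0.80) -> list[tuple[int, Word]]:
    """Return (block index, word) pairs whose confidence falls below the threshold.

    The threshold is an assumption; a real workflow would tune it per document type.
    """
    return [
        (i, w)
        for i, block in enumerate(blocks)
        for w in block.words
        if w.confidence < threshold
    ]


def chunk_by_title(blocks: list[Block]) -> list[str]:
    """Group blocks into chunks, starting a new chunk at each title block."""
    chunks: list[list[Block]] = []
    current: list[Block] = []
    for block in blocks:
        if block.block_type == "title" and current:
            chunks.append(current)
            current = []
        current.append(block)
    if current:
        chunks.append(current)
    return ["\n".join(b.text for b in chunk) for chunk in chunks]
```

With blocks shaped like this, a retrieval pipeline could index the strings from `chunk_by_title` while a review queue receives the pairs from `low_confidence_words`, using each block's `bbox` to highlight the uncertain region on the page.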
The benchmark claims are concrete. Mistral reports a score of 85.20 on OlmOCRBench and says independent annotators preferred OCR 4 over the other tested systems, with an average 72% win rate across more than 600 documents and more than 12 languages. The company also says its internal multilingual evaluation covers eight language groups, with the widest gains in specialized and low-resource languages. Pricing is listed at $4 per 1,000 pages through the API, or $2 per 1,000 pages with the 50% Batch API discount.
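The listed prices translate into a simple back-of-the-envelope estimate. The sketch below assumes strictly linear per-page billing with no minimums, tiers, or rounding to whole thousands, none of which the post confirms; `estimate_cost` is a hypothetical helper, not part of any Mistral SDK.

```python
from decimal import Decimal

# Figures quoted in the article; billing granularity is an assumption.
PRICE_PER_1000_PAGES = Decimal("4.00")   # USD, standard API
BATCH_DISCOUNT = Decimal("0.50")         # 50% off through the Batch API


def estimate_cost(pages: int, batch: bool = False) -> Decimal:
    """Estimate the OCR cost in USD for a page count, assuming linear pricing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = PRICE_PER_1000_PAGES
    if batch:
        rate = rate * (1 - BATCH_DISCOUNT)
    # Round to whole cents for display.
    return (rate * pages / 1000).quantize(Decimal("0.01"))
```

Under those assumptions, a 250,000-page archive would come to about $1,000 through the standard API and about $500 through the Batch API.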
The post came from Mistral’s official X account, which the company typically uses for model, product, and research releases. OCR 4 also connects to Mistral’s open-source Search Toolkit, where structured document output can feed ingestion, retrieval, and evaluation pipelines for enterprise search. The next thing to watch is whether the claimed benchmark lead survives customer-side tests on messy internal documents, and whether single-container self-hosting expands adoption among teams with data residency or sovereignty constraints.