NuExtract3 targets local document extraction with a 4B VLM

NuMind released NuExtract3, a 4B vision-language model built for document understanding. Its main jobs are structured information extraction and document-to-Markdown conversion. The model is based on Qwen3.5-4B, published under Apache-2.0, and aimed at workflows involving scans, receipts, invoices, forms, tables, contracts, and other layout-heavy documents.

The Reddit thread drew attention because the deployment story is unusually practical for local users. According to the post's author, NuMind provides Safetensors, GGUF, and MLX weights in multiple quantizations, and positions the model as usable with as little as 4 GB of VRAM. The team has tested mainly with vLLM, SGLang, and llama.cpp. That matters for teams that want document extraction without routing sensitive files through hosted OCR or multimodal APIs.
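A back-of-the-envelope calculation helps put the 4 GB VRAM figure in context. The sketch below estimates weight storage only, assuming a round 4 billion parameters and nominal bit widths. Both are assumptions for illustration, not numbers from NuMind. Real quantized files carry some overhead, and the vision encoder, KV cache, and activations need memory on top of this.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: a round 4e9 parameters stands in for "4B".
ASSUMED_PARAMS = 4e9

for bits in (16, 8, 4):
    gib = weight_memory_gib(ASSUMED_PARAMS, bits)
    print(f"{bits:>2}-bit weights: about {gib:.1f} GiB")
```

Under these assumptions, 16-bit weights take about 7.5 GiB and 4-bit weights about 1.9 GiB, which is consistent with a quantized build fitting on a 4 GB card with some room left for context.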

The model card describes two major modes. For structured extraction, users provide text or images plus a JSON-like template, and the model returns values in that structure. For Markdown conversion, it turns document images into Markdown, including HTML tables, LaTeX for math, and figure tags for images. NuMind also reports internal benchmark results across roughly 600 diverse documents, where NuExtract3.4_4B-RL scored 0.651 on its structured extraction metric. The company says it plans to open-source the benchmark and publish more technical details later.
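To make the template idea concrete, here is a minimal sketch of checking a model reply against a template. The template layout, the field names, and the `conforms` helper are all hypothetical and chosen for illustration. The actual template syntax is defined by the NuExtract3 model card, and the reply string stands in for whatever the model returns.

```python
import json


def conforms(template, output) -> bool:
    """Return True if `output` has the same nested shape as `template`."""
    if isinstance(template, dict):
        # Same keys on both sides, and every value matches recursively.
        return (
            isinstance(output, dict)
            and set(output) == set(template)
            and all(conforms(template[k], output[k]) for k in template)
        )
    if isinstance(template, list):
        if not isinstance(output, list):
            return False
        if not template:
            return True
        # The first template element describes every item in the list.
        return all(conforms(template[0], item) for item in output)
    # Leaf: accept any scalar or null, reject nested containers.
    return not isinstance(output, (dict, list))


# Hypothetical invoice template; empty strings mark fields to fill.
template = {
    "vendor": "",
    "total": "",
    "line_items": [{"description": "", "amount": ""}],
}

# Stand-in for a model reply, not real model output.
reply = json.loads(
    '{"vendor": "Acme", "total": "12.50", '
    '"line_items": [{"description": "Widget", "amount": "12.50"}]}'
)

print(conforms(template, reply))
```

A check like this catches missing or extra keys before the extracted values reach downstream code. It does not verify that the values themselves are correct.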

Community discussion quickly moved to edge cases: multi-column pages, dense tables, newspapers, old books, handwriting, Chinese subtitles, and vLLM loading issues. One commenter noted that shipping GGUF and MLX weights on day one changes the adoption curve because users do not have to wait for community conversions. Another described replacing paid cloud extraction in workflows where cost accumulates quickly.

Source thread: r/LocalLLaMA. Model details: Hugging Face NuExtract3 model card.
