Why it matters: enterprise OCR failures break agents long before they show up on academic PDF benchmarks. LlamaIndex says ParseBench evaluates about 2,000 human-verified pages with over 167,000 rule-based checks across 14 parsing methods, with the benchmark hosted on Kaggle.
#llamaindex
Why it matters: document agents fail when PDF parsing destroys table and column structure. LiteParse uses a monospace grid projection approach instead of heavy layout models, and the code is open source.
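LiteParse's internals aren't detailed here, so the following is only a minimal sketch of the general monospace-grid idea, assuming per-word page coordinates from a PDF text extractor; the cell-size constants and function name are illustrative, not values from the project.

```python
# Minimal sketch of a monospace grid projection (not LiteParse's actual code).
# Assumes words extracted from a PDF as (x, y, text) in page coordinates;
# CHAR_W and LINE_H are assumed constants, not values from the project.

CHAR_W = 6.0   # assumed average character width in points
LINE_H = 12.0  # assumed line height in points

def project_to_grid(words):
    """Place each word onto a fixed-pitch character grid so that
    visually aligned columns stay aligned in the plain-text output."""
    cells = {}  # (row, col) -> char
    for x, y, text in words:
        row = round(y / LINE_H)
        col = round(x / CHAR_W)
        for i, ch in enumerate(text):
            cells[(row, col + i)] = ch

    if not cells:
        return ""
    max_row = max(r for r, _ in cells)
    max_col = max(c for _, c in cells)
    lines = []
    for r in range(max_row + 1):
        lines.append("".join(cells.get((r, c), " ") for c in range(max_col + 1)))
    return "\n".join(lines)

# Example: a two-column table row keeps its column boundary in plain text.
sample = [(0, 0, "Item"), (120, 0, "Price"), (0, 12, "Widget"), (120, 12, "4.99")]
print(project_to_grid(sample))
```

The appeal of the approach is that column structure survives without a layout model: alignment is recovered directly from coordinates rather than inferred.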
Why it matters: document agents fail when parsers drop tables, chart values, or visual grounding. ParseBench spans about 2,000 enterprise document pages and 167K+ rule-based tests, with 14 parsing methods evaluated.
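The posts don't spell out what a "rule-based test" looks like, so purely as an illustration, such a check might assert that a known table cell or chart value survives parsing; the rule schema and field names below are hypothetical, not ParseBench's format.

```python
# Hypothetical illustration of a rule-based parsing check; the rule schema
# and field names are invented for this example, not ParseBench's format.

def run_rule(parsed_text: str, rule: dict) -> bool:
    """Return True if the parsed output satisfies one rule."""
    if rule["type"] == "contains":   # value must survive parsing verbatim
        return rule["value"] in parsed_text
    if rule["type"] == "ordered":    # header must appear before the cell value
        first, second = rule["values"]
        i, j = parsed_text.find(first), parsed_text.find(second)
        return 0 <= i < j
    raise ValueError(f"unknown rule type: {rule['type']}")

rules = [
    {"type": "contains", "value": "Q3 revenue"},
    {"type": "ordered", "values": ["Region", "EMEA"]},
]
parsed = "Region | Q3 revenue\nEMEA   | 12.4M"
score = sum(run_rule(parsed, r) for r in rules) / len(rules)
print(f"pass rate: {score:.0%}")
```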
A detailed engineering write-up resonated on Hacker News because it treated production RAG as a data and operations problem, not a prompt demo.
A LocalLLaMA thread and linked GitHub issues argue that LlamaIndex's OpenAI-by-default behavior can surprise local-first RAG builders when nested components are created without explicit model injection. Maintainers say the behavior is longstanding and documented, but the discussion is pushing for a stricter fail-fast mode for sovereign deployments.
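For local-first builders, the documented workaround is to inject models explicitly rather than rely on defaults. A minimal sketch using LlamaIndex's Settings API follows; it assumes the llama-index-llms-ollama and llama-index-embeddings-huggingface integration packages are installed, and the model names and "docs" directory are illustrative.

```python
# One way to avoid the OpenAI fallback: inject local models explicitly.
# Assumes the llama-index-llms-ollama and llama-index-embeddings-huggingface
# packages are installed; model names and paths are illustrative.
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Global defaults: nested components (retrievers, synthesizers) inherit these
# instead of silently constructing an OpenAI LLM or embedding model.
Settings.llm = Ollama(model="llama3", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Passing the model per component also works and keeps the dependency explicit.
query_engine = index.as_query_engine(llm=Settings.llm)
print(query_engine.query("Summarize the onboarding policy."))
```

The fail-fast mode discussed in the thread would go further: instead of falling back to OpenAI when no model is injected, component construction would raise immediately.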