#llamaindex

AI X/Twitter Apr 23, 2026 1 min read

ParseBench brings about 2,000 enterprise document pages and 167,000 OCR rules to Kaggle as a validation bench for agents

The key point is that enterprise OCR failures break agents far sooner than academic PDF benchmarks suggest. LlamaIndex wrote that ParseBench compares 14 methods on Kaggle using about 2,000 human-verified pages and more than 167,000 rules.

#llamaindex #parsebench #ocr

LLM X/Twitter Apr 22, 2026 1 min read

LlamaIndex LiteParse, a parser that preserves PDF table structure with grid projection

The key point is that when a document agent loses table and column structure at the PDF parsing stage, its reasoning breaks down with it. LiteParse uses monospace grid projection instead of a heavy layout model, and its code has been released as open source.
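To make the idea of monospace grid projection concrete, here is a minimal sketch, not LiteParse's actual implementation. It assumes each word arrives with a left-edge `x` and a vertical `y` position in PDF points, and it snaps those coordinates onto a fixed-pitch character grid so that table columns stay aligned in plain text. The `Word` class, `project_to_grid`, and the `char_width` and `line_height` defaults are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word, in PDF points (assumed input)
    y: float  # distance from the top of the page, in PDF points (assumed input)


def project_to_grid(words, char_width=6.0, line_height=12.0):
    """Place each word on a fixed-pitch character grid so columns stay aligned."""
    rows = {}
    for w in words:
        row = round(w.y / line_height)   # snap vertical position to a text row
        col = round(w.x / char_width)    # snap horizontal position to a column
        rows.setdefault(row, []).append((col, w.text))

    lines = []
    for row in sorted(rows):
        line = ""
        for col, text in sorted(rows[row]):
            # Start at the projected column, but keep at least one space
            # after the previous word if the two would otherwise collide.
            start = max(col, len(line) + 1) if line else col
            line = line.ljust(start) + text
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    table = [
        Word("Item", 0, 12), Word("Qty", 120, 12),
        Word("Bolt", 0, 24), Word("40", 120, 24),
    ]
    print(project_to_grid(table))
```

Because every cell lands at a column index derived from its page position, values from the same table column start at the same character offset, which is the property a downstream agent needs in order to read rows and columns from plain text.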

#llamaindex #liteparse #pdf-parsing

AI X/Twitter Apr 19, 2026 1 min read

ParseBench validates OCR agents for real enterprise documents with a 167,000-rule benchmark

The key point is that when a document agent loses tables, chart values, or visual grounding, real business decisions become unreliable. ParseBench presents about 2,000 pages of enterprise documents, more than 167,000 rule-based tests, and an evaluation of 14 methods.

#llamaindex #parsebench #ocr

LLM Hacker News Mar 27, 2026 1 min read

Hacker News revisits the reality of production RAG: how to handle 451GB with a local model

Andros Fenollosa's retrospective drew a response on Hacker News because it treated production RAG as a data and operations problem rather than a prompt demo.

#rag #llamaindex #chromadb

LLM Reddit Mar 9, 2026 2 min read

LocalLLaMA flags LlamaIndex's OpenAI default as a risk in air-gapped RAG

A GitHub issue linked from the LocalLLaMA discussion argues that LlamaIndex's OpenAI-by-default behavior can confuse local-first RAG developers when a model is not explicitly injected into a nested component. The maintainers say this behavior has long been documented, but the community is asking for a stricter fail-fast mode for sovereign deployments.
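The fail-fast behavior the community is asking for can be sketched without any library at all. The example below is a hypothetical illustration and not a LlamaIndex API: `Component`, `find_unconfigured`, `assert_fully_local`, and `MissingModelError` are invented names. It walks a tree of nested components and raises an error if any of them has no explicitly injected model, rather than silently falling back to a default.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class MissingModelError(RuntimeError):
    """Raised when a component would otherwise fall back to a default model."""


@dataclass
class Component:
    name: str
    llm: Optional[str] = None  # None means "no model was injected explicitly"
    children: List["Component"] = field(default_factory=list)


def find_unconfigured(component, path=""):
    """Return the paths of all components that have no explicit model."""
    here = f"{path}/{component.name}"
    missing = [here] if component.llm is None else []
    for child in component.children:
        missing.extend(find_unconfigured(child, here))
    return missing


def assert_fully_local(root):
    """Fail fast at startup instead of falling back to a remote default later."""
    missing = find_unconfigured(root)
    if missing:
        raise MissingModelError("no explicit model for: " + ", ".join(missing))


if __name__ == "__main__":
    pipeline = Component(
        "query_engine",
        llm="local-llama",
        children=[
            Component(
                "retriever",
                llm="local-llama",
                children=[Component("reranker")],  # model forgotten here
            )
        ],
    )
    try:
        assert_fully_local(pipeline)
    except MissingModelError as err:
        print(f"startup aborted: {err}")
```

Running a check like this once at startup turns a silent fallback in a deeply nested component into an immediate, named error, which is the property an air-gapped deployment needs.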

#llamaindex #local-rag #privacy