PDF Parsing for AI Agents: liteparse vs GLM-OCR vs LlamaParse
PDF parsing sounds like a solved problem. It isn't, and the gap between "good enough for simple PDFs" and "reliable for production agent pipelines" is where most builders learn that the hard way.
Three tools cover the practical spectrum for AI agent use cases in 2026. Here's when to reach for each.
The Three Tools
liteparse: Local, Fast, Zero Setup
Repo: github.com/run-llama/liteparse
From: LlamaIndex team (run-llama)
Model: Tesseract.js (CPU-only, classical OCR)
npm i -g @llamaindex/liteparse
lit parse document.pdf
That's it. No API key. No GPU. No cloud. Works on Linux, macOS (Intel/ARM), and Windows.
What it does well:
- Native-text PDFs (generated by software: Word, LaTeX, most web PDFs): near-perfect extraction
- Bounding boxes on every text element, so spatial layout is preserved
- Buffer input: zero disk I/O, pipe PDFs in from memory
- Batch processing: lit batch-parse ./input ./output
- Screenshot mode: renders pages as images for downstream VLM processing
- Pluggable OCR: swap Tesseract for EasyOCR, PaddleOCR, or any custom HTTP server
Where it struggles:
- Scanned PDFs with no text layer (Tesseract accuracy drops sharply on complex layouts)
- Dense tables, multi-column academic papers, handwritten annotations
- Non-English documents (Tesseract needs language packs configured)
Node.js API:
import { LiteParse } from '@llamaindex/liteparse';
const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse('document.pdf');
console.log(result.text); // with bounding boxes in result.pages
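Those bounding boxes are what make liteparse useful beyond plain extraction: you can recover reading order yourself when a page has unusual layout. A minimal sketch of that idea follows; note the element shape ({ text, bbox: { x, y } }) is an assumption for illustration, so check the actual structure of result.pages in your liteparse version.

```javascript
// Sketch: sort text elements into reading order using bounding boxes.
// Elements are grouped into visual lines by y-coordinate, then each
// line is sorted left-to-right by x-coordinate.
function readingOrder(elements, lineTolerance = 5) {
  const sorted = [...elements].sort((a, b) => a.bbox.y - b.bbox.y);
  const lines = [];
  for (const el of sorted) {
    // An element joins an existing line if its y is within tolerance.
    const line = lines.find(
      (l) => Math.abs(l[0].bbox.y - el.bbox.y) <= lineTolerance
    );
    if (line) line.push(el);
    else lines.push([el]);
  }
  return lines
    .map((line) =>
      line.sort((a, b) => a.bbox.x - b.bbox.x).map((el) => el.text).join(" ")
    )
    .join("\n");
}

// Two words on one visual line, one word below:
const demo = [
  { text: "world", bbox: { x: 60, y: 10 } },
  { text: "hello", bbox: { x: 10, y: 12 } },
  { text: "below", bbox: { x: 10, y: 40 } },
];
console.log(readingOrder(demo)); // "hello world\nbelow"
```

The tolerance parameter absorbs small y-jitter between elements on the same visual line; real documents need it tuned per source.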
GLM-OCR 0.9B: VLM-Quality, Still Local
Repo: github.com/THUDM/GLM-OCR
From: Tsinghua KEG Lab
Model: 0.9B vision-language model
GLM-OCR is a different category of tool: a small vision-language model purpose-built for document understanding. In benchmarks published in March 2026, it outperformed Gemini on document parsing tasks and matched models many times its size.
What it does well:
- Dense tables: understands cell relationships, not just text extraction
- Multi-column layouts: tracks reading order semantically
- Handwritten annotations: handles mixed printed/handwritten content
- Visual document elements: charts, figures, form fields
- Scanned PDFs: robust to image quality variation
- Mathematical notation (arXiv papers): reasonable accuracy
Where it struggles:
- Requires a GPU for fast inference (CPU is slow at 0.9B params)
- More setup than liteparse (Python, model download ~1.8GB)
- Overkill for clean native-text PDFs
When to use it: when layout fidelity and accuracy matter more than speed. Think financial statements, academic papers, legal contracts, and forms with structured data.
LlamaParse: Production Cloud Parsing
URL: cloud.llamaindex.ai
From: LlamaIndex (same team as liteparse)
Model: Proprietary cloud pipeline
LlamaParse is what you reach for when the document is complex, accuracy is non-negotiable, and you're willing to trade privacy and cost for reliability.
What it does well:
- Complex tables across pages
- Charts and figures with context
- Mixed document types (scanned + native text)
- Structured markdown output, ready for LLM consumption
- Handles edge cases that break local tools
- Per-page SLA guarantees in production
Where it falls short:
- Documents leave your machine (not for sensitive data without DPA)
- Per-page pricing at scale
- Requires API key and internet access
Decision Framework
Is the PDF native-text (generated by software)?
├── YES → liteparse (fast, free, local)
└── NO (scanned / complex layout) →
    Is data sensitivity a concern?
    ├── YES → GLM-OCR (local VLM, no cloud)
    └── NO → LlamaParse (most accurate, handles edge cases)

Are you processing at scale (1000+ docs/day)?
├── liteparse for native-text (parallelizable, zero cost)
├── LlamaParse for complex (API rate limits apply)
└── Self-hosted GLM-OCR for sensitive + complex
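The framework above collapses into a small routing function. This is a sketch, not any tool's API: the input flags (hasTextLayer, complexLayout, sensitive) are assumptions, and in practice you would detect hasTextLayer by attempting a cheap extraction first rather than asking the caller.

```javascript
// Sketch of the decision framework as a routing function.
// All parameter names here are illustrative, not from any library.
function pickParser({ hasTextLayer, complexLayout, sensitive }) {
  // Native-text, simple layout: the free local path wins.
  if (hasTextLayer && !complexLayout) return "liteparse";
  // Scanned or complex: sensitivity decides local VLM vs cloud.
  if (sensitive) return "glm-ocr";
  return "llamaparse";
}

console.log(pickParser({ hasTextLayer: true, complexLayout: false, sensitive: false })); // "liteparse"
console.log(pickParser({ hasTextLayer: false, complexLayout: true, sensitive: true }));  // "glm-ocr"
```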
Tiered Agent Pipeline
The pattern used in production agent systems combines all three:
PDF arrives
↓
liteparse: attempt extraction
↓
Confidence check (text length, layout flags)
↓
Sufficient? → use liteparse output
Not sufficient? → GLM-OCR (if on-prem required)
                → LlamaParse (if cloud allowed)
↓
Structured output → LLM context window
This keeps costs near-zero for the majority of documents (most PDFs have native text layers) while preserving accuracy for the minority that need it.
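The escalation logic can be sketched in a few lines. The three parser functions are injected placeholders (the real calls are async and tool-specific; they are kept synchronous here for brevity), and the confidence heuristic, average characters per page, is an assumption you would tune against your own corpus.

```javascript
// Sketch of the tiered pipeline: cheap local attempt, then escalate.
// deps = { liteparse, glmOcr, llamaParse } are placeholder functions
// you would wire to the real tools.
function tieredParse(pdf, deps, { onPremOnly = false, minCharsPerPage = 200 } = {}) {
  const first = deps.liteparse(pdf); // cheap local attempt
  const avgChars = first.text.length / Math.max(first.pageCount, 1);
  if (avgChars >= minCharsPerPage) {
    return { tool: "liteparse", text: first.text }; // good enough, stop here
  }
  // Escalate: the on-prem constraint decides which heavy parser runs.
  return onPremOnly
    ? { tool: "glm-ocr", text: deps.glmOcr(pdf) }
    : { tool: "llamaparse", text: deps.llamaParse(pdf) };
}

// Stub dependencies for illustration:
const deps = {
  liteparse: () => ({ text: "x".repeat(1000), pageCount: 2 }), // 500 chars/page
  glmOcr: () => "ocr text",
  llamaParse: () => "cloud text",
};
console.log(tieredParse("doc.pdf", deps).tool); // "liteparse"
```

A text-length threshold is crude but cheap; production systems often add layout flags (empty pages, image-only pages) from the first pass to the confidence check.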
One More Thing: liteparse as an Agent Skill
liteparse ships with an official agent skill:
npx skills add run-llama/llamaparse-agent-skills --skill liteparse
The skill uses a SKILL.md format, the same spec used by OpenClaw skills. If you're building agents on OpenClaw, you can drop it straight in. This is the direction the LlamaIndex team is pushing: document parsing as a composable agent capability, not a preprocessing step you bolt on before the real work starts.
Bottom Line
| Tool | Best For | Setup | Cost | Privacy |
|---|---|---|---|---|
| liteparse | Native-text PDFs, agent pipelines | npm i | Free | 100% local |
| GLM-OCR | Complex layouts, scanned docs, on-prem | Python + GPU | Free | 100% local |
| LlamaParse | Production complex docs, max accuracy | API key | Per-page | Cloud |
For most agent builders: start with liteparse. If your documents have complex layouts or low text-layer quality, reach for GLM-OCR before paying for cloud. Reserve LlamaParse for the cases where accuracy genuinely can't be compromised and data residency isn't a constraint.