Install
openclaw skills install liteparse

Parse, extract text from, and screenshot PDF and document files locally using the LiteParse CLI (`lit`). Use when asked to extract text from a PDF, parse a Word/Excel/PowerPoint file, batch-process a folder of documents, or generate page screenshots for LLM vision workflows. Runs entirely offline: no cloud, no API key. Supports PDF, DOCX, XLSX, PPTX, images (jpg/png/webp), and more. Triggers on phrases like "extract text from this PDF", "parse this document", "get the text out of", "screenshot this PDF page", or any request to read/extract content from a file.
Local document parser built on PDF.js + Tesseract.js. Zero cloud dependencies.
Binary: lit (installed globally via npm)
Docs: https://developers.llamaindex.ai/liteparse/
# Parse a PDF to text (stdout)
lit parse document.pdf
# Parse to file
lit parse document.pdf -o output.txt
# Parse to JSON (includes bounding boxes)
lit parse document.pdf --format json -o output.json
# Specific pages only
lit parse document.pdf --target-pages "1-5,10,15-20"
# No OCR (faster, text-layer PDFs only)
lit parse document.pdf --no-ocr
# Batch parse a directory
lit batch-parse ./input-dir ./output-dir
# Screenshot pages (for vision model input)
lit screenshot document.pdf -o ./screenshots
lit screenshot document.pdf --target-pages "1,3,5" --dpi 300 -o ./screenshots
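To make the `--target-pages` syntax concrete, here is a minimal sketch that expands a page spec such as `"1-5,10,15-20"` into the individual page numbers it names. It assumes ranges are inclusive and that entries are comma-separated, which is how the examples above read; `expand_pages` is a hypothetical helper for illustration and is not part of LiteParse.

```shell
#!/usr/bin/env bash
# Sketch only: expand a page spec like "1-5,10,15-20" into a page list.
# Assumption: ranges are inclusive and entries are comma-separated.
# `expand_pages` is a hypothetical helper, not a LiteParse command.
expand_pages() {
  local spec="$1" part start end out=""
  local IFS=','
  for part in $spec; do              # split the spec on commas
    case "$part" in
      *-*) start="${part%-*}"; end="${part#*-}" ;;  # "15-20" -> 15 and 20
      *)   start="$part"; end="$part" ;;            # single page
    esac
    while [ "$start" -le "$end" ]; do
      out="$out $start"
      start=$((start + 1))
    done
  done
  echo "${out# }"                    # drop the leading space
}
```

Calling `expand_pages "1,3,5"` prints `1 3 5`, which is a quick way to sanity-check a spec before passing it to `lit parse` or `lit screenshot`.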
| Format | Use case |
|---|---|
| `text` (default) | Plain text extraction, feeding into prompts |
| `json` | Structured output with bounding boxes, useful for layout-aware tasks |
Use `--no-ocr` for pure text-layer PDFs (faster, no network needed).
Set the OCR languages with `--ocr-language`, for example `--ocr-language fra+eng`.
Works natively: PDF
Requires LibreOffice (brew install --cask libreoffice): .docx, .doc, .xlsx, .xls, .pptx, .ppt, .odt, .csv
Requires ImageMagick (brew install imagemagick): .jpg, .png, .gif, .bmp, .tiff, .webp
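The support notes above can be turned into a quick pre-flight check. The sketch below maps a file's extension to the external tool it needs before being handed to `lit`. The extension lists are copied from the lines above; the function name `lit_dependency_for` and its output labels are hypothetical and not part of LiteParse.

```shell
#!/usr/bin/env bash
# Sketch only: report which external tool a file needs before parsing.
# Extension lists mirror the support notes above; `lit_dependency_for`
# and its output labels are hypothetical, not part of LiteParse.
lit_dependency_for() {
  local name ext
  name="${1##*/}"                    # strip any directory part
  ext="${name##*.}"                  # text after the last dot
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf) echo "none" ;;
    docx|doc|xlsx|xls|pptx|ppt|odt|csv) echo "libreoffice" ;;
    jpg|png|gif|bmp|tiff|webp) echo "imagemagick" ;;
    *) echo "unknown" ;;             # not listed in the notes above
  esac
}
```

For example, `lit_dependency_for /tmp/Report.DOCX` prints `libreoffice`, so a caller can install or verify that tool before parsing.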
Install: `npm install -g @llamaindex/liteparse`
Homebrew is also available (`brew tap run-llama/liteparse`) but requires current macOS CLT; use npm as the primary install path on this machine.
Binary path: `/opt/homebrew/bin/lit`
Text extraction: `lit parse file.pdf -o /tmp/output.txt`, then read into context.
Vision: `lit screenshot` to generate page images, then pass to image tool or similar.
Multiple files: `lit batch-parse`; it reuses the PDF engine across files for efficiency.
See references/output-examples.md for sample JSON/text output structure.
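The choice between single-file and batch parsing described above can be sketched as a small dispatcher. It only prints the command it would run, so it can be reviewed first and never invokes `lit` itself. `lit_command_for` is a hypothetical helper, and the simple double-quoting assumes paths without embedded quote characters.

```shell
#!/usr/bin/env bash
# Sketch only: pick `lit batch-parse` for directories and `lit parse` for
# single files. `lit_command_for` is a hypothetical helper; it prints the
# command instead of running it. Assumes paths contain no double quotes.
lit_command_for() {
  local input="$1" output="$2"
  if [ -d "$input" ]; then
    # A directory: batch mode reuses the PDF engine across files
    printf 'lit batch-parse "%s" "%s"\n' "$input" "$output"
  else
    # A single file: write the text to the given output path
    printf 'lit parse "%s" -o "%s"\n' "$input" "$output"
  fi
}
```

For instance, `lit_command_for file.pdf /tmp/output.txt` prints `lit parse "file.pdf" -o "/tmp/output.txt"`, matching the text-extraction workflow above.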