LiteParse Document Parser
v1.0.0
Use when parsing PDFs, DOCX, PPTX, XLSX, or images locally. Supports text extraction, JSON output with bounding boxes, batch processing, and page screenshots...
Security Scan
OpenClaw
Benign
medium confidence
Purpose & Capability
Name, description, and runtime instructions all describe local parsing of PDFs, Office docs, spreadsheets, and images. Required helpers (LibreOffice, ImageMagick) are plausible for the stated features (conversion, rendering, OCR). No unrelated resources or credentials are requested.
Instruction Scope
SKILL.md only instructs running local CLI commands (lit parse, batch-parse, screenshot) and using a local config file; it does not ask the agent to read unrelated system files, access secrets, or transmit data to external endpoints. Outputs are written to local files.
Install Mechanism
No install spec is included in the registry (the skill is instruction-only). SKILL.md tells the user to install with Homebrew (brew install llamaindex-liteparse, brew install --cask libreoffice, and brew install imagemagick). Using Homebrew is common, but the specific package ('llamaindex-liteparse') and the lack of source/homepage metadata weaken provenance; the package could come from a third-party tap. Verify the package origin before installing.
Credentials
The skill declares no required environment variables, credentials, or config paths. The local liteparse.config.json is reasonable and limited to tool options (OCR language, DPI, etc.).
Persistence & Privilege
Skill is instruction-only, does not request persistent presence, and registry flags are default (always:false). There are no instructions to modify other skills or system-wide agent settings.
Assessment
This skill looks internally consistent for local document parsing, but the package provenance is unclear. Before installing: 1) verify the Homebrew package origin (which tap/repo provides 'llamaindex-liteparse') and inspect its homepage/source; 2) run 'lit --version' and check which binary was installed and where; 3) consider installing in a sandbox or VM if you want to inspect behavior first; 4) ensure LibreOffice and ImageMagick are installed from official sources; and 5) review output files (and any logs) to confirm there was no unexpected network activity or external upload.
Like a lobster shell, security has layers: review code before you run it.
LiteParse
Parse unstructured documents (PDF, DOCX, PPTX, XLSX, images, and more) locally with LiteParse: fast, lightweight, no cloud dependencies or LLM required.
Installation
Already installed via Homebrew:
brew install llamaindex-liteparse
Verify:
lit --version
Supported Formats
| Category | Formats |
|---|---|
| PDF | .pdf |
| Word | .doc, .docx, .docm, .odt, .rtf |
| PowerPoint | .ppt, .pptx, .pptm, .odp |
| Spreadsheets | .xls, .xlsx, .xlsm, .ods, .csv, .tsv |
| Images | .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp, .svg |
Dependencies:
- Office documents → LibreOffice (brew install --cask libreoffice)
- Images → ImageMagick (brew install imagemagick)
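Before parsing Office documents or images, it can help to confirm that the helper tools are actually on PATH. The following is a minimal sketch, not part of LiteParse: missing_tools is a hypothetical helper, and the binary names soffice (LibreOffice) and magick (ImageMagick 7) are assumptions about what the Homebrew installs provide, so adjust them to your setup.

```shell
# Print every command name from the arguments that is not on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
# (Hypothetical helper for illustration; not shipped with LiteParse.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (binary names are assumptions, see the note above):
#   missing_tools lit soffice magick || echo "install the tools listed above"
```

Running command -v lit on its own also shows which binary was installed and where, which is the check the security assessment recommends.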
Usage
Parse a Single File
# Basic text extraction
lit parse document.pdf
# JSON output with bounding boxes
lit parse document.pdf --format json -o output.json
# Specific page range
lit parse document.pdf --target-pages "1-5,10,15-20"
# Disable OCR (faster, text-only PDFs)
lit parse document.pdf --no-ocr
# Higher DPI for better quality
lit parse document.pdf --dpi 300
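The --target-pages value above mixes single pages and ranges. As a rough illustration of how such a spec expands, here is a small sketch that assumes ranges are 1-based and inclusive; the source does not state this, and expand_pages is a hypothetical helper rather than anything LiteParse provides.

```shell
# Expand a page spec such as "1-3,7" into one page number per line.
# Assumption: ranges are 1-based and inclusive (not confirmed by the source).
expand_pages() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r part; do
        case $part in
            *-*)
                # "first-last": emit every page from first through last.
                page=${part%-*}
                last=${part#*-}
                while [ "$page" -le "$last" ]; do
                    printf '%s\n' "$page"
                    page=$((page + 1))
                done
                ;;
            *)
                # A single page number.
                printf '%s\n' "$part"
                ;;
        esac
    done
}
```

Under that assumption, expand_pages "1-5,10,15-20" | wc -l reports 12 pages, which is a quick way to sanity-check a spec before a long OCR run.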
Batch Parse a Directory
lit batch-parse ./input-directory ./output-directory
# Only PDFs, recursively
lit batch-parse ./input ./output --extension .pdf --recursive
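lit batch-parse is the direct route for whole directories. When per-file control is needed (for example JSON output per document and a log of failures), a plain loop over lit parse is one alternative. The sketch below uses only the flags shown in this document; parse_pdfs_to_json is a hypothetical helper, and it assumes lit exits with a non-zero status when a parse fails.

```shell
# Parse every PDF directly inside "$1" into a JSON file with the same base
# name inside "$2". Uses only the `lit parse` flags shown in this document.
# Assumption: `lit` returns a non-zero exit status on failure.
parse_pdfs_to_json() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for pdf in "$in_dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory holds no PDFs.
        [ -e "$pdf" ] || continue
        name=$(basename "$pdf" .pdf)
        if ! lit parse "$pdf" --format json -o "$out_dir/$name.json"; then
            printf 'failed: %s\n' "$pdf" >&2
        fi
    done
}

# Usage:
#   parse_pdfs_to_json ./input ./output
```

Unlike the --recursive form above, this loop does not descend into subdirectories.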
Generate Page Screenshots
# All pages
lit screenshot document.pdf -o ./screenshots
# Specific pages
lit screenshot document.pdf --target-pages "1,3,5" -o ./screenshots
# High-DPI PNG
lit screenshot document.pdf --dpi 300 --format png -o ./screenshots
Key Options
| Option | Description |
|---|---|
| --format json | Structured JSON with bounding boxes |
| --format text | Plain text (default) |
| --target-pages "1-5,10" | Parse specific pages |
| --dpi 300 | Higher rendering quality |
| --no-ocr | Disable OCR (faster for text PDFs) |
| --ocr-language fra | Set OCR language |
| -o output.json | Save to file |
Config File
For repeated use, create liteparse.config.json:
{
  "ocrLanguage": "en",
  "ocrEnabled": true,
  "maxPages": 1000,
  "dpi": 150,
  "outputFormat": "json",
  "preciseBoundingBox": true
}
Use with:
lit parse document.pdf --config liteparse.config.json
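If different jobs need different settings, the config file can be generated from a script instead of edited by hand. This is only a sketch: write_liteparse_config is a hypothetical helper, the keys and default values are copied from the example above, and only dpi is made adjustable.

```shell
# Write a LiteParse config file to "$1" with the rendering DPI set to "$2"
# (default 150). Keys and remaining values mirror the example config above.
# (Hypothetical helper for illustration; not shipped with LiteParse.)
write_liteparse_config() {
    config_path=$1
    dpi=${2:-150}
    cat > "$config_path" <<EOF
{
  "ocrLanguage": "en",
  "ocrEnabled": true,
  "maxPages": 1000,
  "dpi": $dpi,
  "outputFormat": "json",
  "preciseBoundingBox": true
}
EOF
}

# Usage:
#   write_liteparse_config liteparse.config.json 300
#   lit parse document.pdf --config liteparse.config.json
```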
When to Use
- PDF text extraction — fast local parsing
- Document conversion — Office docs to text/JSON
- Screenshot generation — for LLM visual analysis
- Batch processing — multiple files at once
- Offline/air-gapped — no cloud required
