Docs LiteParse

v1.0.0

Use when parsing PDFs, DOCX, PPTX, XLSX, or images locally. Supports text extraction, JSON output with bounding boxes, batch processing, and page screenshots...

by @ricanwarfare · duplicate of @ricanwarfare/liteparse-docs

Security Scan

VirusTotal

Benign

OpenClaw

Benign

medium confidence

✓

Purpose & Capability

The name and description (local parsing of PDFs, Office files, and images) align with the actions described in SKILL.md. The listed dependencies (LibreOffice for Office documents, ImageMagick for images) are appropriate for the stated tasks. The requested functionality (text extraction, JSON with bounding boxes, screenshots, batch processing) matches the CLI commands shown.

✓

Instruction Scope

Runtime instructions are narrowly scoped to installing and running a local CLI (brew install, lit parse, batch-parse, screenshot options, config file). The SKILL.md does not instruct reading unrelated system files, exporting environment variables, or sending data to external endpoints beyond installing the tool via Homebrew.

ℹ

Install Mechanism

The skill is instruction-only (no install spec in registry), but the README tells the user to run 'brew install llamaindex-liteparse' and to install LibreOffice/ImageMagick via Homebrew. Installing via Homebrew is a common pattern, but the registry provides no source or homepage to verify the referenced formula. Because the package name includes 'llamaindex' but the skill claims 'no cloud dependencies or LLM required', you should confirm the Homebrew formula and its upstream repository before installing.

✓

Credentials

No environment variables, credentials, or config paths are requested. The config file shown is local and appropriate for the tool's purpose.

✓

Persistence & Privilege

The skill is not always included and can be invoked by the user. There is no instruction to modify other skills or global agent configuration.

Assessment

The skill appears coherent for local document parsing, but it references installing a third‑party Homebrew package ('llamaindex-liteparse') and provides no source or homepage in the registry metadata. Before installing: (1) look up the Homebrew formula and its upstream repository to verify who maintains it, (2) prefer official or well-known package sources, (3) consider installing in an isolated environment (local VM/container) if you’re unsure, and (4) review the formula contents if possible to ensure it doesn't perform unexpected network or system changes. If you can’t verify the package origin, avoid installing it.
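The first verification step above can be done with Homebrew itself. The snippet below is a minimal sketch: `inspect_formula` is a hypothetical helper name, and `brew cat` may fail if the formula's tap is not available locally.

```shell
# Hypothetical helper: inspect a Homebrew formula before installing it.
inspect_formula() {
  local formula="$1"
  # Description, homepage, version, and the tap/file the formula comes from.
  brew info "$formula" || return 1
  # The Ruby formula source: what it downloads and how it installs.
  # (May fail if the formula's tap is not available locally.)
  brew cat "$formula"
}

# Usage: inspect_formula llamaindex-liteparse
```

Read the homepage and source URL in the output and compare them with the upstream project you expect before running `brew install`.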

60 downloads

0 stars

1 version

Updated 1w ago

MIT-0

LiteParse

Parse unstructured documents (PDF, DOCX, PPTX, XLSX, images, and more) locally with LiteParse: fast, lightweight, no cloud dependencies or LLM required.

Installation

Install via Homebrew:

brew install llamaindex-liteparse

Verify:

lit --version

Supported Formats

Category	Formats
PDF	`.pdf`
Word	`.doc`, `.docx`, `.docm`, `.odt`, `.rtf`
PowerPoint	`.ppt`, `.pptx`, `.pptm`, `.odp`
Spreadsheets	`.xls`, `.xlsx`, `.xlsm`, `.ods`, `.csv`, `.tsv`
Images	`.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.tiff`, `.webp`, `.svg`

Dependencies:

Office documents → LibreOffice (brew install --cask libreoffice)
Images → ImageMagick (brew install imagemagick)
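Because LiteParse relies on external converters for Office documents and images, a quick preflight check can prevent a confusing failure later. This is only a sketch: `have_any` and `check_liteparse_deps` are hypothetical helpers. The binary names (`soffice`/`libreoffice` for LibreOffice, `magick`/`convert` for ImageMagick 7/6) are assumptions about what these installs put on PATH, not something LiteParse documents here.

```shell
# Hypothetical preflight check; binary names below are assumptions.
have_any() {
  # Succeeds if at least one of the given commands is on PATH.
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 && return 0
  done
  return 1
}

check_liteparse_deps() {
  local missing=0
  have_any lit || { echo "missing: lit (LiteParse CLI)"; missing=1; }
  # Assumed LibreOffice CLI names; the macOS cask may or may not link one.
  have_any soffice libreoffice || { echo "missing: LibreOffice (Office documents)"; missing=1; }
  # ImageMagick 7 ships `magick`, ImageMagick 6 ships `convert`.
  have_any magick convert || { echo "missing: ImageMagick (images)"; missing=1; }
  return "$missing"
}

check_liteparse_deps && echo "all dependencies found"
```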

Usage

Parse a Single File

# Basic text extraction
lit parse document.pdf

# JSON output with bounding boxes
lit parse document.pdf --format json -o output.json

# Specific page range
lit parse document.pdf --target-pages "1-5,10,15-20"

# Disable OCR (faster, text-only PDFs)
lit parse document.pdf --no-ocr

# Higher DPI for better quality
lit parse document.pdf --dpi 300
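The `--target-pages` value is a comma-separated list of single pages and ranges. Assuming ranges are inclusive and pages are numbered from 1, the spec "1-5,10,15-20" selects the pages printed by the hypothetical helper below. The helper is not part of LiteParse, but it can be handy for sanity-checking a spec before a long run.

```shell
# Hypothetical helper (not part of LiteParse): expand a page spec into
# individual page numbers, assuming inclusive, 1-based ranges.
expand_pages() {
  local spec="${1// /}" part start end p   # drop any spaces in the spec
  local -a parts out=()
  IFS=',' read -r -a parts <<< "$spec"      # split on commas
  for part in "${parts[@]}"; do
    if [[ "$part" == *-* ]]; then
      start="${part%%-*}"
      end="${part##*-}"
      for ((p = start; p <= end; p++)); do
        out+=("$p")
      done
    else
      out+=("$part")
    fi
  done
  echo "${out[*]}"
}

expand_pages "1-5,10,15-20"
# → 1 2 3 4 5 10 15 16 17 18 19 20
```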

Batch Parse a Directory

lit batch-parse ./input-directory ./output-directory

# Only PDFs, recursively
lit batch-parse ./input ./output --extension .pdf --recursive
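`batch-parse` covers the common case. When you want per-file options such as JSON output, a plain loop over `lit parse` is one alternative. This is a hypothetical sketch that uses only the flags shown above (`--format json`, `-o`). The function name and the output layout (mirror the input tree and swap the extension for `.json`) are assumptions.

```shell
# Hypothetical alternative to `lit batch-parse`: one `lit parse` call per PDF,
# writing JSON into a mirror of the input directory tree.
parse_pdfs_to_json() {
  local in_dir="${1%/}" out_dir="${2%/}" src rel dest
  while IFS= read -r -d '' src; do
    rel="${src#"$in_dir"/}"           # path relative to the input dir
    dest="$out_dir/${rel%.*}.json"    # replace the extension with .json
    mkdir -p "$(dirname "$dest")"
    lit parse "$src" --format json -o "$dest" || echo "failed: $src" >&2
  done < <(find "$in_dir" -type f -iname '*.pdf' -print0)
}

# Usage: parse_pdfs_to_json ./input ./output
```

A failed file is reported on stderr and the loop moves on, so one bad document does not stop the run.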

Generate Page Screenshots

# All pages
lit screenshot document.pdf -o ./screenshots

# Specific pages
lit screenshot document.pdf --target-pages "1,3,5" -o ./screenshots

# High-DPI PNG
lit screenshot document.pdf --dpi 300 --format png -o ./screenshots

Key Options

Option	Description
`--format json`	Structured JSON with bounding boxes
`--format text`	Plain text (default)
`--target-pages "1-5,10"`	Parse specific pages
`--dpi 300`	Higher rendering quality
`--no-ocr`	Disable OCR (faster for text PDFs)
`--ocr-language fra`	Set OCR language
`-o output.json`	Save to file

Config File

For repeated use, create liteparse.config.json:

{
  "ocrLanguage": "en",
  "ocrEnabled": true,
  "maxPages": 1000,
  "dpi": 150,
  "outputFormat": "json",
  "preciseBoundingBox": true
}

Use with:

lit parse document.pdf --config liteparse.config.json
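To create the config from a script (for example in CI), a heredoc works. This sketch assumes `python3` is available to catch JSON syntax errors, such as a trailing comma, before `lit` reads the file.

```shell
# Write the config shown above; the quoted 'EOF' prevents shell expansion.
cat > liteparse.config.json <<'EOF'
{
  "ocrLanguage": "en",
  "ocrEnabled": true,
  "maxPages": 1000,
  "dpi": 150,
  "outputFormat": "json",
  "preciseBoundingBox": true
}
EOF

# json.tool exits non-zero if the file is not valid JSON.
python3 -m json.tool liteparse.config.json > /dev/null && echo "config OK"

# Then: lit parse document.pdf --config liteparse.config.json
```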

When to Use

PDF text extraction — fast local parsing
Document conversion — Office docs to text/JSON
Screenshot generation — for LLM visual analysis
Batch processing — multiple files at once
Offline/air-gapped — no cloud required
