LiteParse

v1.0.0

Parse, extract text from, and screenshot PDF and document files locally using the LiteParse CLI (`lit`). Use when asked to extract text from a PDF, parse a W...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for alfred-intel-handler-source/liteparse.

Prompt Preview: Install & Setup
Install the skill "LiteParse" (alfred-intel-handler-source/liteparse) from ClawHub.
Skill page: https://clawhub.ai/alfred-intel-handler-source/liteparse
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install liteparse

ClawHub CLI

Package manager switcher

npx clawhub@latest install liteparse
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Benign
medium confidence
Purpose & Capability
The name and description claim a local, CLI-based document parser, and the SKILL.md consistently describes using the `lit` CLI (npm package @llamaindex/liteparse) to parse PDFs, Office files, and images into text, JSON, or screenshots. Requiring LibreOffice/ImageMagick for some file types is reasonable. One small inconsistency: the SKILL.md and references alternate between “LiteParse” and “LlamaParse/LlamaIndex” branding, and the registry metadata lacks a homepage. This is a minor provenance concern, not a functional mismatch.
Instruction Scope
Instructions focus on running the `lit` CLI against user-supplied documents (parse, batch-parse, screenshot). They do not instruct reading unrelated system files or exfiltrating data. The SKILL.md claims "Runs entirely offline — no cloud, no API key," but also documents that Tesseract.js downloads ~10MB of language data on first run and that installation uses npm/brew; those steps require network access at install time and on first run, even though subsequent parsing is fully local.
Install Mechanism
No install spec is embedded in the skill bundle (instruction-only). SKILL.md instructs installing via npm (`npm install -g @llamaindex/liteparse`) or a brew tap. npm and brew are common channels, but npm global installs can run postinstall scripts and fetch remote artifacts (e.g., Tesseract language data). There are no direct downloads from obscure URLs in the instructions.
Credentials
The skill declares no required environment variables, credentials, or config paths. The runtime instructions only reference optional external tools (LibreOffice, ImageMagick) and local files provided by the user—this is proportionate to the stated purpose.
Persistence & Privilege
The skill is not forced-always, does not request persistent privileges, and does not propose modifying other skills or global agent settings. It is user-invocable and can be run autonomously by the agent (platform default) which is expected for skills.
Assessment
This skill appears to do what it says: run a local CLI to extract text and screenshots from documents. Before installing:

  • Confirm the npm package identity and publisher (search the npm registry and repository), because the registry metadata here lacks a homepage.
  • Be aware that the first install/run fetches packages and Tesseract language data over the network, so the skill is not strictly offline until that completes.
  • npm global installs may run install scripts; review the package contents or run in a sandbox/container if you are unsure.
  • Installing LibreOffice/ImageMagick via brew is optional but required for some file types, and may require macOS-specific tooling.
  • If provenance matters, ask the publisher for the source repo or a checksum and verify the package before a global install.

Overall the skill is coherent with its stated purpose, but verify the package origin and consider running it in an isolated environment if you have security concerns.
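The pre-install checks above can be scripted. A minimal sketch using standard npm subcommands (read-only against the registry; the actual install line is left commented out so nothing runs scripts by accident):

```shell
# Vet the package before any global install. Registry access is needed,
# so failures are reported rather than fatal.
pkg="@llamaindex/liteparse"

# Publisher, repository, and version metadata straight from the registry.
npm view "$pkg" version repository.url maintainers 2>/dev/null \
  || echo "registry unreachable or package not found: $pkg"

# List tarball contents without installing or running any scripts.
npm pack "$pkg" --dry-run 2>/dev/null || true

# When satisfied, install with lifecycle scripts disabled:
# npm install -g "$pkg" --ignore-scripts
```

`--ignore-scripts` skips postinstall hooks, which addresses the install-script concern; note the Tesseract data fetch still happens at first OCR run regardless.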

Like a lobster shell, security has layers: review code before you run it.

199 downloads · 0 stars · 1 version
Updated 1mo ago
v1.0.0
MIT-0

LiteParse

Local document parser built on PDF.js + Tesseract.js. Zero cloud dependencies.

Binary: lit (installed globally via npm)
Docs: https://developers.llamaindex.ai/liteparse/

Quick Reference

# Parse a PDF to text (stdout)
lit parse document.pdf

# Parse to file
lit parse document.pdf -o output.txt

# Parse to JSON (includes bounding boxes)
lit parse document.pdf --format json -o output.json

# Specific pages only
lit parse document.pdf --target-pages "1-5,10,15-20"

# No OCR (faster, text-layer PDFs only)
lit parse document.pdf --no-ocr

# Batch parse a directory
lit batch-parse ./input-dir ./output-dir

# Screenshot pages (for vision model input)
lit screenshot document.pdf -o ./screenshots
lit screenshot document.pdf --target-pages "1,3,5" --dpi 300 -o ./screenshots

Output Formats

Format           Use case
text (default)   Plain text extraction, feeding into prompts
json             Structured output with bounding boxes, useful for layout-aware tasks

OCR Behavior

  • OCR is on by default via Tesseract.js (downloads ~10MB English data on first run)
  • First run will be slow; subsequent runs use cached data
  • --no-ocr for pure text-layer PDFs (faster, no network needed)
  • For multi-language: --ocr-language fra+eng
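Because of the first-run download, scanned-document work can be front-loaded with a one-time warm-up so later parses stay offline. A sketch, using the `--ocr-language` flag above; `warmup.pdf` and the output path are placeholders:

```shell
# One-time warm-up: the first OCR run fetches Tesseract language data
# (network needed once); later runs reuse the cached data and work offline.
status="skipped"
if command -v lit >/dev/null 2>&1; then
  lit parse warmup.pdf --ocr-language fra+eng -o /tmp/warmup.txt \
    && status="warmed" || status="failed"
else
  echo "lit not installed; see Installation Notes"
fi
```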

Supported File Types

Works natively: PDF

Requires LibreOffice (brew install --cask libreoffice): .docx, .doc, .xlsx, .xls, .pptx, .ppt, .odt, .csv

Requires ImageMagick (brew install imagemagick): .jpg, .png, .gif, .bmp, .tiff, .webp
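A quick preflight for the optional converters above. A sketch, assuming `soffice` (LibreOffice's CLI binary) and `magick` (ImageMagick 7's entry point; older installs expose `convert` instead):

```shell
# Report which optional converters are available before attempting
# non-PDF inputs. Missing tools map to the brew commands above.
missing=""
for tool in soffice magick; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ]; then
  echo "all optional converters present"
else
  echo "missing:$missing"
fi
```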

Installation Notes

  • Installed via npm: npm install -g @llamaindex/liteparse
  • Brew formula exists (brew tap run-llama/liteparse) but requires current macOS CLT — use npm as primary install path on this machine
  • Binary path: /opt/homebrew/bin/lit
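A post-install sanity check (a sketch; `lit --version` is assumed to exist, with `lit --help` as a fallback if it does not):

```shell
# Confirm the binary resolves on PATH after the npm install.
bin="$(command -v lit 2>/dev/null || true)"
if [ -n "$bin" ]; then
  echo "lit found at $bin"
  lit --version 2>/dev/null || lit --help >/dev/null 2>&1 || true
else
  echo "lit not on PATH; run: npm install -g @llamaindex/liteparse"
fi
```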

Workflow Tips

  • For VA forms, job description PDFs, military docs: lit parse file.pdf -o /tmp/output.txt then read into context
  • For scanned PDFs (no text layer): OCR is required; complex layouts may degrade — consider LlamaParse cloud for critical docs
  • For vision model workflows: use lit screenshot to generate page images, then pass to image tool or similar
  • For batch jobs: use lit batch-parse — it reuses the PDF engine across files for efficiency
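The parse-then-read and screenshot tips above combine into a small pipeline. A sketch; the filenames, page selection, and output paths are placeholders:

```shell
# Produce text for prompt context plus page images for a vision model,
# all from a single source document.
doc="form.pdf"
out="/tmp/liteparse-demo"
mkdir -p "$out"
if command -v lit >/dev/null 2>&1 && [ -f "$doc" ]; then
  lit parse "$doc" -o "$out/form.txt"                            # text layer (plus OCR)
  lit screenshot "$doc" --target-pages "1,2" --dpi 300 -o "$out/shots"
else
  echo "need lit on PATH and $doc present"
fi
```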

Limitations

  • Complex tables, multi-column layouts, and scanned government forms may produce imperfect output
  • LlamaParse (cloud) handles the hard cases: https://cloud.llamaindex.ai
  • Max recommended DPI for screenshots: 300 (higher = slower, larger files)

Reference

See references/output-examples.md for sample JSON/text output structure.
