PyMuPDF PDF Parser Clawdbot Skill

Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders.

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 3 · 4.7k · 32 current installs · 33 all-time installs

by @kesslerio


Security Scan

VirusTotal: Benign

OpenClaw: Benign (high confidence)

✓ Purpose & Capability

Name/description, README, SKILL.md, and the included script all describe and implement fast local PDF parsing to Markdown/JSON/images/tables using PyMuPDF. There are no unrelated binaries, configs, or environment variables required.

✓ Instruction Scope

SKILL.md tells the agent to run the included script on a local PDF and write outputs to a per-document directory. The script reads only the provided PDF and writes output files; it does not reference external endpoints, other config paths, or secret material.

✓ Install Mechanism

No automated install spec is included (instruction-only plus a local script). The only runtime dependency is PyMuPDF (fitz), which the README instructs to install via pip. No downloads from untrusted URLs or archive extraction are present.

✓ Credentials

The skill declares no required environment variables or credentials and the code does not access environment secrets. The dependency on PyMuPDF is appropriate and proportionate to the stated functionality.

✓ Persistence & Privilege

The skill sets always:false and uses the standard user-invocable/autonomous settings. It does not request permanent presence, does not modify other skills' configs, and operates only on files provided at runtime.

Assessment

This skill appears coherent and local-only: it runs a bundled Python script that reads a PDF and writes Markdown/JSON/images to disk, and it does not contact external services or require credentials. Before installing or running it:

(1) Install PyMuPDF from a trusted source (pip from PyPI) and inspect the script yourself.
(2) Run the tool in a sandbox or as an unprivileged user when processing untrusted PDFs, since PDF parsing libraries have had vulnerabilities in the past.
(3) Note that the skill's registry metadata lists no homepage/source verification; for stronger assurance, fetch the repository linked in the README and verify its commit history.
(4) If you need robust table/layout extraction, consider the larger parser the README references.

Overall, there are no red flags in the files provided, but treat untrusted PDFs and third-party pip installs with standard caution.


Current version: v1.0.0


latest: vk970tbqmz4hapb0y2xqb5y8dc57zr47c

License

MIT-0


Terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

PyMuPDF PDF

Overview

Parse PDFs locally using PyMuPDF for fast, lightweight extraction into Markdown by default, with optional JSON and image/table outputs in a per-document directory.

Prereqs / when to read references

If you hit import errors (PyMuPDF not installed) or Nix libstdc++ issues, read:

references/pymupdf-notes.md
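
A quick way to confirm the prerequisite before running the script is to probe for the module without importing it. The following is a minimal sketch and not part of the skill: `module_available` and `pymupdf_import_name` are hypothetical helpers, and the sketch assumes PyMuPDF is importable as either `pymupdf` or `fitz`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def pymupdf_import_name():
    """Return the first importable PyMuPDF module name, or None if absent."""
    for candidate in ("pymupdf", "fitz"):
        if module_available(candidate):
            return candidate
    return None


if __name__ == "__main__":
    name = pymupdf_import_name()
    if name is None:
        print("PyMuPDF not found; try: pip install pymupdf")
    else:
        print(f"PyMuPDF is importable as {name!r}")
```

If this reports that the module is missing, or the import itself fails with a shared-library error, the notes file above is the place to look next.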

Quick start (single PDF)

# Run from the skill directory
./scripts/pymupdf_parse.py /path/to/file.pdf \
  --format md \
  --outroot ./pymupdf-output
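
For orientation, the kind of extraction such a script performs can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the bundled script's actual code: the function names and the one-heading-per-page Markdown layout are hypothetical, and only `fitz.open` and `page.get_text("text")` are real PyMuPDF calls.

```python
def pages_to_markdown(page_texts):
    """Join per-page text into one Markdown string, one heading per page.

    The "## Page N" layout is an assumption made for this sketch.
    """
    parts = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(f"## Page {number}\n\n{text.strip()}\n")
    return "\n".join(parts)


def extract_page_texts(pdf_path):
    """Read plain text from every page of a PDF using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helper above works without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text("text") for page in doc]


if __name__ == "__main__":
    import sys

    # Usage: python sketch.py /path/to/file.pdf
    print(pages_to_markdown(extract_page_texts(sys.argv[1])))
```

Keeping the formatting step separate from the PyMuPDF call makes it easy to check the Markdown layout without a PDF at hand.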

Options

--format md|json|both: output format (default: md)
--images: extract images
--tables: extract a simple line-based table JSON (quick/rough)
--outroot DIR: change the output root
--lang: add a language hint to the JSON output metadata
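
The option list above maps naturally onto an `argparse` declaration. The sketch below is a guess at how such a command line could be declared, not the script's real parser: `build_parser` is a hypothetical name, and treating `--lang` as taking a string value is an assumption.

```python
import argparse


def build_parser():
    """Declare a CLI that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="pymupdf_parse.py")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--format", choices=["md", "json", "both"], default="md",
                        help="output format")
    parser.add_argument("--images", action="store_true", help="extract images")
    parser.add_argument("--tables", action="store_true",
                        help="extract rough line-based tables as JSON")
    parser.add_argument("--outroot", default="./pymupdf-output",
                        help="root directory for per-document output")
    # Assumption: --lang takes a value such as "en".
    parser.add_argument("--lang", default=None,
                        help="language hint stored in JSON metadata")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```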

Output conventions

Create ./pymupdf-output/<pdf-basename>/ by default.
Markdown output: output.md
JSON output: output.json (includes lang)
Images: images/ subdir
Tables: tables.json (rough line-based)
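
These conventions can be expressed as a small path helper. This is a sketch only: `output_paths` is a hypothetical function, and it assumes `<pdf-basename>` means the file name without its `.pdf` extension.

```python
from pathlib import Path


def output_paths(pdf_path, outroot="./pymupdf-output"):
    """Return the expected output locations for one PDF.

    Assumption: the per-document folder is named after the PDF's stem.
    """
    doc_dir = Path(outroot) / Path(pdf_path).stem
    return {
        "dir": doc_dir,
        "markdown": doc_dir / "output.md",
        "json": doc_dir / "output.json",
        "images": doc_dir / "images",
        "tables": doc_dir / "tables.json",
    }


if __name__ == "__main__":
    for kind, path in output_paths("/path/to/file.pdf").items():
        print(f"{kind}: {path}")
```

A caller can use this to check which outputs exist after a run, for example testing `output_paths(pdf)["markdown"].exists()`.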

Notes

PyMuPDF is fast but less robust on complex PDFs.
For more robust parsing, use a heavy-duty OCR parser (e.g., MinerU) if installed.

