Pdfreader

Extract text and metadata from PDF files using PyMuPDF, supporting large files and outputting results in JSON format.

Install

openclaw skills install pdfreader

PDF Reader Skill for OpenClaw

Extract and read text from PDF files using PyMuPDF.

Installation

pip install pymupdf

Usage

# Extract text from the first 10 pages (10 is also the default page limit)
python pdf_reader.py "path/to/file.pdf" 10

# Write the extracted text to a JSON file instead of the console
python pdf_reader.py "path/to/file.pdf" 10 --output=extracted.json

# Read a specific number of pages (here, the first 5)
python pdf_reader.py "path/to/file.pdf" 5
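The JSON file written with --output can be inspected before it is passed to another tool. The layout of that file is not documented here, so the sketch below makes no assumption about key names and only reports the top-level structure. The helper names describe_output and describe_file are hypothetical and are not part of pdf_reader.py.

```python
import json


def describe_output(json_text):
    """Map each top-level key of a JSON document to the type of its value.

    No key names are assumed; this only reports what is actually present.
    """
    data = json.loads(json_text)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (for example a list of pages) get a single entry.
    return {"<root>": type(data).__name__}


def describe_file(path="extracted.json"):
    """Read a JSON file as UTF-8 and describe its top-level structure."""
    with open(path, encoding="utf-8") as fh:
        return describe_output(fh.read())
```

Calling describe_file("extracted.json") after the second command above is one way to find out which fields the script actually emits.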

Features

Extracts embedded text from PDF files
Supports large files
Outputs JSON for AI reading
Handles encoding issues
Shows metadata (title, author, etc.)
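The features above can be approximated in a few lines of PyMuPDF. This is a minimal sketch, not the contents of pdf_reader.py: the function names extract_pages and read_pdf and the JSON keys "metadata", "total_pages", and "pages" are assumptions chosen for illustration. It stops at a page limit so that large files are not read in full.

```python
import json


def extract_pages(doc, max_pages=10):
    """Collect metadata and per-page text from a PyMuPDF document.

    `doc` only needs `metadata`, `page_count`, and `load_page(i).get_text()`,
    all of which a PyMuPDF Document provides.
    """
    count = min(max_pages, doc.page_count)
    pages = []
    for index in range(count):
        # Pages are loaded one at a time, so memory use stays bounded.
        text = doc.load_page(index).get_text()
        pages.append({"page": index + 1, "text": text})
    return {
        "metadata": dict(doc.metadata or {}),  # title, author, etc.
        "total_pages": doc.page_count,
        "pages": pages,  # hypothetical layout, not the script's schema
    }


def read_pdf(path, max_pages=10):
    """Open a PDF with PyMuPDF and return the result as a JSON string."""
    import fitz  # PyMuPDF; imported here so extract_pages has no hard dependency

    with fitz.open(path) as doc:
        result = extract_pages(doc, max_pages)
    # ensure_ascii=False keeps non-ASCII text readable in the output.
    return json.dumps(result, ensure_ascii=False, indent=2)
```

For example, print(read_pdf("path/to/file.pdf", 5)) would print the metadata and the text of the first five pages.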

Security Restrictions

For safety, the script enforces:

Input files: Must be .pdf files within the current working directory
Output files: Must be .json files within the current working directory
No path traversal (../) allowed
Together, these rules confine all reads and writes to the directory the script is run from
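Checks like the ones listed above could be written with pathlib as follows. This is a sketch under stated assumptions, not the script's actual code: validate_path is a hypothetical name, and the sketch accepts files in subdirectories of the working directory, which the real script may or may not allow.

```python
from pathlib import Path


def validate_path(user_path, required_suffix, base_dir=None):
    """Return the resolved path if it passes all checks, else raise ValueError.

    required_suffix is ".pdf" for input files and ".json" for output files.
    """
    base = Path(base_dir or Path.cwd()).resolve()
    candidate = Path(user_path)

    # Reject any ".." component outright.
    if ".." in candidate.parts:
        raise ValueError("path traversal is not allowed: %s" % user_path)

    # Resolve relative to the base; an absolute input replaces the base here
    # and is caught by the containment check below.
    resolved = (base / candidate).resolve()

    if resolved.suffix.lower() != required_suffix:
        raise ValueError("expected a %s file: %s" % (required_suffix, user_path))

    try:
        resolved.relative_to(base)
    except ValueError:
        raise ValueError("path is outside the working directory: %s" % user_path)

    return resolved
```

Under these assumptions, validate_path("report.pdf", ".pdf") returns the absolute path, while validate_path("../report.pdf", ".pdf") and validate_path("notes.txt", ".pdf") both raise ValueError.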

Files

pdf_reader.py - Main Python script
SKILL.md - This documentation