Pdfreader
v1.0.3
Extract text and metadata from PDF files using PyMuPDF, supporting large files and outputting results in JSON format.
PDF Reader Skill for OpenClaw
Extract and read text from PDF files using PyMuPDF.
Installation
pip install pymupdf
Usage
# Extract text (first 10 pages by default)
python pdf_reader.py "path/to/file.pdf" 10
# Write the output to a JSON file (for later reading)
python pdf_reader.py "path/to/file.pdf" 10 --output=extracted.json
# Read a specific number of pages
python pdf_reader.py "path/to/file.pdf" 5
Features
- Extracts text from any PDF
- Supports large files
- Outputs JSON for AI reading
- Handles encoding issues
- Shows metadata (title, author, etc.)
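As a rough illustration of the features above, the following is a minimal sketch of how page text and metadata could be pulled out with PyMuPDF and shaped into JSON. It is not the actual contents of pdf_reader.py: the function names build_payload, extract_pdf and to_json, and the layout of the JSON object, are assumptions made for this example. Only fitz.open, doc.page_count, doc.metadata and page.get_text are PyMuPDF calls.

```python
import json


def build_payload(page_texts, metadata, total_pages):
    """Assemble a JSON-serializable dict (hypothetical layout, not the script's)."""
    return {
        # Drop empty metadata fields such as a missing author.
        "metadata": {key: value for key, value in (metadata or {}).items() if value},
        "total_pages": total_pages,
        "pages": [
            {"page": number, "text": text}
            for number, text in enumerate(page_texts, start=1)
        ],
    }


def extract_pdf(path, max_pages=10):
    """Read at most max_pages pages, so large files are not loaded in full."""
    import fitz  # PyMuPDF; imported here so build_payload works without it

    with fitz.open(path) as doc:
        limit = min(max_pages, doc.page_count)
        texts = [doc[index].get_text() for index in range(limit)]
        return build_payload(texts, doc.metadata, doc.page_count)


def to_json(payload):
    """Serialize without escaping non-ASCII characters."""
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Capping the loop at max_pages is one plausible way to keep memory use bounded on large documents, since pages beyond the limit are never decoded.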
Security Restrictions
For safety, the script enforces:
- Input files: Must be .pdf files within the current working directory
- Output files: Must be .json files within the current working directory
- No path traversal (../) allowed
- Files can only be read/written in the directory where the script runs
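The restrictions above can be approximated with a small path check. This is a sketch under stated assumptions, not the script's own validation code: the helper name resolve_in_cwd and its parameters are hypothetical, and it uses only the standard library's pathlib.

```python
from pathlib import Path


def resolve_in_cwd(raw_path, required_suffix, base_dir=None):
    """Return the resolved path, or raise ValueError if it breaks the rules.

    Hypothetical helper: required_suffix is ".pdf" for input and ".json"
    for output; base_dir defaults to the current working directory.
    """
    base = Path(base_dir or Path.cwd()).resolve()
    # Resolving collapses "../" segments and follows symlinks before the check.
    candidate = (base / raw_path).resolve()

    if candidate.suffix.lower() != required_suffix:
        raise ValueError(f"expected a {required_suffix} file: {raw_path}")

    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes the working directory: {raw_path}") from None

    return candidate
```

Checking the resolved path, rather than searching the raw string for "../", also rejects absolute paths and symlinks that point outside the directory.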
Files
- pdf_reader.py - Main Python script
- SKILL.md - This documentation
