Pdfreader

v1.0.3

Extract text and metadata from PDF files using PyMuPDF, supporting large files and outputting results in JSON format.

4 versions · Updated 2mo ago · MIT-0
by Ivan Cetta (@nantes)

PDF Reader Skill for OpenClaw

Extract and read text from PDF files using PyMuPDF.

Installation

pip install pymupdf

Usage

# Extract text from the first 10 pages (10 is the default page limit)
python pdf_reader.py "path/to/file.pdf" 10

# Write the extracted text to a JSON file
python pdf_reader.py "path/to/file.pdf" 10 --output=extracted.json

# Read a different number of pages (here, the first 5)
python pdf_reader.py "path/to/file.pdf" 5
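
The command lines above follow the pattern `<file.pdf> [max_pages] [--output=<file>.json]`. As a rough illustration of how such arguments could be parsed, here is a minimal sketch. The helper name `parse_args` is hypothetical, and the actual `pdf_reader.py` may handle its arguments differently.

```python
def parse_args(argv):
    """Parse `<file.pdf> [max_pages] [--output=<file>.json]`.

    Hypothetical helper for illustration; not taken from pdf_reader.py.
    """
    positional = [a for a in argv if not a.startswith("--")]
    options = [a for a in argv if a.startswith("--")]

    if not positional:
        raise SystemExit(
            "usage: pdf_reader.py <file.pdf> [max_pages] [--output=<file>.json]"
        )

    pdf_path = positional[0]
    # Fall back to 10 pages, the default mentioned in the usage comments.
    max_pages = int(positional[1]) if len(positional) > 1 else 10

    output = None
    for opt in options:
        if opt.startswith("--output="):
            # Split only on the first "=" so the file name may contain "=".
            output = opt.split("=", 1)[1]

    return pdf_path, max_pages, output


print(parse_args(["report.pdf", "5", "--output=extracted.json"]))
# prints ('report.pdf', 5, 'extracted.json')
```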

Features

  • Extracts text from any PDF
  • Supports large files by limiting the number of pages read
  • Outputs JSON that an AI agent can read
  • Handles encoding issues
  • Shows metadata (title, author, etc.)
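
To make the features above concrete, the following sketch shows one way to read a limited number of pages with PyMuPDF and assemble a JSON result. The function names (`build_result`, `extract`, `to_json`) and the JSON key names are assumptions made for illustration; the script's real output layout is not documented here. The PyMuPDF calls used (`fitz.open`, `Document.page_count`, `Document.load_page`, `Page.get_text`, `Document.metadata`) are part of its public API.

```python
import json


def build_result(metadata, page_texts):
    """Assemble a JSON-serializable result (key names are assumptions)."""
    return {
        "metadata": {
            "title": metadata.get("title"),
            "author": metadata.get("author"),
        },
        "pages_read": len(page_texts),
        "pages": [
            {"page": i + 1, "text": text} for i, text in enumerate(page_texts)
        ],
    }


def extract(pdf_path, max_pages=10):
    """Read at most `max_pages` pages so large files stay cheap to process."""
    import fitz  # PyMuPDF; imported lazily so build_result works without it

    doc = fitz.open(pdf_path)
    try:
        limit = min(max_pages, doc.page_count)
        texts = [doc.load_page(i).get_text() for i in range(limit)]
        return build_result(doc.metadata or {}, texts)
    finally:
        doc.close()


def to_json(result):
    # ensure_ascii=False keeps non-ASCII text readable instead of \u escapes.
    return json.dumps(result, ensure_ascii=False, indent=2)
```

Capping the page count before calling `load_page` means pages beyond the limit are never parsed, which is one simple way to keep very large documents manageable.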

Security Restrictions

For safety, the script enforces:

  • Input files: Must be .pdf files within the current working directory
  • Output files: Must be .json files within the current working directory
  • No path traversal (../) allowed
  • Files can only be read from or written to the current working directory (the directory the script is run from)
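
Restrictions like these are commonly enforced by resolving the requested path and checking it against the working directory. The sketch below is one possible approach using only the standard library; `safe_path` is a hypothetical helper, not the script's actual implementation.

```python
from pathlib import Path


def safe_path(user_path, required_suffix, base_dir=None):
    """Return the resolved path if it has the required suffix and lies
    inside `base_dir` (default: the current working directory).

    Hypothetical helper for illustration; not taken from pdf_reader.py.
    """
    base = Path(base_dir or Path.cwd()).resolve()
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # attempts are judged by where they actually point.
    candidate = (base / user_path).resolve()

    if candidate.suffix.lower() != required_suffix:
        raise ValueError(f"expected a {required_suffix} file: {user_path}")
    if base not in candidate.parents:
        raise ValueError(f"path escapes the working directory: {user_path}")
    return candidate


# Example: validate an input and an output path before touching the disk.
pdf_in = safe_path("report.pdf", ".pdf")
json_out = safe_path("extracted.json", ".json")
```

Checking the resolved path rather than searching the string for `../` also rejects absolute paths and symlinks that point outside the directory.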

Files

  • pdf_reader.py - Main Python script
  • SKILL.md - This documentation

Version tags

latest: vk97c9f353p46paj3gd08pth5kd81mb48