Pdfreader

v1.0.3

Extract text and metadata from PDF files using PyMuPDF, supporting large files and outputting results in JSON format.

by Ivan Cetta @nantes
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the files and instructions. The code uses PyMuPDF (fitz) to open PDFs, extract text and metadata, and produce JSON — exactly what the description promises. No extraneous binaries, credentials, or services are requested.
Instruction Scope
SKILL.md usage aligns with the script's behavior (pip install pymupdf; run python pdf_reader.py ...). SKILL.md states that files must be 'within the current working directory' and forbids '../' traversal; the script enforces this by checking that the absolute path is inside os.getcwd(). Two caveats apply. First, the script accepts files in subdirectories of the current working directory, although the wording could be read as allowing only the top level. Second, it uses os.path.abspath rather than os.path.realpath, so a symlink inside the cwd that points outside it could bypass the directory restriction. These are implementation caveats rather than evidence of malicious behavior.
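The symlink caveat can be illustrated with a short, self-contained sketch. This is not the skill's actual code: the helper names inside_abspath, inside_realpath, and demo are hypothetical, and the check shown is only an assumed reconstruction of the pattern the review describes (comparing an os.path.abspath result against the working directory). It assumes a platform where os.symlink is permitted.

```python
import os
import tempfile


def inside_abspath(path, base):
    """Containment check that normalizes the path but does NOT resolve symlinks.

    Hypothetical reconstruction of the abspath-style check described above.
    """
    base = os.path.abspath(base)
    target = os.path.abspath(os.path.join(base, path))
    return os.path.commonpath([base, target]) == base


def inside_realpath(path, base):
    """Same check, but symlinks are resolved first with os.path.realpath."""
    base = os.path.realpath(base)
    target = os.path.realpath(os.path.join(base, path))
    return os.path.commonpath([base, target]) == base


def demo():
    """Build a throwaway directory tree and compare both checks."""
    with tempfile.TemporaryDirectory() as root:
        work = os.path.join(root, "work")        # stands in for the cwd
        outside = os.path.join(root, "outside")  # must stay unreachable
        os.makedirs(os.path.join(work, "sub"))
        os.makedirs(outside)

        secret = os.path.join(outside, "secret.pdf")
        open(secret, "w").close()
        # A symlink that lives inside `work` but points outside it.
        os.symlink(secret, os.path.join(work, "link.pdf"))

        return {
            "subdir_abspath": inside_abspath("sub/report.pdf", work),
            "dotdot_abspath": inside_abspath("../outside/secret.pdf", work),
            "symlink_abspath": inside_abspath("link.pdf", work),
            "symlink_realpath": inside_realpath("link.pdf", work),
        }
```

In this sketch the abspath-based check accepts subdirectories and rejects '../' traversal, matching the behavior the review reports, but it also accepts link.pdf because the link's own location is inside the working directory. Only the realpath-based variant sees where the link actually points.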
Install Mechanism
No install spec is embedded; the only install guidance is the 'pip install pymupdf' recommendation in SKILL.md. That is low-risk from the skill bundle perspective. Note that installing PyMuPDF via pip pulls a package containing compiled extension code from PyPI, so treat it with the same care as any other third-party pip install.
Credentials
The skill requests no environment variables, credentials, or config paths. The functionality does not require additional secrets. The code does not read environment variables or access unrelated system configuration.
Persistence & Privilege
The 'always' setting is false and the skill does not request persistent or auto-included privileges. It does not modify other skills or system-wide settings. Autonomous invocation remains the platform default but is not combined with other concerning privileges here.
Assessment
This skill appears to do what it claims: extract text and metadata from PDFs using PyMuPDF. Before installing or running it, consider:
1) Run pip install pymupdf in an isolated environment (virtualenv/container), since PyMuPDF ships compiled code from PyPI.
2) The script enforces 'within current working directory' but allows subdirectories and does not resolve symlinks; avoid placing untrusted symlinks inside the working directory to prevent escapes.
3) Because the source/homepage is unknown, prefer running the script in a sandbox and review the code yourself (or run it on non-sensitive PDFs) before giving it access to important files.
If you need stricter confinement (no subdirectories, or protection against symlinks), request a code change to use os.path.realpath checks and a configurable safe directory.
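The stricter confinement suggested above might look something like the following minimal sketch. It is an assumption about one possible shape for such a change, not code from the skill: the function name resolve_confined and the parameters safe_dir and allow_subdirs are hypothetical.

```python
import os


def resolve_confined(path, safe_dir, allow_subdirs=True):
    """Return the symlink-resolved path if it lies inside safe_dir.

    Hypothetical helper: `safe_dir` is the configurable safe directory and
    `allow_subdirs` toggles whether nested directories are accepted.
    Raises ValueError when the path escapes the safe directory.
    """
    base = os.path.realpath(safe_dir)
    # Joining with an absolute `path` discards `base`, so absolute inputs
    # are evaluated on their own and rejected unless they fall inside base.
    target = os.path.realpath(os.path.join(base, path))

    if target == base or os.path.commonpath([base, target]) != base:
        raise ValueError(f"{path!r} is outside the safe directory")

    if not allow_subdirs and os.path.dirname(target) != base:
        raise ValueError(f"{path!r} is not directly inside the safe directory")

    return target
```

A caller would pass the returned path, rather than the user-supplied one, to whatever opens the file, so that the path that was checked and the path that is opened are the same. Note that a check like this still leaves a window between validation and open in which a link could be swapped, which is one reason the sandbox advice above still applies.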



