PDF Utils
v1.0.1
PDF Utils enables OCR of image-based PDFs, extraction of arXiv IDs from text or OCR output, and scriptable PDF tasks such as merging, splitting, and rendering.
by Lu Wang (@wangwllu)
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign (high confidence)
Purpose & Capability
The name/description (OCR, arXiv extraction, merge/split/render) matches the provided scripts and docs. The code only requires PyMuPDF, pytesseract, Pillow and the tesseract binary (all relevant to OCR and PDF processing). No unrelated binaries, env vars, or config paths are requested.
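The dependency list above can be checked before running anything. The following is a minimal sketch, not part of the skill: the function name `missing_dependencies` is hypothetical, and the import names (`fitz` for PyMuPDF, `PIL` for Pillow) are the conventional ones, assumed here rather than confirmed from the scripts.

```python
import importlib.util
import shutil

# Package name -> import name. These import names are assumptions about how
# the scripts load each package (PyMuPDF is commonly imported as "fitz").
REQUIRED_MODULES = {"PyMuPDF": "fitz", "pytesseract": "pytesseract", "Pillow": "PIL"}
REQUIRED_BINARIES = ["tesseract"]


def missing_dependencies(modules=REQUIRED_MODULES, binaries=REQUIRED_BINARIES):
    """Return the names of packages and binaries that cannot be found."""
    missing = []
    for package, module_name in modules.items():
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module_name) is None:
            missing.append(package)
    for binary in binaries:
        # shutil.which searches PATH, like the shell would.
        if shutil.which(binary) is None:
            missing.append(binary)
    return missing


if __name__ == "__main__":
    print(missing_dependencies() or "all dependencies found")
```

An empty list means the three packages and the tesseract binary are all discoverable; it does not verify that the required tesseract language packs are installed.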
Instruction Scope
SKILL.md and the scripts are focused on local PDF processing. The scripts read PDFs, optionally OCR pages, extract arXiv identifiers, and (optionally) download PDFs from arxiv.org. They do not read arbitrary system credentials or other unrelated filesystem locations. Note: some scripts invoke subprocesses (curl for downloads and tesseract --list-langs) and will perform network downloads when the --download flag is used, which is consistent with the documented behavior.
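To illustrate what extracting arXiv identifiers from text or OCR output can look like, here is a minimal regex-based sketch. It is an assumption about the general technique, not the skill's actual implementation; `ARXIV_ID` and `extract_arxiv_ids` are hypothetical names. It recognizes new-style IDs (`YYMM.NNNN` or `YYMM.NNNNN`) and old-style IDs (`archive/YYMMNNN`), and drops any version suffix.

```python
import re

# Hypothetical pattern; the skill's own matching rules may differ.
ARXIV_ID = re.compile(
    r"""
    (?<![\w.])                        # not in the middle of a longer token
    (?:arxiv:\s*)?                    # optional "arXiv:" prefix
    (
        \d{4}\.\d{4,5}                # new-style: YYMM.NNNN or YYMM.NNNNN
      | [a-z-]+(?:\.[a-z]{2})?/\d{7}  # old-style: archive/YYMMNNN
    )
    (?:v\d+)?                         # optional version suffix (not captured)
    (?!\d)                            # not followed by more digits
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_arxiv_ids(text):
    """Return unique arXiv IDs in order of first appearance, versions stripped."""
    found = []
    for match in ARXIV_ID.finditer(text):
        arxiv_id = match.group(1)
        if arxiv_id not in found:
            found.append(arxiv_id)
    return found
```

A pattern this loose will also match unrelated numbers shaped like `1234.5678`, and OCR noise can corrupt digits, so results are best treated as candidates to verify rather than confirmed references.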
Install Mechanism
This is an instruction-only skill (no install spec). SKILL.md recommends installing tesseract via brew and Python packages via pip. That is expected for OCR functionality but requires the user to run external installers (brew/pip) and to install tesseract language packs; ensure you run these from trusted package sources. No archive downloads or arbitrary URLs are used by an install step.
Credentials
The skill declares no required environment variables or credentials. The code does not attempt to access secrets or unrelated environment variables. Network access is used only to fetch papers from arxiv.org when the download option is selected.
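The opt-in download behavior described above might be structured roughly as follows. This is a hedged sketch under stated assumptions: `build_download_command` and its parameters are hypothetical, and the actual curl arguments used by the scripts are not shown in this review. It only builds the argument list, so nothing touches the network until the caller runs it.

```python
from pathlib import Path

ARXIV_PDF_BASE = "https://arxiv.org/pdf/"


def build_download_command(arxiv_id, out_dir="papers", download=False):
    """Return a curl argument list for one paper, or None when downloads are off."""
    if not download:
        return None  # no network activity unless explicitly requested
    url = ARXIV_PDF_BASE + arxiv_id
    # Old-style IDs contain "/", which cannot appear in a file name.
    target = Path(out_dir) / (arxiv_id.replace("/", "_") + ".pdf")
    # An argument list (not a shell string) avoids shell interpretation of the ID.
    return ["curl", "--fail", "--location", "--silent", "--show-error",
            "--output", str(target), url]
```

A caller would pass the returned list to `subprocess.run(cmd, check=True)` after creating the output directory; keeping command construction separate makes it easy to inspect exactly what would be fetched before enabling downloads.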
Persistence & Privilege
The skill does not request always:true and does not modify other skills or global agent configuration. It runs as user-invocable code and will only create files/directories where the CLI is instructed to (e.g., output dir for downloads or OCR text).
Assessment
This skill appears coherent and implements what it claims. Before installing or using it: (1) review the scripts and run them on sample PDFs as an unprivileged user to confirm behavior; (2) be aware that OCR requires installing the tesseract binary and language packs (SKILL.md suggests brew); (3) the extract_refs download option uses curl to fetch PDFs from arxiv.org, so only enable downloads when you want network activity and ensure you trust the source; (4) the scripts write output files (papers/, temporary PNGs, OCR text) to locations you specify, so run them in directories you control; (5) if you need higher assurance, inspect or run the included tests and review the small subprocess calls (curl, tesseract), which are expected for this functionality.
Like a lobster shell, security has layers: review code before you run it.
