Extract PDF Text

v1.0.2

Extract text from PDF files using PyMuPDF. Parse tables, forms, and complex layouts. Supports OCR for scanned documents.
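As a rough illustration of the core use case, the sketch below reads the plain text of each page with PyMuPDF. The function name `extract_text` is hypothetical and not taken from the skill's own docs; the import is done lazily so the snippet loads even where PyMuPDF is not yet installed.

```python
def extract_text(pdf_path):
    """Return a list holding the plain text of each page, in page order."""
    # PyMuPDF is imported under its long-standing module name "fitz"
    # (install with: pip install PyMuPDF).
    import fitz

    # Document objects are context managers, so the file is closed on exit.
    with fitz.open(str(pdf_path)) as doc:
        return [page.get_text() for page in doc]
```

Calling `extract_text("report.pdf")` (a placeholder path) would return one string per page; tables, forms, and complex layouts generally need more than this plain-text mode.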

by Iván @ivangdavila
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign, high confidence
Purpose & Capability
Name/description (extract text, parse tables/forms, support OCR) match the instructions and examples, which use PyMuPDF and optionally pytesseract/Tesseract. Required binary (python3) and pip install guidance for PyMuPDF are appropriate.
Instruction Scope
SKILL.md and included docs only show local operations (opening files, rendering pages, OCR). Examples reference opening user-supplied PDF paths and handling passwords for encrypted PDFs — all within the stated scope. No instructions to read unrelated system files, transmit data externally, or modify other skills.
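The local render-then-OCR flow described here might look like the following sketch. It assumes a PyMuPDF `Page` object plus the optional `pytesseract` and Pillow packages; the helper name `page_text_with_ocr` and the `min_chars` threshold are illustrative assumptions, not part of the skill.

```python
import io


def page_text_with_ocr(page, min_chars=20, dpi=300):
    """Return the page's text layer, falling back to OCR for scanned pages."""
    text = page.get_text().strip()
    if len(text) >= min_chars:
        # A usable text layer exists, so no OCR is needed.
        return text

    # Lazy imports: only scanned pages need the OCR dependencies.
    import pytesseract
    from PIL import Image

    # Render the page to a raster image entirely in memory, then OCR it.
    pix = page.get_pixmap(dpi=dpi)
    image = Image.open(io.BytesIO(pix.tobytes("png")))
    return pytesseract.image_to_string(image).strip()
```

Everything stays on the local machine: the page is rendered in memory and handed to the locally installed Tesseract binary. The 20-character cutoff is an arbitrary heuristic for "this page is probably a scan".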
Install Mechanism
This is instruction-only (no install spec that downloads arbitrary artifacts). The metadata suggests installing PyMuPDF via pip and the docs recommend installing Tesseract from common package managers — standard, low-risk guidance. No obscure URLs or archive extraction are present.
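Since nothing is installed automatically, a quick preflight check can confirm what is present before running any examples. This is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper, and it assumes PyMuPDF is importable as `fitz`.

```python
import importlib.util
import shutil


def check_requirements():
    """Report which parts of the PyMuPDF + OCR toolchain are available."""
    return {
        # PyMuPDF is importable under the module name "fitz".
        "PyMuPDF": importlib.util.find_spec("fitz") is not None,
        # pytesseract is only a Python wrapper around the OCR engine.
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # The Tesseract binary itself must be on PATH for OCR to work.
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Only the first entry is needed for plain text extraction; the other two matter only for scanned documents.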
Credentials
The skill requests no environment variables, credentials, or config paths. Examples do show authenticating password-protected PDFs (example passwords are demonstrative only). There are no unrelated credential requests.
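One way to avoid hardcoded passwords when adapting the examples is to prompt for the password at run time. The sketch below assumes a PyMuPDF `Document` (using its `needs_pass` attribute and `authenticate` method); the helper name `unlock` is hypothetical.

```python
import getpass


def unlock(doc, password=None):
    """Authenticate an opened PDF if it is password-protected; return it."""
    if not doc.needs_pass:
        return doc  # Not encrypted against access: nothing to do.

    if password is None:
        # Prompt interactively instead of storing a secret in the source.
        password = getpass.getpass("PDF password: ")

    # authenticate() returns 0 on failure and a positive value on success.
    if not doc.authenticate(password):
        raise PermissionError("wrong password for encrypted PDF")
    return doc
```

With PyMuPDF installed, usage would be along the lines of `doc = unlock(fitz.open("report.pdf"))`, where the path is a placeholder.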
Persistence & Privilege
Skill is not always-enabled, is user-invocable, and is instruction-only (no code persisted or installed by the skill). It doesn't request persistent system presence or modify other skills or system-wide settings.
Assessment
This skill is an offline how-to for using PyMuPDF and (optionally) Tesseract OCR, and it appears internally consistent. Before using: (1) install Python packages in a virtualenv to avoid system-wide changes; (2) install Tesseract from your OS package manager if you need OCR; (3) review example code if you plan to copy/paste (the examples open files you provide and include illustrative hardcoded passwords; do not ship secrets in code); (4) treat PDFs you process as potentially hostile content (if files come from untrusted sources, process them in a sandbox or on an isolated host). If you need confirmation of any hidden behavior, request a version that includes runnable code for review (this skill is instruction-only, so nothing executes automatically).


latest vk972yzcs7rv5j8cyga2ecn6vzh81edy7


Runtime requirements

📄 Clawdis
OS: Linux · macOS · Windows
Bins: python3
