PDF Text Extractor
v1.0.0
Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.
⭐ 19 · 10.3k · 123 current · 129 all-time
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Suspicious
high confidence
Purpose & Capability
The manifest and SKILL.md claim 'zero dependencies' and OCR via Tesseract.js, but package.json declares pdfjs-dist as a dependency and package-lock.json lists many packages; Tesseract.js is not present. The code dynamically requires 'pdfjs-dist' (it will error if not installed), so the 'zero dependencies' claim is false. OCR is advertised but index.js contains no Tesseract integration or OCR fallback — it always attempts text-layer extraction. These mismatches show the claimed capability does not align with the actual required components.
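The dependency mismatch described above can be checked before the skill is called. The following is a minimal sketch, assuming a CommonJS Node.js environment; the helper name `checkModule` is hypothetical and not part of the skill. It uses `require.resolve` to report whether a module such as `pdfjs-dist` can be located, without loading it.

```javascript
// Hypothetical helper (not part of the skill): report whether a module can be
// resolved from the current directory, without actually loading it.
function checkModule(name) {
  try {
    require.resolve(name);
    return { name, resolvable: true };
  } catch (err) {
    // A module that is not installed yields err.code === 'MODULE_NOT_FOUND'.
    return { name, resolvable: false, code: err.code };
  }
}

// The scan reports that index.js requires 'pdfjs-dist' at runtime.
const status = checkModule('pdfjs-dist');
console.log(
  status.resolvable
    ? 'pdfjs-dist resolves'
    : `pdfjs-dist does not resolve (${status.code})`
);
```

A result of `resolvable: false` for a package that claims zero dependencies would be consistent with the scan's finding, though it only shows what is installed locally.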
Instruction Scope
SKILL.md and the README instruct users to call extractText with ocr:true and show examples that assume synchronous or fully implemented OCR behavior. The runtime index.js does not implement OCR (no Tesseract usage) and also contains API misuse and bugs: extractText calls countWords(fullText), but countWords expects an object {text, options}, which will cause a runtime error; test.js treats extractText's Promise result as synchronous in places. The instructions therefore do not reflect what the code actually does and give no clear guidance for installing the required dependencies.
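The countWords mismatch can be illustrated with a small stand-in. This is a sketch under assumptions: the skill's real countWords body is not shown in the scan, so the implementation and the `minLength` option below are hypothetical. Only the `{text, options}` parameter shape comes from the report.

```javascript
// Hypothetical stand-in for the skill's countWords. The scan says it expects
// a single object of the form { text, options }; the body is an assumption.
function countWords({ text, options = {} }) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  // minLength is an illustrative option, not one documented by the skill.
  const minLength = options.minLength || 1;
  return words.filter((w) => w.length >= minLength).length;
}

const fullText = 'Invoice total due on receipt';

// Matching call: wrap the string in the object shape the function destructures.
console.log(countWords({ text: fullText })); // 5

// Reported misuse: a bare string has no `text` property, so `text` is
// undefined and calling .trim() on it throws a TypeError.
try {
  countWords(fullText);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```

The same care applies to the Promise issue noted for test.js: a function that returns a Promise needs `await` or `.then()` before its result is read.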
Install Mechanism
There is no install spec in the registry, but package.json and package-lock.json are included and declare pdfjs-dist and many nested packages. README suggests running 'npm install pdfjs-dist'. The absence of an install spec combined with packaged dependency manifests is inconsistent with the 'zero dependencies' marketing and means a user may need to run npm install (which pulls many packages and optional native build scripts). That increases friction and risk compared to the claimed zero-dependency design.
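One way to gauge what an install would pull in is to read the manifests before running npm. The sketch below is illustrative, not the registry's own tooling: `summarizeInstall` is a hypothetical helper, the sample data is invented, and it assumes a package-lock.json with lockfileVersion 2 or 3, where npm records install-time scripts with the `hasInstallScript` flag.

```javascript
// Sketch: summarize what `npm install` would bring in, given already-parsed
// package.json and package-lock.json objects (lockfileVersion 2 or 3 assumed).
function summarizeInstall(pkg, lock) {
  const direct = Object.keys({
    ...(pkg.dependencies || {}),
    ...(pkg.optionalDependencies || {}),
  });
  const packages = (lock && lock.packages) || {};
  // The "" key describes the root project itself, so skip it.
  const installed = Object.keys(packages).filter((p) => p !== '');
  // Entries flagged hasInstallScript run scripts at install time.
  const withScripts = installed.filter(
    (p) => packages[p].hasInstallScript === true
  );
  return { direct, installedCount: installed.length, withScripts };
}

// Illustrative data only; these are not the skill's actual manifests.
const pkg = { dependencies: { 'pdfjs-dist': '^3.0.0' } };
const lock = {
  lockfileVersion: 3,
  packages: {
    '': {},
    'node_modules/pdfjs-dist': {},
    'node_modules/canvas': { optional: true, hasInstallScript: true },
  },
};

console.log(summarizeInstall(pkg, lock));
```

In practice the two objects would come from `JSON.parse` on the files in the downloaded package, read inside an isolated environment.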
Credentials
The skill does not request any environment variables, credentials, or config paths. Nothing in the files reads external secrets or unrelated system config.
Persistence & Privilege
Flags show the skill is not always-enabled and uses the default model-invocation behavior. It does not request persistent system-wide privileges or modify other skills' configuration.
What to consider before installing
This package is internally inconsistent: it advertises 'zero dependencies' and OCR support, but its package.json requires pdfjs-dist and the code does not implement Tesseract OCR. Before installing or using it, consider: (1) Do not feed sensitive PDFs to an untrusted or unclear package. (2) Inspect package.json and package-lock.json, and run npm install in an isolated sandbox if you want to test; optional native builds (canvas, etc.) may run build scripts. (3) Ask the author to clarify and fix: (a) add or remove OCR support and include Tesseract if intended, (b) correct the countWords API misuse (extractText currently calls countWords incorrectly), and (c) provide a proper install spec or update the documentation to match the real dependencies. (4) If you need reliable OCR now, use a maintained library with clear dependency docs. If you decide to test this skill, do it in an isolated environment and review the code changes and installed packages first.
Like a lobster shell, security has layers: review code before you run it.
latest vk977qn7nntnf8hd7smbhn2yk2n80gbfe
