PDF Text Extractor

v1.0.0

Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (high confidence)
Purpose & Capability (flagged)
The manifest and SKILL.md claim 'zero dependencies' and OCR via Tesseract.js, but package.json declares pdfjs-dist as a dependency, package-lock.json lists many more packages, and Tesseract.js is absent. Because the code dynamically requires 'pdfjs-dist' and fails with an error if it is not installed, the 'zero dependencies' claim is false. OCR is also advertised, but index.js contains no Tesseract integration or OCR fallback; it always attempts text-layer extraction. These mismatches show that the advertised capabilities do not match the components the code actually requires.
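A minimal sketch, assuming a CommonJS Node.js environment, of how a caller could check for pdfjs-dist up front instead of letting a dynamic require fail partway through extraction. The helpers isModuleAvailable and assertPdfjsInstalled are hypothetical illustrations, not part of the skill; only require.resolve and the MODULE_NOT_FOUND error code are standard Node.js behavior.

```javascript
// Hypothetical helper (not part of the skill): report whether a package can
// be resolved from this script's location, without loading or running it.
function isModuleAvailable(name) {
  try {
    require.resolve(name); // throws MODULE_NOT_FOUND when the package is absent
    return true;
  } catch (err) {
    if (err && err.code === "MODULE_NOT_FOUND") {
      return false;
    }
    throw err; // any other failure is unexpected, so surface it unchanged
  }
}

// Hypothetical guard: fail early with an actionable message instead of a
// bare require() error deep inside text extraction.
function assertPdfjsInstalled() {
  if (!isModuleAvailable("pdfjs-dist")) {
    throw new Error(
      "pdfjs-dist is not installed, so this package is not zero-dependency. " +
        "Install it in an isolated environment with: npm install pdfjs-dist"
    );
  }
}

console.log("pdfjs-dist resolvable:", isModuleAvailable("pdfjs-dist"));
```

require.resolve only locates a package; it does not load it. A package published only as ES modules may still need import() rather than require() once it is installed.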
Instruction Scope (flagged)
SKILL.md and README instruct users to call extractText with ocr:true, and their examples assume synchronous or fully implemented OCR behavior. The runtime index.js does not implement OCR (there is no Tesseract usage) and also contains API misuse and bugs: extractText calls countWords(fullText), but countWords expects an object {text, options}, which will cause a runtime error; and in places test.js treats the Promise returned by extractText as if it were a synchronous result. The instructions therefore do not reflect what the code actually does, and they give no clear guidance on installing the required dependencies.
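To make the two call-site bugs concrete, here is a sketch with hypothetical stand-ins for countWords and extractText. Only the shapes come from the review (countWords takes an object {text, options}; extractText returns a Promise); the function bodies and the minLength option are assumptions for illustration and are not the skill's code.

```javascript
// Hypothetical stand-in: countWords takes an object { text, options },
// mirroring the signature described in the review (body is assumed).
function countWords({ text, options = {} }) {
  if (typeof text !== "string") {
    throw new TypeError("countWords expects an object like { text: string }");
  }
  const minLength = options.minLength ?? 1; // hypothetical option
  return text.split(/\s+/).filter((word) => word.length >= minLength).length;
}

// Hypothetical stand-in: extractText is async, so it always returns a Promise.
async function extractText(fullText) {
  // Correct call: wrap the string in the object countWords expects.
  // The buggy form, countWords(fullText), passes a bare string instead.
  return { text: fullText, wordCount: countWords({ text: fullText }) };
}

async function main() {
  // Wrong: without await, `pending` is a Promise, so .wordCount is undefined.
  const pending = extractText("hello PDF world");
  console.log(pending.wordCount); // undefined

  // Right: await the Promise to get the result object.
  const result = await pending;
  console.log(result.wordCount); // 3
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

Passing fullText directly would hand countWords a bare string, so destructuring yields text === undefined; in this sketch that surfaces as a TypeError, which the async function turns into a rejected Promise.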
Install Mechanism (flagged)
There is no install spec in the registry, but package.json and package-lock.json are included and declare pdfjs-dist and many nested packages. README suggests running 'npm install pdfjs-dist'. The absence of an install spec combined with packaged dependency manifests is inconsistent with the 'zero dependencies' marketing and means a user may need to run npm install (which pulls many packages and optional native build scripts). That increases friction and risk compared to the claimed zero-dependency design.
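One way to check the 'zero dependencies' claim before running anything is to read the shipped manifests. The sketch below assumes npm lockfile version 2 or 3, which records installed packages under a top-level "packages" map and flags packages with install scripts via hasInstallScript. summarizeDependencies is a hypothetical helper; it only parses JSON, so it installs and executes nothing.

```javascript
const fs = require("fs");

// Hypothetical helper: summarize what a package pulls in, given the parsed
// package.json and package-lock.json (lockfile v2/v3 with a "packages" map).
// It only reads JSON; nothing is installed or executed.
function summarizeDependencies(pkg, lock) {
  // Runtime dependencies the package itself declares.
  const direct = Object.keys({
    ...(pkg.dependencies || {}),
    ...(pkg.optionalDependencies || {}),
  });
  // Lockfile keys look like "node_modules/pdfjs-dist"; "" is the root package.
  const locked = Object.entries((lock && lock.packages) || {}).filter(
    ([path]) => path !== ""
  );
  // Packages that would run preinstall/install/postinstall scripts.
  const withInstallScripts = locked
    .filter(([, meta]) => meta && meta.hasInstallScript)
    .map(([path]) => path);
  return {
    direct,
    lockedPackageCount: locked.length,
    withInstallScripts,
    zeroDependencies: direct.length === 0 && locked.length === 0,
  };
}

// Usage: run from inside the unpacked skill directory.
if (fs.existsSync("package.json")) {
  const pkg = JSON.parse(fs.readFileSync("package.json", "utf8"));
  const lock = fs.existsSync("package-lock.json")
    ? JSON.parse(fs.readFileSync("package-lock.json", "utf8"))
    : null;
  console.log(summarizeDependencies(pkg, lock));
}
```

A non-empty withInstallScripts list names packages that would run scripts during npm install, such as the optional native builds mentioned above.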
Credentials
The skill does not request any environment variables, credentials, or config paths. Nothing in the files reads external secrets or unrelated system config.
Persistence & Privilege
Flags show the skill is not always-enabled and uses the default model-invocation behavior. It does not request persistent system-wide privileges or modify other skills' configuration.
What to consider before installing
This package is internally inconsistent: it advertises 'zero dependencies' and OCR support, but its package.json requires pdfjs-dist and it does not implement Tesseract OCR. Before installing or using it, consider:
(1) Do not feed sensitive PDFs to an untrusted or unclear package.
(2) Inspect package.json and package-lock.json, and if you want to test, run npm install in an isolated sandbox; optional native builds (canvas, etc.) may run build scripts.
(3) Ask the author to clarify and fix: (a) either implement OCR, including Tesseract if intended, or remove the OCR claim; (b) correct the countWords API misuse (extractText currently calls countWords incorrectly); and (c) provide a proper install spec or update the documentation to match the real dependencies.
(4) If you need reliable OCR now, use a maintained library with clear dependency documentation.
If you decide to test this skill, do so in an isolated environment and review any code changes and the installed packages first.
