Pdf2word Skills

v1.0.0

Convert scanned PDF documents into Word text documents using a free, local OCR engine or a remote API.

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description match the delivered assets: a Python script that calls a local 'docr' binary and uses python-docx to produce a .docx. The included install.sh downloads the expected OCR binary from a GitHub releases URL — this is consistent with providing a local OCR engine.
Instruction Scope
SKILL.md stays on task (install binary, pip deps, run script). It also documents optional use of remote engines (e.g., Gemini) and instructs creating ~/.ocr/config with a gemini_api_key. That config step is outside the skill directory and is not declared in required env/config fields; it's optional but relevant to user privacy and should be noted.
Install Mechanism
The install script downloads a single binary from a GitHub releases URL and writes it under scripts/docr/. Downloading from GitHub releases is a typical, low-risk mechanism compared with arbitrary IPs or paste sites. The script does not extract archives or run additional installers. However, the binary will be executed, so its provenance should be validated.
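One way to validate the binary's provenance before executing it is to compare its SHA-256 digest with a checksum obtained from a trusted channel. The sketch below is a minimal illustration, not part of the skill: it assumes upstream publishes a checksum for the release (this is not confirmed by the review above), and the function names and the path in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large binaries are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a checksum obtained out of band.

    expected_hex is an assumption: it must come from a source you trust,
    for example a checksum published alongside the release, if one exists.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

A typical use would be to call matches_expected(Path("scripts/docr/docr"), published_checksum) after install.sh finishes and refuse to run the conversion if it returns False.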
Credentials
No required environment variables are declared, and the Python script does not read secrets itself. However, SKILL.md asks users to store API keys in ~/.ocr/config for optional remote engines (Gemini). That is reasonable for optional remote OCR but is not declared in requires.env and should be considered a configuration that affects privacy/security for sensitive docs.
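Because the API key lives in a file outside the skill directory, it can help to check what that file exposes before processing sensitive documents. The following is a rough sketch under stated assumptions: the format of ~/.ocr/config is not described in this review, so the code only looks for the literal string gemini_api_key rather than parsing the file, and the function name audit_ocr_config is hypothetical. The permission check is meaningful on POSIX systems only.

```python
import stat
from pathlib import Path


def audit_ocr_config(path: Path) -> dict:
    """Report whether the config exists, mentions a Gemini key, and is
    readable by group/other users (POSIX permission bits)."""
    if not path.is_file():
        return {
            "exists": False,
            "mentions_gemini_key": False,
            "readable_by_others": False,
        }
    # The file format is unknown here, so only search for the key name.
    text = path.read_text(encoding="utf-8", errors="replace")
    mode = path.stat().st_mode
    return {
        "exists": True,
        "mentions_gemini_key": "gemini_api_key" in text,
        "readable_by_others": bool(mode & (stat.S_IRGRP | stat.S_IROTH)),
    }
```

Calling audit_ocr_config(Path.home() / ".ocr" / "config") gives a quick indication of whether remote OCR is likely configured on the machine and whether the key file is visible to other local users.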
Persistence & Privilege
The skill does not request always:true, does not modify other skills, and only places the downloaded binary under the skill's scripts directory (and optionally asks the user to create ~/.ocr/config). No permanent elevated privilege is requested.
Assessment
This skill appears to do what it claims: it downloads a docr binary, runs it on PDFs, and builds a .docx from the extracted text. Before installing or running it:
1) Inspect the referenced GitHub repo/releases (https://github.com/scottkiss/doc-ocr) and verify that the release and maintainer meet your trust criteria; prefer checking a checksum or signed release if available.
2) Run the install and conversion in a sandbox or VM if you will process sensitive documents, because the downloaded binary is third-party native code and could perform network activity.
3) If you plan to use a remote engine (Gemini), understand that text may leave your machine, and follow your organization's data-sharing policies; SKILL.md suggests storing the API key in ~/.ocr/config (this is optional but not declared elsewhere).
4) On Windows there may be an executable extension mismatch (the install creates docr.exe but the Python script looks for 'docr'); verify behavior on your platform before automating.
5) If you need stronger assurance, ask upstream for the source code or a reproducible build of the binary, or replace the binary with a vetted OCR implementation.
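The possible Windows extension mismatch can be checked with a small lookup that tries both file names. This is a sketch of one way to work around it, not the skill's actual code: the helper names are hypothetical, and the directory is assumed to be scripts/docr/ as described in the install notes above.

```python
import sys
from pathlib import Path
from typing import List, Optional


def candidate_names(platform: str = sys.platform) -> List[str]:
    """Return binary file names to try, most likely first."""
    if platform.startswith("win"):
        # install.sh reportedly creates docr.exe on Windows.
        return ["docr.exe", "docr"]
    return ["docr"]


def find_docr(binary_dir: Path, platform: str = sys.platform) -> Optional[Path]:
    """Return the first existing candidate in binary_dir, or None."""
    for name in candidate_names(platform):
        candidate = binary_dir / name
        if candidate.is_file():
            return candidate
    return None
```

If find_docr(Path("scripts/docr")) returns None, the binary is missing or named differently than expected, which is worth resolving before automating conversions.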

Like a lobster shell, security has layers — review code before you run it.


