Pdf2word Skills
Security checks across malware telemetry and agentic risk
Overview
This PDF-to-Word skill is generally coherent and purpose-aligned, but users should note two things: it downloads and runs an external OCR binary, and it can send documents to Gemini when that engine is configured and selected.
This skill looks purpose-aligned for converting scanned PDFs to Word files. Before installing, be comfortable running a downloaded OCR binary from GitHub, and use the local OCR engine for private documents unless you intentionally want Gemini to process the file.
VirusTotal
VirusTotal findings are pending for this skill version.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Installing the skill requires trusting the external OCR binary downloaded from GitHub.
The installer downloads a platform-specific OCR executable from a GitHub release and marks it executable, but the artifact does not show checksum or signature verification.
DOWNLOAD_URL="https://github.com/scottkiss/doc-ocr/releases/download/$VERSION/$FILENAME" ... curl -L -o "$TARGET_FILE" "$DOWNLOAD_URL" ... chmod +x "$TARGET_FILE"
Only run the installer if you trust the doc-ocr release source; consider verifying the release checksum or building the OCR tool from a trusted source.
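One way to act on the checksum recommendation is sketched below. It assumes you have obtained a trusted SHA-256 value for the release asset out of band; the review does not show that the doc-ocr release publishes one, so `EXPECTED_SHA256` and both function names are hypothetical and not part of the skill's installer.

```shell
# Compute the SHA-256 digest of a file with whichever common tool is present.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Return 0 only if the file exists and its digest equals the expected value.
# Hypothetical helper: the installer in this skill does not include it.
verify_sha256() {
  vs_file="$1"
  vs_expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  [ -f "$vs_file" ] || { echo "missing file: $vs_file" >&2; return 1; }
  [ -n "$vs_expected" ] || { echo "no expected checksum given" >&2; return 1; }
  vs_actual="$(sha256_of "$vs_file")"
  if [ "$vs_actual" = "$vs_expected" ]; then
    return 0
  fi
  echo "checksum mismatch for $vs_file" >&2
  return 1
}

# Possible use after the download step, before marking the file executable
# (EXPECTED_SHA256 is an assumed, user-supplied value):
#   verify_sha256 "$TARGET_FILE" "$EXPECTED_SHA256" && chmod +x "$TARGET_FILE"
```

Gating `chmod +x` on the check means a tampered or truncated download is never made executable.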
If Gemini mode is used, the OCR engine can use the configured API key to access the user’s Gemini account for OCR requests.
The optional Gemini workflow asks the user to store an API key in a local OCR config file, while registry metadata declares no primary credential or required environment variables.
echo "gemini_api_key=your_gemini_key" > ~/.ocr/config
Use a minimally scoped API key if possible, keep the config file private, and avoid enabling Gemini mode unless remote OCR is intended.
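Keeping the config file private can be sketched as follows. The `gemini_api_key=` line format comes from the skill's own instructions; the function name `write_private_config` is hypothetical, and the permission choices (600 for the file, 700 for a newly created directory) are an assumption about what "private" should mean here.

```shell
# Write the Gemini key to a config file that only the current user can read.
# Hypothetical helper, not part of the skill.
write_private_config() {
  wpc_path="$1"
  wpc_key="$2"
  wpc_dir="$(dirname "$wpc_path")"
  (
    umask 077   # inside this subshell, new files get 600 and new dirs 700
    mkdir -p "$wpc_dir" &&
    printf 'gemini_api_key=%s\n' "$wpc_key" > "$wpc_path"
  ) || return 1
  chmod 600 "$wpc_path"   # also tighten a file that already existed
}

# Possible use with the path shown in the skill's instructions:
#   write_private_config "$HOME/.ocr/config" "your_gemini_key"
```

The subshell keeps the restrictive `umask` from leaking into the rest of the session.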
Sensitive PDF contents could be sent to a remote provider if the user chooses Gemini OCR.
The skill discloses an optional remote OCR provider path. When selected, document content may be processed by Gemini rather than only locally.
The underlying `docr` tool also supports other engines like the Google Gemini API ... python scripts/pdf2word.py sample.pdf sample_output.docx -engine gemini
Use the default local RapidOCR engine for confidential documents, or review Gemini’s data handling terms before using remote OCR.
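To make remote OCR an explicit choice, the converter can be wrapped in a small guard, sketched below. The `-engine gemini` form is taken from the command quoted above; the function names and the `ALLOW_REMOTE_OCR` variable are hypothetical, and other spellings of the flag (for example `-engine=gemini`) are not handled.

```shell
# Return 0 if the argument list contains "-engine gemini".
requests_gemini() {
  rg_prev=""
  for rg_arg in "$@"; do
    if [ "$rg_prev" = "-engine" ] && [ "$rg_arg" = "gemini" ]; then
      return 0
    fi
    rg_prev="$rg_arg"
  done
  return 1
}

# Run the converter, refusing the Gemini engine unless ALLOW_REMOTE_OCR=1.
# Hypothetical wrapper, not part of the skill.
pdf2word_guarded() {
  if requests_gemini "$@" && [ "${ALLOW_REMOTE_OCR:-0}" != "1" ]; then
    echo "refusing remote OCR: set ALLOW_REMOTE_OCR=1 to opt in" >&2
    return 2
  fi
  python scripts/pdf2word.py "$@"
}

# Possible use:
#   pdf2word_guarded sample.pdf sample_output.docx
#   ALLOW_REMOTE_OCR=1 pdf2word_guarded sample.pdf sample_output.docx -engine gemini
```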
