Extract PDF Text
Pass. Audited by VirusTotal on May 12, 2026.
Overview
Type: OpenClaw Skill
Name: extract-pdf-text
Version: 1.0.2

The OpenClaw AgentSkills bundle 'extract-pdf-text' is benign. All files (`_meta.json`, `SKILL.md`, `examples.md`, `ocr.md`, `troubleshooting.md`) consistently provide instructions and code examples for local PDF text extraction using PyMuPDF and Tesseract OCR. The `SKILL.md` explicitly states that the skill 'NEVER accesses files without user request, sends data externally, or modifies original PDFs', and the provided code adheres to this. There is no evidence of data exfiltration, malicious execution, persistence mechanisms, obfuscation, or prompt injection attempts against the agent. All operations are local and aligned with the stated purpose.
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Installing the package may modify your Python environment and relies on the integrity of the package source you use.
The skill directs users to install an unpinned third-party Python package. This is central to the PDF extraction purpose, but it changes the local Python environment and depends on package-source trust.
pip install PyMuPDF
Install in a virtual environment from a trusted package index, and pin or review versions if you need a controlled environment.
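One way to act on the pinning advice is to check, after installation, that the environment holds exactly the version you reviewed. The sketch below is an illustration only and is not part of the skill: the helper names `installed_version` and `matches_pin` are hypothetical, and `PINNED_VERSION` is a placeholder to be replaced with the version you have actually vetted. It relies on the standard library's `importlib.metadata`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package: str, pinned: str) -> bool:
    """Return True only when `package` is installed at exactly the `pinned` version."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    # Placeholder value (assumption): replace with the version you reviewed.
    PINNED_VERSION = "0.0.0"
    found = installed_version("PyMuPDF")
    if found is None:
        print("PyMuPDF is not installed in this environment.")
    elif matches_pin("PyMuPDF", PINNED_VERSION):
        print(f"PyMuPDF {found} matches the pinned version.")
    else:
        print(f"PyMuPDF {found} differs from the pinned version {PINNED_VERSION}.")
```

Running this inside the virtual environment, rather than with the system interpreter, reports on the environment the skill will actually use.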
If you run the OCR setup command, it can change system packages on your machine.
Optional OCR setup includes system package installation, which may require administrator privileges. It is directly related to OCR support and is presented as user-run setup, not automatic execution.
sudo apt install tesseract-ocr
Run the OCR installation only if needed, use trusted OS repositories or package managers, and confirm the command before granting administrator privileges.
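Before running the privileged install, it may be worth checking whether Tesseract is already available, since the system package change is then unnecessary. The following is a minimal sketch under that assumption; the helper names `find_tesseract` and `tesseract_version` are hypothetical and do not come from the skill. It uses only the standard library and the `tesseract --version` command.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the `tesseract` executable on PATH, or None."""
    return shutil.which("tesseract")


def tesseract_version(path: str) -> str:
    """Return the first line printed by `tesseract --version`, or '' if nothing is printed."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract releases print the version banner to stderr instead of stdout.
    output = result.stdout or result.stderr
    lines = output.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found on PATH; OCR setup would be needed for OCR support.")
    else:
        print(f"Tesseract already installed at {path}: {tesseract_version(path)}")
```

If the check finds an existing installation, the `sudo apt install` step can be skipped entirely.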
