Pdf Extractor Skill
Pass. Audited by VirusTotal on May 11, 2026.
Overview
Type: OpenClaw Skill
Name: pdf-extractor-skill
Version: 1.0.0

The skill bundle contains a hardcoded API key (991ee1db-32ff-4884-b45a-155fa632ecbb) and hardcoded absolute Windows file paths (e.g., C:\Users\cr\...) within scripts/pdf2md_marker.py and SKILL.md. It also includes functionality to send document content to an external third-party LLM endpoint (ark.cn-beijing.volces.com) for processing. While these behaviors are aligned with the stated purpose of high-quality PDF OCR, the hardcoded credential and the fixed third-party routing are high-risk patterns that warrant caution.
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Anyone using or redistributing the skill may unknowingly use a shared provider credential, potentially exposing document processing activity to that account or causing account/billing abuse.
The script embeds a default provider API key that is used when `--ark-code-latest` is selected, despite the registry declaring no primary credential or required environment variable.
DEFAULT_ARK_OPENAI_API_KEY = os.environ.get("VOLCENGINE_CODING_PLAN_API_KEY") or os.environ.get(
    "ARK_API_KEY"
) or "991ee1db-32ff-4884-b45a-155fa632ecbb"
Remove the hardcoded fallback key and require users to provide their own API key through an explicitly declared environment variable or configuration setting.
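One way to follow the recommendation to drop the fallback key is sketched below. It reads the same two environment variables quoted in the excerpt and fails loudly when neither is set. The helper name `resolve_ark_api_key` and the `environ` parameter are assumptions made for illustration; they do not appear in the skill.

```python
import os

# Variable names come from the quoted excerpt; they are checked in the same order.
API_KEY_ENV_VARS = ("VOLCENGINE_CODING_PLAN_API_KEY", "ARK_API_KEY")


def resolve_ark_api_key(environ=None):
    """Return the first non-empty key from the environment, or raise.

    Hypothetical helper: unlike the excerpt, there is no hardcoded fallback.
    """
    env = os.environ if environ is None else environ
    for name in API_KEY_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(
        "No API key configured; set one of: " + ", ".join(API_KEY_ENV_VARS)
    )
```

Raising instead of silently substituting a default means a missing credential surfaces at startup, and no shared key ships with the bundle.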
If LLM enhancement is used, PDF page images or extracted content may leave the local machine and be processed by the configured external provider.
In LLM mode, page images are encoded and sent through an OpenAI-compatible client to the configured provider endpoint.
b64 = base64.b64encode(image_bytes.getvalue()).decode("utf-8")
...
"image_url": {"url": f"data:image/{img_fmt.lower()};base64,{b64}"}
...
return openai.OpenAI(api_key=self.openai_api_key, base_url=self.openai_base_url)
Use local-only Marker/Nougat mode for sensitive PDFs, and require explicit user confirmation before enabling LLM enhancement or sending document pages to a provider.
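The explicit-confirmation step recommended here could look roughly like the following gate, called before any page image is encoded or uploaded. The function `confirm_llm_upload`, its parameters, and the prompt wording are all hypothetical; the skill's actual command-line interface is not shown in this report.

```python
def confirm_llm_upload(provider_url, page_count, assume_yes=False, ask=input):
    """Return True only if the user explicitly agrees to send pages off-machine.

    Hypothetical helper. `assume_yes` models an explicit opt-in flag for
    non-interactive runs; `ask` is injectable so the prompt can be tested.
    """
    if assume_yes:
        return True
    prompt = (
        f"LLM enhancement will send {page_count} page image(s) to {provider_url}. "
        "Continue? [y/N] "
    )
    answer = ask(prompt).strip().lower()
    # Anything other than an explicit yes keeps processing local.
    return answer in ("y", "yes")
```

Defaulting to "no" means an empty answer or an unexpected reply leaves the document on the local machine.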
The installed packages and downloaded models may vary over time, affecting reproducibility and supply-chain reviewability.
The setup instructions use unpinned package installs and rely on automatic model downloads, which is expected for this OCR workflow but leaves dependency provenance and versions to the user environment.
pip install marker-pdf
# Nougat (optional, for English-only papers)
pip install nougat-ocr
# Transformers
pip install transformers
...
Models are downloaded automatically on first use
Install in a dedicated environment, pin dependency versions where possible, and verify package/model sources before processing sensitive documents.
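As a small aid for pinning, the sketch below records whichever versions of the three packages named in the setup instructions are currently installed, in `name==version` form suitable for a requirements file. The helper `pin_lines` and its `lookup` parameter are assumptions for illustration. No specific versions are suggested here because the report does not state any.

```python
from importlib import metadata

# Package names taken from the setup instructions quoted above.
PACKAGES = ("marker-pdf", "nougat-ocr", "transformers")


def pin_lines(packages=PACKAGES, lookup=metadata.version):
    """Return (pins, missing): 'name==version' lines and names not installed.

    Hypothetical helper; `lookup` defaults to importlib.metadata.version.
    """
    pins, missing = [], []
    for name in packages:
        try:
            pins.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return pins, missing
```

Running `pin_lines()` inside the dedicated environment after a reviewed install and saving the result gives later installs a fixed target. It does not cover the automatically downloaded models, which would need separate verification.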
