PDF Extractor Skill

Pending: static analysis audit pending.

Overview

No static analysis result has been recorded yet. Pattern checks will appear here once the artifact has been analyzed.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Anyone using or redistributing the skill may unknowingly rely on a shared provider credential. This exposes their document-processing activity to the owner of that account and opens the account to quota or billing abuse.

Why it was flagged

The script embeds a default provider API key that is used when `--ark-code-latest` is selected, despite the registry declaring no primary credential or required environment variable.

Skill content

DEFAULT_ARK_OPENAI_API_KEY = os.environ.get("VOLCENGINE_CODING_PLAN_API_KEY") or os.environ.get(
    "ARK_API_KEY"
) or "991ee1db-32ff-4884-b45a-155fa632ecbb"

Recommendation

Remove the hardcoded fallback key and require users to provide their own API key through an explicitly declared environment variable or configuration setting.
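
One way to follow this recommendation is sketched below. It assumes the skill should keep the two environment variable names from the flagged snippet. `resolve_api_key` and `API_KEY_ENV_VARS` are hypothetical names introduced for illustration, not identifiers from the skill.

```python
import os

# Environment variable names taken from the flagged snippet.
# The tuple name and the function below are hypothetical.
API_KEY_ENV_VARS = ("VOLCENGINE_CODING_PLAN_API_KEY", "ARK_API_KEY")


def resolve_api_key(environ=None):
    """Return the first non-empty key found, or raise if none is set.

    There is deliberately no hardcoded fallback: a missing key is an error.
    """
    env = os.environ if environ is None else environ
    for name in API_KEY_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(
        "No provider API key found. Set one of: " + ", ".join(API_KEY_ENV_VARS)
    )
```

Failing loudly when no key is set means users learn at startup that they must supply their own credential, and it is never silently replaced by a shared one.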

What this means

If LLM enhancement is used, PDF page images or extracted content may leave the local machine and be processed by the configured external provider.

Why it was flagged

In LLM mode, page images are encoded and sent through an OpenAI-compatible client to the configured provider endpoint.

Skill content

b64 = base64.b64encode(image_bytes.getvalue()).decode("utf-8")
...
"image_url": {"url": f"data:image/{img_fmt.lower()};base64,{b64}"}
...
return openai.OpenAI(api_key=self.openai_api_key, base_url=self.openai_base_url)

Recommendation

Use local-only Marker/Nougat mode for sensitive PDFs, and require explicit user confirmation before enabling LLM enhancement or sending document pages to a provider.
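
An explicit confirmation gate could look roughly like the sketch below. It is not taken from the skill: `confirm_llm_upload`, `select_mode`, and the `"llm"`/`"local"` mode labels are hypothetical, and the prompt wording is only an example.

```python
def confirm_llm_upload(base_url, ask=input):
    """Ask the user to approve sending page images to `base_url`.

    `ask` defaults to the built-in input() and can be swapped out in tests.
    """
    answer = ask(
        f"Page images will be sent to {base_url}. Type 'yes' to continue: "
    )
    return answer.strip().lower() == "yes"


def select_mode(use_llm, base_url, ask=input):
    """Return 'llm' only after explicit consent; otherwise stay local."""
    if use_llm and confirm_llm_upload(base_url, ask=ask):
        return "llm"
    return "local"
```

Defaulting to the local mode on any answer other than "yes" keeps an accidental keypress or a non-interactive run from uploading document pages.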

What this means

The installed packages and downloaded models may vary over time, affecting reproducibility and supply-chain reviewability.

Why it was flagged

The setup instructions use unpinned package installs and rely on automatic model downloads, which is expected for this OCR workflow but leaves dependency provenance and versions to the user environment.

Skill content

pip install marker-pdf

# Nougat (optional, for English-only papers)
pip install nougat-ocr

# Transformers
pip install transformers
...
Models are downloaded automatically on first use

Recommendation

Install in a dedicated environment, pin dependency versions where possible, and verify package/model sources before processing sensitive documents.
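
Once versions have been pinned, a small check like the following sketch can confirm that the environment still matches them before sensitive documents are processed. `find_pin_mismatches` is a hypothetical helper. The pinned versions themselves are left to the user, since the skill does not state any.

```python
from importlib import metadata


def find_pin_mismatches(pins, get_version=metadata.version):
    """Compare installed package versions against expected pins.

    `pins` maps a distribution name to the exact version string expected.
    Returns {package: (expected, installed_or_None)} for every mismatch;
    None means the package is not installed at all.
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches
```

Calling it with a mapping such as `{"marker-pdf": "<your pinned version>"}` returns an empty dict when every package matches. This check covers Python packages only; automatically downloaded models would need a separate verification step such as a checksum.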