PaddleOCR Document Parsing

Pass

Audited by ClawScan on May 10, 2026.

Overview

The skill appears purpose-aligned for PaddleOCR document parsing, but it uploads user-selected documents to a configured external API and stores raw parsed results on disk by default.

Before installing, confirm that the PaddleOCR API URL is trustworthy, protect the access token, and only process documents you are willing to send to that service. If the documents are sensitive, consider using `--stdout` or a secure output path and delete any temporary result files afterward.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Note

ASI02: Tool Misuse and Exploitation

What this means

A local PDF or image provided to the skill can leave the machine for external OCR processing.

Why it was flagged

For local-file parsing, the script reads the selected file, encodes it, and sends it to the configured document parsing API.

Skill content

return base64.b64encode(path.read_bytes()).decode("utf-8") ... resp = client.post(api_url, json=params, headers=headers)

Recommendation

Use this only with documents you intend to send to PaddleOCR or a trusted configured endpoint, and avoid submitting regulated or confidential files unless that is acceptable.
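One way to act on this recommendation is to check the configured endpoint against an allowlist before any file is uploaded. The sketch below is illustrative and not part of the skill: the `TRUSTED_HOSTS` set, the `ocr.example.com` host, and the `is_trusted_endpoint` helper are all hypothetical names chosen for the example.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the host(s) you actually trust.
TRUSTED_HOSTS = frozenset({"ocr.example.com"})


def is_trusted_endpoint(api_url: str, trusted_hosts=TRUSTED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(api_url)
    # Require TLS and an exact host match; look-alike hosts and suffix tricks fail.
    return parsed.scheme == "https" and parsed.hostname in trusted_hosts
```

A wrapper could call this check on the configured API URL and refuse to run the parsing script when it returns `False`.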

Note

ASI03: Identity and Privilege Abuse

What this means

Anyone with access to the environment variable could use the PaddleOCR account/API quota associated with the token.

Why it was flagged

The skill uses a user-provided PaddleOCR access token to authenticate API requests, which is expected for this integration.

Skill content

token = _get_env("PADDLEOCR_ACCESS_TOKEN") ... "Authorization": f"token {token}"

Recommendation

Store the token securely, verify the API URL before use, and rotate or revoke the token if it may have been exposed.
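As a rough illustration of careful token handling, the sketch below reads the token from the environment, fails early when it is missing, and masks it for any log output. Only the `PADDLEOCR_ACCESS_TOKEN` variable name comes from the skill; the `load_token` and `mask` helpers are hypothetical.

```python
import os


def load_token(env=os.environ) -> str:
    """Read the access token from the environment, failing loudly if absent."""
    token = env.get("PADDLEOCR_ACCESS_TOKEN", "").strip()
    if not token:
        raise RuntimeError("PADDLEOCR_ACCESS_TOKEN is not set")
    return token


def mask(token: str, visible: int = 4) -> str:
    """Return a log-safe form of the token showing only the last few characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

Logging `mask(token)` instead of the raw value keeps the secret out of terminal history and log files while still letting you tell tokens apart.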

Note

ASI04: Agentic Supply Chain Vulnerabilities

What this means

Running the commands may fetch Python packages such as httpx, Pillow, or pypdfium2 from package indexes, depending on the script used.

Why it was flagged

The documented workflow relies on uv resolving script dependencies at runtime rather than on a locked install specification.

Skill content

No separate install step is needed — uv resolves dependencies automatically

Recommendation

Run in a trusted environment and consider pinning or pre-approving dependencies if your workflow requires strict supply-chain control.
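If strict supply-chain control matters, a simple pre-check can flag dependency specifiers that are not pinned to an exact version. The sketch below assumes you have already extracted the requirement strings from the scripts yourself. How the skill declares them is not shown in this review, and the `unpinned` helper is hypothetical.

```python
import re

# Matches "name==version" with optional extras and environment marker.
# Wildcard pins such as "==1.*" are deliberately treated as unpinned.
_PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;,*]+\s*(;.*)?$"
)


def unpinned(requirements):
    """Return the requirement strings that lack an exact '==' version pin."""
    return [r for r in requirements if not _PINNED.match(r.strip())]
```

An empty list means every listed dependency carries an exact pin. It does not verify hashes or transitive dependencies, so a lock file remains the stronger control.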

Note

ASI06: Memory and Context Poisoning

What this means

Sensitive extracted document contents may remain on disk after the task completes.

Why it was flagged

The skill persists the full parsed result, including raw JSON and extracted document text, to a local temp path by default.

Skill content

If `--output` is omitted, the script saves automatically under the system temp directory ... `result_<timestamp>_<id>.json`

Recommendation

Use `--stdout` when persistence is not needed, choose a secure `--output` location when saving is needed, and delete temporary result files after use.
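The cleanup step can be sketched as a small helper that deletes leftover result files. This is a minimal example, not the skill's own behavior: `purge_results` is a hypothetical function, and the caller must supply the directory, because the exact location under the system temp directory is not given in this review. Only the `result_*.json` naming pattern comes from the skill's documentation.

```python
import time
from pathlib import Path


def purge_results(directory, max_age_seconds=0.0):
    """Delete result_*.json files in `directory` older than max_age_seconds.

    Returns the sorted names of the files that were removed.
    """
    cutoff = time.time() - max_age_seconds
    removed = []
    # Non-recursive on purpose: only files directly inside `directory` are touched.
    for path in Path(directory).glob("result_*.json"):
        if path.is_file() and path.stat().st_mtime <= cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Note that `unlink` only removes the directory entry. On media where recovery of deleted data is a concern, use an encrypted volume or a secure `--output` location instead.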