PaddleOCR Document Parsing
Pass. Audited by ClawScan on May 10, 2026.
Overview
The skill appears purpose-aligned for PaddleOCR document parsing, but it uploads user-selected documents to a configured external API and stores raw parsed results on disk by default.
Before installing, confirm that the PaddleOCR API URL is trustworthy, protect the access token, and only process documents you are willing to send to that service. If the documents are sensitive, consider using `--stdout` or a secure output path and delete any temporary result files afterward.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A local PDF or image provided to the skill can leave the machine for external OCR processing.
For local-file parsing, the script reads the selected file, encodes it, and sends it to the configured document parsing API.
`return base64.b64encode(path.read_bytes()).decode("utf-8")` ... `resp = client.post(api_url, json=params, headers=headers)`
Use this only with documents you intend to send to PaddleOCR or a trusted configured endpoint, and avoid submitting regulated or confidential files unless that is acceptable.
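One way to act on this recommendation is to check the configured endpoint before any file bytes are read or encoded. The following is a minimal sketch, not part of the skill: the function name `is_trusted_endpoint`, the allowlist, and the host `ocr.example.com` are all hypothetical placeholders.

```python
from urllib.parse import urlsplit


def is_trusted_endpoint(api_url: str, trusted_hosts: set[str]) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist.

    Hypothetical helper for illustration; the skill itself does not
    ship this check.
    """
    parts = urlsplit(api_url)
    # urlsplit lowercases the hostname; it is None when no host is present.
    host = parts.hostname or ""
    return parts.scheme == "https" and host in trusted_hosts


# Placeholder allowlist; replace with the endpoint you actually trust.
TRUSTED_HOSTS = {"ocr.example.com"}
```

A caller could refuse to upload when `is_trusted_endpoint(api_url, TRUSTED_HOSTS)` returns False, so a mistyped or tampered URL fails before the document leaves the machine.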
Anyone with access to the environment variable could use the PaddleOCR account/API quota associated with the token.
The skill uses a user-provided PaddleOCR access token to authenticate API requests, which is expected for this integration.
`token = _get_env("PADDLEOCR_ACCESS_TOKEN")` ... `"Authorization": f"token {token}"`
Store the token securely, verify the API URL before use, and rotate or revoke the token if it may have been exposed.
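A small sketch of careful token handling follows, assuming the token is read from `PADDLEOCR_ACCESS_TOKEN` as the excerpt above shows. The helpers `load_token` and `redact` are hypothetical and are not the skill's own `_get_env`. They illustrate failing early on a missing token and keeping the full value out of logs.

```python
import os
from collections.abc import Mapping


def load_token(env: Mapping[str, str] = os.environ) -> str:
    """Read the access token from the environment and fail early if absent."""
    token = env.get("PADDLEOCR_ACCESS_TOKEN", "").strip()
    if not token:
        raise RuntimeError("PADDLEOCR_ACCESS_TOKEN is not set")
    return token


def redact(secret: str, keep: int = 4) -> str:
    """Mask all but the first `keep` characters so logs never hold the full token."""
    if len(secret) <= keep:
        return "*" * len(secret)
    return secret[:keep] + "*" * (len(secret) - keep)
```

Logging `redact(token)` instead of `token` keeps enough of a prefix to tell tokens apart without exposing a usable credential.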
Running the commands may fetch Python packages such as httpx, Pillow, or pypdfium2 from package indexes, depending on the script used.
The documented workflow depends on uv resolving script dependencies at runtime rather than a locked install specification.
No separate install step is needed — uv resolves dependencies automatically
Run in a trusted environment and consider pinning or pre-approving dependencies if your workflow requires strict supply-chain control.
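For workflows that need strict supply-chain control, a quick check can flag dependency specifiers that lack an exact version pin. This is a sketch under assumptions: the function `unpinned` is hypothetical, and the version numbers in the usage note are placeholders rather than versions the skill declares.

```python
def unpinned(requirements: list[str]) -> list[str]:
    """Return requirement strings that lack an exact `==` version pin.

    Specifiers such as `>=` or `~=` are treated as unpinned because they
    let the resolver pick a different release on each run.
    """
    return [req for req in requirements if "==" not in req]
```

For example, `unpinned(["httpx==0.27.0", "Pillow", "pypdfium2>=4"])` returns `["Pillow", "pypdfium2>=4"]`, the two entries a reviewer would want pinned before approving the script.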
Sensitive extracted document contents may remain on disk after the task completes.
The skill persists the full parsed result, including raw JSON and extracted document text, to a local temp path by default.
If `--output` is omitted, the script saves automatically under the system temp directory ... `result_<timestamp>_<id>.json`
Use `--stdout` when persistence is not needed, choose a secure `--output` location when saving is needed, and delete temporary result files after use.
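Cleanup of leftover result files can be scripted. The sketch below assumes only the `result_<timestamp>_<id>.json` naming shown above. The function `purge_results` is hypothetical, and the caller must supply the directory, because the exact location under the system temp directory is not given in the excerpt.

```python
from pathlib import Path


def purge_results(directory: Path) -> list[Path]:
    """Delete files named result_*.json in `directory`; return what was removed.

    Only regular files directly inside `directory` are touched;
    subdirectories and other file names are left alone.
    """
    removed = []
    for path in sorted(directory.glob("result_*.json")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Returning the removed paths lets the caller confirm or log exactly which parsed outputs were deleted.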
