PaddleOCR Text Recognition

Pass. Audited by ClawScan on May 1, 2026.

Overview

This OCR skill appears coherent and purpose-built, but users should know it sends OCR inputs to a configured remote API and saves raw OCR results locally by default.

This skill is reasonable to install if you trust the configured PaddleOCR API endpoint and are comfortable sending selected images or PDFs there for OCR. Keep the access token private, do not submit highly sensitive documents unless that endpoint is an acceptable destination for them, and use --stdout or delete the temp result file if you do not want OCR output left on disk.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Running the skill may cause uv to fetch the httpx dependency needed for API requests.

Why it was flagged

The script declares an inline dependency that uv resolves when the script is run. This is normal for the documented workflow, but it means users rely on the configured package index and on whichever version satisfying httpx>=0.24.0 is resolved at run time.

Skill content

# dependencies = [
#   "httpx>=0.24.0",
# ]

Recommendation

Use uv from a trusted installation and run the skill in an environment where package installation from the configured Python package index is acceptable.

What this means

The skill can make OCR API calls using the configured PaddleOCR token.

Why it was flagged

The skill uses the configured PaddleOCR access token to authenticate API requests, which is expected for the OCR service integration.

Skill content

"Authorization": f"token {token}"

Recommendation

Store the token securely, use the least-privileged or service-specific token available, and rotate it if it is exposed.
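
As a rough illustration of keeping the token out of source files and logs, the sketch below reads it from an environment variable and masks it before anything is printed. The variable name `PADDLEOCR_ACCESS_TOKEN` and both helper functions are hypothetical; only the `Authorization` header format comes from the excerpt above.

```python
import os
from collections.abc import Mapping

# Hypothetical variable name; use whatever the skill's configuration expects.
TOKEN_ENV_VAR = "PADDLEOCR_ACCESS_TOKEN"


def build_auth_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the request header from the environment, failing early if unset."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"{TOKEN_ENV_VAR} is not set")
    # Header format as shown in the skill excerpt.
    return {"Authorization": f"token {token}"}


def redact(headers: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of the headers that is safe to log."""
    return {
        key: "token ***" if key.lower() == "authorization" else value
        for key, value in headers.items()
    }
```

Passing the environment in as a parameter keeps the helper easy to test without touching the real process environment.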

What this means

Images or PDFs submitted for OCR may leave the local machine and be processed by the configured remote OCR service.

Why it was flagged

For local files, the script base64-encodes the user-provided file and posts it to the configured OCR API endpoint. This is core to the OCR purpose and is disclosed by the skill's internet/API requirements.

Skill content

params = {"file": _load_file_as_base64(fp)}
...
resp = client.post(api_url, json=params, headers=headers)

Recommendation

Only OCR files you are comfortable sending to the configured PaddleOCR endpoint, especially if they contain private, financial, medical, or credential information.
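
To make concrete what "base64-encodes the file and posts it" implies, here is a minimal sketch of the payload-building step only. `load_file_as_base64` and `build_payload` are hypothetical stand-ins for the skill's own helpers, whose bodies are not shown in the excerpt; nothing is sent over the network.

```python
import base64
import tempfile
from pathlib import Path


def load_file_as_base64(path: Path) -> str:
    """Return the file's full contents as base64 text (stand-in helper)."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def build_payload(path: Path) -> dict[str, str]:
    """Build a JSON body; the "file" key mirrors the excerpt above."""
    return {"file": load_file_as_base64(path)}


# Demo on a throwaway file: the payload is built but never posted anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"account 1234-5678")
    payload = build_payload(sample)
    # Base64 is an encoding, not encryption: decoding recovers every byte.
    recovered = base64.b64decode(payload["file"])
```

Because the encoding is fully reversible, whoever receives the request body holds a complete copy of the original file.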

What this means

Extracted text and raw OCR results may remain on disk after the task completes.

Why it was flagged

The skill intentionally persists raw OCR output by default. This is disclosed and useful for downstream parsing, but the saved JSON may contain sensitive recognized text and provider response data.

Skill content

Default behavior: save raw JSON to a temp file ... <system-temp>/paddleocr/text-recognition/results/result_<timestamp>_<id>.json

Recommendation

Use --stdout for sensitive one-off OCR jobs or delete the saved temp JSON file when it is no longer needed.