Skill v1.1.1

ClawScan security

PaddleOCR · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Benign · Apr 15, 2026, 10:03 AM
Verdict
benign
Confidence
high
Model
gpt-5-mini
Summary
The skill's code and instructions match its stated purpose (legal-document OCR → Markdown + archive); main issues are a registry metadata mismatch about required env vars and the privacy risk that documents are uploaded to a configured external PaddleOCR endpoint.
Guidance
This skill appears to do what it says: it converts PDFs/images to Markdown and keeps a local archive, calling a PaddleOCR layout-parsing API that you must configure. Before using it with sensitive documents:
1) Fix the registry/config mismatch: SKILL.md and lib.py require PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN (create paddle-ocr/config/.env as described).
2) Verify the API URL points to a service you trust (self-hosted or a trusted provider). The tool will upload full documents (Base64) to that endpoint with the token in an Authorization header.
3) Run smoke_test.py first (try --skip-api-test to check config, then with a non-sensitive sample to confirm endpoint behavior).
4) If privacy is critical, prefer a local/self-hosted PaddleOCR layout-parsing endpoint (or localhost) so data does not leave your environment.
5) Be aware 'uv' will install Python dependencies at runtime (via PyPI); inspect dependencies if your environment has strict policies.
Finally, review the repo/homepage and .env.example to confirm the configured endpoint and token handling meet your security/compliance needs.
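As a rough aid for the endpoint checks in the guidance above, the sketch below classifies a configured URL as local or external. It is not part of the skill: the function name `is_local_endpoint` and the sample URLs are hypothetical, and the check only looks at the host string without resolving DNS.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(api_url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback/private IP literal."""
    host = urlparse(api_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name cannot be classified without resolving it, so treat it as external.
        return False
    return addr.is_loopback or addr.is_private


# Hypothetical example URLs, not endpoints taken from the skill.
print(is_local_endpoint("http://localhost:8080/layout-parsing"))    # True
print(is_local_endpoint("https://ocr.example.com/layout-parsing"))  # False
```

A result of False does not mean the endpoint is untrustworthy, only that documents would leave the local machine or private network and the provider should be vetted first.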

Review Dimensions

Purpose & Capability
note: The skill is advertised as a PaddleOCR-based legal-PDF-to-Markdown converter, and the included scripts (convert.py, layout_caller.py, lib.py, etc.) implement that. The environment variables and request/response handling in lib.py align with the stated purpose (PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN). However, the registry metadata claims 'Required env vars: none' while SKILL.md and lib.py clearly require the two PaddleOCR config variables. This inconsistency can mislead users during installation and configuration.
Instruction Scope
note: Runtime instructions are scoped to: reading a local file or remote URL, calling the configured PaddleOCR layout-parsing API (sending file Base64 or file URL), extracting Markdown and images, and writing an archive under the skill's archive/ folder. This is coherent with the purpose. Important privacy/security implication: the scripts will upload entire documents (Base64 payloads) to whatever API URL you configure, so sensitive legal or medical documents will be transmitted to that endpoint. The scripts do not attempt to read unrelated system files or other credentials.
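To make the privacy implication concrete, here is a minimal sketch of how a whole file can end up in a request body as Base64 alongside an Authorization header. It sends nothing over the network. The function name `build_request`, the `token` header scheme, and the `file` payload key are assumptions for illustration; the skill's lib.py may use different names.

```python
import base64
import os
import tempfile


def build_request(path: str, token: str) -> tuple[dict, dict]:
    """Return (headers, payload) for a hypothetical layout-parsing request."""
    with open(path, "rb") as fh:
        encoded = base64.b64encode(fh.read()).decode("ascii")
    # Assumed header scheme and payload key; check lib.py for the real ones.
    headers = {"Authorization": f"token {token}", "Content-Type": "application/json"}
    payload = {"file": encoded}
    return headers, payload


# Demo with a throwaway 300-byte file: the entire content lands in the payload.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(b"x" * 300)
    sample_path = tmp.name

headers, payload = build_request(sample_path, "dummy-token")
os.remove(sample_path)
print(len(payload["file"]))  # 400: Base64 encodes every 3 bytes as 4 characters
```

Because Base64 is an encoding and not encryption, anyone operating the endpoint can decode the payload back to the original document.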
Install Mechanism
ok: There is no registry install spec (instruction-only), and the repo provides Python scripts. Execution uses 'uv run --script' headers to declare dependencies; the SKILL.md asks the user to install 'uv' if needed. No arbitrary network downloads or packaged installers are defined by the registry. Note: running the scripts will install Python packages (via uv/PyPI) at runtime, which is standard but worth being aware of.
Credentials
concern: The skill legitimately requires two environment/config values: PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN (declared in SKILL.md and enforced by lib.py). These are proportional to the functionality. The concern is the mismatch with the registry metadata, which lists no required env vars. That omission could lead users to run the skill without realizing they must configure an API endpoint and token (and thus inadvertently send data to an unintended endpoint). The skill does not request unrelated credentials or broad environment access.
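A pre-flight check along the following lines can surface the missing configuration before any document is processed. This is a sketch and not the enforcement code in lib.py; the helper name `missing_config` is hypothetical, and only the two variable names come from the review.

```python
import os

REQUIRED_VARS = ("PADDLEOCR_DOC_PARSING_API_URL", "PADDLEOCR_ACCESS_TOKEN")


def missing_config(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
example_env = {"PADDLEOCR_DOC_PARSING_API_URL": "http://localhost:8080/layout-parsing"}
print(missing_config(example_env))  # ['PADDLEOCR_ACCESS_TOKEN']
```

Calling `missing_config()` with no argument checks the current process environment, so values kept only in a .env file would need to be loaded first.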
Persistence & Privilege
ok: The skill does not request 'always: true' or other elevated platform privileges. It writes files only to an archive directory under its own skill root and to user-specified output paths. It does not modify other skills or global agent configuration.