Skill v2.0.16
ClawScan security
PaddleOCR Document Parsing · ClawHub's context-aware review of the artifact, metadata, and declared behavior.
Scanner verdict
Benign · Apr 3, 2026, 4:26 AM
- Verdict
- benign
- Confidence
- high
- Model
- gpt-5-mini
- Summary
- The skill's files, instructions, and required credentials are coherent with a PaddleOCR document-parsing wrapper: it needs an API URL and access token, uses the uv runner and standard Python libraries, and its behavior matches the description.
- Guidance
- This skill appears to do what it claims, but consider these precautions before installing or running it: (1) You must provide PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN. The skill uploads files to that endpoint (local file inputs are sent base64-encoded), so only use tokens and endpoints you trust. (2) By default, results are auto-saved under the system temp directory and contain the full extracted text, which may include sensitive data; use --stdout or a controlled --output path to avoid leftover temp files. (3) The runner 'uv' fetches dependencies (httpx, Pillow, pypdfium2) at runtime, so standard PyPI supply-chain risks apply. (4) There is a minor metadata mismatch: the optional env var PADDLEOCR_DOC_PARSING_TIMEOUT is used but not listed in the declared env metadata; set it if you need a different network timeout. (5) If you will parse private or local files, choose deliberately between file-path (uploads the content) and file-url (the service fetches the URL) according to your privacy requirements.
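Points (1), (4), and (5) of the guidance can be illustrated with a minimal Python sketch. It is not the skill's own code: the names `ParserConfig`, `load_config`, and `describe_input`, the 60-second default timeout, and the dictionary keys `kind` and `file` are all assumptions made for illustration. Only the three environment variable names come from the review.

```python
import base64
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

# Assumed default; the review does not state the skill's real default timeout.
DEFAULT_TIMEOUT_SECONDS = 60.0

REQUIRED_VARS = ("PADDLEOCR_DOC_PARSING_API_URL", "PADDLEOCR_ACCESS_TOKEN")


@dataclass(frozen=True)
class ParserConfig:
    api_url: str
    access_token: str
    timeout: float


def load_config(env: Mapping[str, str] = os.environ) -> ParserConfig:
    """Read the two required variables and the optional timeout override."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required env vars: " + ", ".join(missing))
    raw_timeout = env.get("PADDLEOCR_DOC_PARSING_TIMEOUT", "")
    timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT_SECONDS
    return ParserConfig(
        api_url=env["PADDLEOCR_DOC_PARSING_API_URL"],
        access_token=env["PADDLEOCR_ACCESS_TOKEN"],
        timeout=timeout,
    )


def describe_input(source: str) -> dict:
    """Hypothetical payload fragment, not the service's actual request schema.

    A URL is passed through unchanged (the service would fetch it);
    a local path is read and its bytes are base64-encoded (the content is uploaded).
    """
    if source.startswith(("http://", "https://")):
        return {"kind": "file-url", "file": source}
    data = Path(source).read_bytes()
    return {"kind": "file-path", "file": base64.b64encode(data).decode("ascii")}
```

Failing early on a missing variable keeps a document from being sent with an empty token. Separating the two input kinds makes explicit which route moves the file content off the machine.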
Review Dimensions
- Purpose & Capability
- ok: Name/description, scripts, and declared env vars (PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN) align. The code posts documents to a PaddleOCR layout-parsing endpoint and returns structured JSON/Markdown. The required binary 'uv' is used to run the scripts and is appropriate for this packaging model.
- Instruction Scope
- note: Runtime instructions focus on invoking the included CLI scripts and reference only the declared env vars. The skill saves full raw JSON results to a temp directory by default and instructs the agent to read and return the complete output. This is expected for a parser, but it means sensitive document contents will be written to disk and may be returned to users. The scripts do not attempt to read unrelated system files or credentials.
- Install Mechanism
- note: There is no explicit install spec; the skill expects 'uv' to run the scripts, and uv automatically resolves the dependencies (httpx, Pillow, pypdfium2) from package sources. This is consistent with the skill's design but implies standard supply-chain risk, since dependencies are fetched at run time from registries.
- Credentials
- note: Only the API URL and access token are required, and the primaryEnv is appropriately the access token. One optional env var (PADDLEOCR_DOC_PARSING_TIMEOUT) is referenced in code and documentation but not listed in the required env metadata; this is a minor mismatch, not a sign of malicious behavior. No unrelated credentials or broad system secrets are requested.
- Persistence & Privilege
- ok: The skill does not request permanent inclusion (always=false) and does not modify other skills or system-wide settings. It writes result JSON into the skill-scoped temp path by default; this is expected but worth noting for privacy reasons.
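The privacy note about results written to disk can be illustrated with a small Python sketch that keeps parsed output at a caller-chosen path with owner-only permissions and removes it once it has been consumed. The helpers `save_result_private` and `remove_result` are hypothetical and not part of the skill. The sketch assumes a POSIX system; on Windows most of the mode bits are ignored.

```python
import json
import os
from pathlib import Path


def save_result_private(result: dict, output_path: str) -> Path:
    """Write parsed output to a chosen path instead of a shared temp directory."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode 0o600 applies when the file is newly created: owner read/write only.
    # An existing file is truncated but keeps its current permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(result, handle, ensure_ascii=False, indent=2)
    return path


def remove_result(path: Path) -> None:
    """Delete the saved output once it is no longer needed."""
    path.unlink(missing_ok=True)
```

A caller might write the result, read what it needs, and then call `remove_result` so that extracted text does not linger on disk.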
