Skill v1.0.0

ClawScan security

Shwuyeyanjiu · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Benign · Apr 7, 2026, 5:16 PM

Verdict: Benign
Confidence: high
Model: gpt-5-mini
Summary: The skill's code and runtime instructions are consistent with its stated purpose (scraping Shanghai property tender notices, OCRing PDFs, and calculating revenue), with no unexpected credential requests or hidden endpoints. However, it writes files to disk and can download many PDFs, so check the prerequisites and limit the scope of runs before using it.
Guidance: This skill appears coherent with its description: it scrapes the listed Shanghai government site, downloads PDFs, and OCRs them locally. Before installing or running:
1) Review and (if needed) change hardcoded file paths (scripts write to /tmp and a hardcoded /Users/… path) to avoid unexpected writes.
2) Ensure you have poppler and tesseract installed locally (pdf2image/pytesseract require native dependencies).
3) Be prepared for bulk downloads and CPU-heavy OCR when running batch scripts; limit page ranges or test with a single project first.
4) Confirm scraping the target site complies with its terms of use.
5) Inspect the included scripts yourself (they are plain Python) if you have low trust, and run them in an isolated environment (container/VM) if you want to limit side effects.
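The prerequisite check in item 2 could be sketched as follows. This is a minimal illustration, not part of the skill: the helper name `missing_prerequisites` and the `REQUIRED_TOOLS` mapping are hypothetical, and it assumes that finding `pdftoppm` (a poppler utility that pdf2image calls) and `tesseract` on PATH is a good enough signal that the native dependencies are installed.

```python
import shutil

# Hypothetical mapping (not from the skill): native dependency -> executable
# we expect to find on PATH.
REQUIRED_TOOLS = {
    "poppler": "pdftoppm",     # pdf2image shells out to poppler utilities such as pdftoppm
    "tesseract": "tesseract",  # pytesseract shells out to the tesseract binary
}


def missing_prerequisites(which=shutil.which):
    """Return the names of required native tools that are not found on PATH.

    `which` is injectable so the check can be exercised without the tools installed.
    """
    return [name for name, exe in REQUIRED_TOOLS.items() if which(exe) is None]
```

Calling `missing_prerequisites()` before a batch run and stopping if the returned list is non-empty avoids discovering a missing binary halfway through a long download-and-OCR job.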

Review Dimensions

Purpose & Capability: ok · Name/description match what the code and SKILL.md do: both scrape the listed Shanghai government site (962121.fgj.sh.gov.cn), download PDFs, run OCR and extract numeric fields to compute 'saturation' revenue. Declared Python dependencies (requests, beautifulsoup4, pdf2image, pytesseract, python-dateutil) are appropriate for that purpose; no unrelated credentials or binaries are requested.
Instruction Scope: note · SKILL.md and the scripts instruct the agent to crawl the target site, download PDF files, run OCR (pdf2image + pytesseract), and parse results. This stays within the stated scope. Items worth noting: (1) the workflow can download and OCR many PDFs, which means bulk network traffic and heavy CPU and I/O load; (2) scripts use /tmp for downloads and at least one script writes to a hardcoded user path (/Users/yujunwang/.openclaw/workspace/...), which is unexpected and should be adjusted before running; (3) SKILL.md expects the system dependency poppler (not installed by the skill), and OCR results require manual verification for accuracy. No instructions request unrelated local files or environment secrets.
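One way to address notes (1) and (2) before running the scripts is sketched below. It is not taken from the skill: the environment variable `SKILL_OUTPUT_DIR`, the function names, and the default cap of five PDFs are all hypothetical choices for illustration.

```python
import os
import tempfile
from itertools import islice
from pathlib import Path


def resolve_output_dir(env=None):
    """Choose the output directory from configuration instead of a hardcoded home path.

    SKILL_OUTPUT_DIR is a hypothetical variable name. If it is unset, a fresh
    temporary directory is created and returned.
    """
    env = os.environ if env is None else env
    configured = env.get("SKILL_OUTPUT_DIR")
    if configured:
        path = Path(configured).expanduser()
    else:
        path = Path(tempfile.mkdtemp(prefix="tender_pdfs_"))
    path.mkdir(parents=True, exist_ok=True)
    return path


def cap_downloads(pdf_urls, max_pdfs=5):
    """Return at most max_pdfs URLs so a trial run cannot turn into a bulk download."""
    return list(islice(pdf_urls, max_pdfs))
```

With helpers like these, a first run might pass `cap_downloads(urls, max_pdfs=1)` to test a single project and write into a directory that is chosen explicitly, rather than into another user's home path.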
Install Mechanism: ok · There is no install spec; the skill is instruction- and script-based. The package.json lists reasonable Python package dependencies; no remote arbitrary downloads or URL-based installers are invoked by the skill itself. The only higher-risk external dependency is the system package 'poppler' (required by pdf2image), which SKILL.md documents and which is a normal native dependency for converting PDF pages to images ahead of OCR.
Credentials: ok · The skill requires no environment variables, secrets, or credentials. It performs network requests to the stated public government site only. The only surprising behavior touching the local environment is file writes to /tmp and to an unexpected hardcoded path under a specific user's home directory; these leave downloaded PDFs and CSV outputs on disk but do not involve unrelated credentials.
Persistence & Privilege: note · always:false is set and no special privileges are requested. The skill writes files (PDFs, CSVs) to disk (mostly /tmp; some scripts use a hardcoded home path). That is normal for this workload, but it means downloaded files and OCR outputs remain on the host after execution. The skill does not modify other skills or system-wide configs.
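If leftover files are a concern, one option is to run the workload inside a scratch directory that is removed afterwards. The sketch below is an assumption about how a caller might wrap the scripts, not something the skill provides: `run_in_scratch_dir` and the `job` callback are hypothetical names.

```python
import tempfile
from pathlib import Path


def run_in_scratch_dir(job):
    """Run job(workdir) inside a temporary directory and delete it afterwards.

    Whatever job returns (for example parsed rows) is passed back to the caller;
    any PDFs or CSVs written under workdir are removed when the block exits.
    """
    with tempfile.TemporaryDirectory(prefix="tender_run_") as tmp:
        return job(Path(tmp))
```

Anything worth keeping, such as a final CSV, would need to be returned by `job` or copied elsewhere before the function exits, since the scratch directory does not survive the call.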