Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Regulation Extractor

v3.0.0

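Extracts clauses from construction-engineering code PDFs in structured form and syncs them to a Feishu (飞书) Bitable. Supports deduplication of dual PDF text layers (original text + OCR), RapidOCR recognition for image-only PDFs, splitting by clause number (including spaced numbers such as 6. 1. 2. 3), circled-digit conversion (e.g. 6.4.④→6.4.4), OCR error detection, quality flags, and text cleaning (removing line breaks, page headers, and symbol tables; separating run-together Chinese and English text; splitting overlong entries). Output...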
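The two clause-number fixes named in the description (spaced numbers and circled digits) can be illustrated with a minimal sketch. This is not the skill's own code: the function name `normalize_clause_number` and the circled-digit range ①..⑳ are assumptions chosen for illustration.

```python
import re

# Assumption: circled digits ①..⑳ (U+2460..U+2473) map to 1..20.
CIRCLED = {chr(0x2460 + i): str(i + 1) for i in range(20)}


def normalize_clause_number(raw: str) -> str:
    """Normalize a clause number such as '6. 1. 2. 3' or '6.4.④' (hypothetical helper)."""
    # Replace any circled digit with its plain-digit equivalent.
    text = "".join(CIRCLED.get(ch, ch) for ch in raw)
    # Remove whitespace around the dots, e.g. spaces left by OCR.
    return re.sub(r"\s*\.\s*", ".", text.strip())


print(normalize_clause_number("6. 1. 2. 3"))  # 6.1.2.3
print(normalize_clause_number("6.4.④"))       # 6.4.4
```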

⭐ 0 · 60 · 0 current · 0 all-time

by @youfeijun123

MIT-0

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.

Security Scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

✓

Purpose & Capability

Name/description align with the included scripts: extract_regulation.py, ocr_batch.py, deep_clean.py, clean_json.py, quality_check.py, and sync_to_bitable.py implement PDF text extraction, offline RapidOCR, cleaning, quality checks, and Feishu (飞书) sync. External network use is limited to Feishu API for its intended purpose.

Instruction Scope

SKILL.md instructs the user to run each script with user-specified paths, but not every script honors that: quality_check.py ignores CLI input and uses a hard-coded Windows path (output_dir = Path(r"D:\有斐家\小一\常用规范处理成果")), so if executed it could read unrelated JSON files on the host. The other scripts read PDFs and write JSON, as expected. The sync script performs network writes to Feishu only when given credentials/IDs.
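One way to remove the hard-coded path is to take the directory from the command line. The sketch below is illustrative only: `parse_args`, `list_json_files`, and the `input_dir` argument are hypothetical names, and the actual checks that quality_check.py performs are not reproduced here.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse a user-supplied directory instead of relying on a fixed path."""
    parser = argparse.ArgumentParser(
        description="Quality-check extracted JSON files (illustrative sketch)."
    )
    parser.add_argument(
        "input_dir", type=Path, help="Directory holding the JSON files to check"
    )
    return parser.parse_args(argv)


def list_json_files(input_dir: Path):
    """Return the JSON files directly inside input_dir (non-recursive)."""
    root = input_dir.resolve()
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    return sorted(root.glob("*.json"))


# Example: parse an explicit argument list rather than sys.argv.
args = parse_args(["./regulation_output"])
print(args.input_dir)
```

Keeping the glob non-recursive limits the script to the one directory the user named, which matches the single-directory invocation that SKILL.md describes.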

ℹ

Install Mechanism

No automated install spec included (instruction-only), but SKILL.md lists pip deps (PyMuPDF, rapidocr-onnxruntime). That requires installing Python packages manually in the runtime; this is moderate risk but typical. There is no download from untrusted URLs or archive extraction in the skill bundle itself.

Credentials

Feishu credentials (app_id, app_secret, app_token, table_id) are required only for the sync_to_bitable step and are proportional to the stated purpose. However, the skill metadata does not declare any required credentials or env vars; credentials are passed as CLI args. The hard-coded path in quality_check.py can access local files unexpectedly, which is disproportionate to the stated single-file quality-check invocation.
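Because CLI arguments can end up in shell history and process listings, a caller may prefer to read the four values from the environment and pass them on. The following is a minimal sketch under that assumption; the `FEISHU_*` variable names and `load_feishu_credentials` are hypothetical and are not declared anywhere by the skill.

```python
import os

# Hypothetical environment variable names; the skill itself declares none.
REQUIRED_VARS = (
    "FEISHU_APP_ID",
    "FEISHU_APP_SECRET",
    "FEISHU_APP_TOKEN",
    "FEISHU_TABLE_ID",
)


def load_feishu_credentials(env=None):
    """Return the four credentials as a dict, failing loudly if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Strip the prefix so keys match the names used in the report (app_id, ...).
    return {name[len("FEISHU_"):].lower(): env[name] for name in REQUIRED_VARS}
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.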

✓

Persistence & Privilege

The skill does not request persistent installation privileges (always=false), does not modify other skills or system-wide configs, and will not autonomously exfiltrate data except when the user runs the sync script with Feishu credentials. No evidence of attempts to persist credentials or enable background network activity.

Scan Findings in Context

[HARD_CODED_PATH_QUALITY_CHECK] unexpected: quality_check.py defines a hard-coded output_dir (D:\... ) and does not accept an input path; this contradicts SKILL.md's usage and risks reading unrelated host files.

[EXTERNAL_NETWORK_CALL_FEISHU] expected: sync_to_bitable.py calls Feishu open-apis to obtain a tenant token and to create/list/delete records; this is expected for a feature that writes to Feishu tables and requires app credentials.
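To see what the token step involves before running the sync script, the request can be built and inspected without being sent. This sketch assumes the tenant-token endpoint that Feishu documents for self-built apps; the scan does not state which URL sync_to_bitable.py actually calls, so verify it against the script and the current Feishu documentation. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Feishu's documented tenant-token endpoint for self-built apps.
TOKEN_URL = "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal"


def build_token_request(app_id: str, app_secret: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks for a tenant token."""
    body = json.dumps({"app_id": app_id, "app_secret": app_secret}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def describe_request(req: urllib.request.Request) -> str:
    """Summarize the request for review, with the secret masked."""
    payload = json.loads(req.data)
    payload["app_secret"] = "***"
    return f"{req.get_method()} {req.full_url} {json.dumps(payload)}"


# Placeholder values only; nothing is sent over the network here.
print(describe_request(build_token_request("cli_example", "not-a-real-secret")))
```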

[LOCAL_OCR_ONNX] expected: The scripts use rapidocr_onnxruntime (RapidOCR) for offline OCR processing, which is consistent with the description claiming RapidOCR (ONNX) and does not imply remote OCR.

What to consider before installing

Before installing or running this skill:

1) Inspect and edit quality_check.py to remove or change the hard-coded Path to a directory you control (it currently points to a developer's D: directory). Do not run that script until you confirm the path or provide a safe working directory.

2) When using sync_to_bitable.py, prefer the --dry_run option first to preview changes; create a least-privileged Feishu app/service account for writes and rotate credentials after use.

3) Install and run dependencies (PyMuPDF, rapidocr-onnxruntime) in an isolated environment (virtualenv or container) to avoid system-wide changes.

4) Review any JSON outputs before syncing (they contain extracted regulatory text).

5) If you need to run these scripts on a multi-tenant or sensitive host, run them in an isolated VM/container and verify network access rules so that only the intended Feishu endpoint is reachable.

6) Overall risk: functionality appears legitimate, but the hard-coded path and undocumented credential handling are concrete issues to fix before trusting the package.

Like a lobster shell, security has layers — review code before you run it.

latest vk9771qnzdfney8hnjn0fcfan9583sgnv

Terms: https://spdx.org/licenses/MIT-0.html
