Install
openclaw skills install @songhonglei/skill-deep-audit
Generic skill-quality auditor for any agent skill (Claude, OpenClaw, Cursor, etc.). Runs a 7-dimension static analysis (D1 process closure & idempotency, D2 tool/command conventions, D3 portability & defense, D4 skill usability, D5 security & op risk, D6 code & doc quality, D7 dependency & footprint) with explicit ERR / WARN severity, 115-point scoring (pass line 90 + zero ERR), and an opt-in `--fix` workflow that always backs up first. Two depths: L1 static (~2 min) and L2 dryRun (~5 min, read-only hub + reachability checks). Strict red lines: read-only by default, never executes the audited skill's writes. Use when the user asks to "audit a skill", "check skill quality", "is this skill ready to ship", "lint my skill", or runs this tool by name. Triggers also: "审计这个 Skill", "检查 Skill 质量", "Skill 能上线吗", "skill-deep-audit", "审一下 xxx skill".
A read-only, multi-dimensional quality auditor for agent skills. Runs static analysis + optional dryRun reachability checks and produces a scorecard.
build-better-skills suite (creation → audit → regression → sediment)
Can it run? → D3 Portability + D4 Usability conventions
Does it run correctly? → D1 Process closure + D6 Code & doc quality
Is it safe to run? → D5 Security & op risk
Is it well-conformed? → D2 Tool & command conventions
Is the whole healthy? → D7 Dependency & footprint
The audit is read-only by default (`--fix` is the one exception and requires explicit user authorization).
| Dependency | Purpose | Behavior if missing |
|---|---|---|
| A skill-hub query tool (e.g. clawhub) | Hub publish-status check (D7-W1) and dependency existence check (D7-W2 step 3) | Skip Hub checks, related items downgrade to WARN, do not abort the audit |
The core of this skill is pure static / read-only analysis. There are no hard external dependencies — L1 static audit works even if no tooling is installed.
At the start of the audit, present the following options and wait for the user to choose explicitly:
Please choose check depth:
L1 Static analysis (~2 min)
File read, structural check, keyword scan, syntax check.
Max 112 (skips items that need to touch external systems). Pass line ≥ 90.
Good for: quick first-draft check.
L2 dryRun (~5 min, recommended) ⭐
L1 + Hub existence check + dependency existence check + branch reachability
simulation (file existence / env config / read-only verification of
unhit branches).
Max 115. Pass line ≥ 90.
Good for: pre-release / pre-ship full acceptance.
Default recommendation: L2 dryRun. Reply 1 / L1 for static, 2 / L2 for dryRun
(empty Enter = L2).
⚠️ Note: L2 dryRun does ONLY read-only queries and reachability checks. It
performs no writes / updates, and it does NOT actually run the audited
skill's business workflow.
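The reply handling described in the prompt above can be sketched as a small shell function. This is a minimal illustration, not the skill's actual implementation: `parse_depth` is a hypothetical helper name, and treating an unrecognized reply as "ask again" is an assumption based on the "wait for the user to choose explicitly" requirement.

```bash
# parse_depth REPLY -> prints "L1" or "L2"; returns 1 for an unrecognized reply.
# Hypothetical helper: the name and the exit-code convention are assumptions.
parse_depth() {
  # Drop spaces/tabs and lowercase, so " L1 " is handled like "l1".
  reply=$(printf '%s' "$1" | tr -d ' \t' | tr '[:upper:]' '[:lower:]')
  case "$reply" in
    1|l1)    echo "L1" ;;
    2|l2|"") echo "L2" ;;   # empty Enter falls back to the recommended depth
    *)       return 1 ;;    # unknown reply: the caller should ask again
  esac
}
```

A caller could loop on `parse_depth "$answer"` until it succeeds, which keeps the depth choice explicit instead of silently guessing.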
The user may provide:
- A skill name (e.g. my-skill) → look for a same-named folder under <skills-dir>/. <skills-dir> is the agent's skills directory and may be e.g. ~/.claude/skills/, ~/.openclaw/workspace/skills/, or any path the user specifies. Some agents use a different layout; adjust to what's actually on disk.
- A path (e.g. skills/my-skill/) → use directly.
ls {skill-path}/
head -n 20 {skill-path}/SKILL.md
If the directory cannot be found → tell the user and stop. Do not guess.
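The lookup rule (use a path directly, otherwise look for a same-named folder, otherwise stop) might look like the sketch below. `resolve_skill_path` and its two-argument form are hypothetical; the skills directory is passed in explicitly because its location differs between agents.

```bash
# resolve_skill_path INPUT SKILLS_DIR -> prints the skill directory, or returns 1.
# Hypothetical helper; SKILLS_DIR stands in for <skills-dir>.
resolve_skill_path() {
  input=$1
  skills_dir=${2%/}
  if [ -d "$input" ]; then
    # The user gave a path that exists: use it directly.
    printf '%s\n' "${input%/}"
  elif [ -d "$skills_dir/$input" ]; then
    # The user gave a bare name: use the same-named folder.
    printf '%s\n' "$skills_dir/$input"
  else
    # No guessing: report the miss and let the caller stop.
    echo "Skill not found: $input (looked in $skills_dir)" >&2
    return 1
  fi
}
```

For example, `resolve_skill_path my-skill ~/.claude/skills` would print the folder only if it exists on disk.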
Execute each check defined in references/check-rules.md in order.
⚖️ Determinism guarantee: each rule's hit/miss decision uses the grep pattern, keyword list, and numeric thresholds defined for it in check-rules.md — not the agent's subjective judgement. Edge cases not explicitly covered by a rule are handled by the "False-Positive General Rules" section (marked "manual verification needed", not hard-judged). This guarantees stable, repeatable results across different agents / re-runs.
# Script extensions must cover mixed-language skills — .js/.cjs/.mjs/.ts cannot be missed
# Path-based excludes match whole directory names only
find {skill-path} -type f \( \
  -name "*.py" -o -name "*.sh" -o -name "*.md" -o -name "*.json" -o -name "*.yaml" \
  -o -name "*.js" -o -name "*.cjs" -o -name "*.mjs" -o -name "*.ts" \) \
  ! -path "*/__pycache__/*" ! -path "*/node_modules/*" ! -path "*/.git/*"
⚠️ Extension coverage blind-spot:
find -name "*.js" does not match .cjs/.mjs/.ts. Python scripts often subprocess-call a sibling xxx.cjs; if the file list misses .cjs, the auditor will wrongly report "called script does not exist" (false positive on D6-E4 / D6-E6). All later extension-scoped scans must include the full set.
Execute in order: D1 → D2 → D3 → D4 → D5 → D6.
ℹ️ D7 is not in this step: D7 (dependencies & footprint) needs the code stats (see 2.4) plus Hub / existence checks, so it is consolidated into Step 4. Step 2 only scans D1–D6.
Execution-level convention:
- L1 → runs at all depths.
- L2 dryRun → runs only at L2 dryRun; for L1, mark as ➖ skipped (L2 dryRun item).
- {skill-path}/AUDIT-*.md: audit reports are produced by this tool itself and are not part of the audited skill's package.
For each rule:
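# Python (PYTHONPYCACHEPREFIX, Python 3.8+, keeps .pyc output out of the audited skill)
find {skill-path}/scripts -name "*.py" -print0 2>/dev/null | while IFS= read -r -d '' f; do
  PYTHONPYCACHEPREFIX="${TMPDIR:-/tmp}/skill-audit-pyc" python3 -m py_compile "$f" 2>&1 && echo "OK: $f" || echo "SYNTAX ERR: $f"
done
# Shell (-print0 / read -d '' keeps file names with spaces intact)
find {skill-path}/scripts -name "*.sh" -print0 2>/dev/null | while IFS= read -r -d '' f; do
  bash -n "$f" 2>&1 && echo "OK: $f" || echo "SYNTAX ERR: $f"
done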
# Number of script files (covers mixed skills: .js/.cjs/.mjs/.ts)
find {skill-path}/scripts -type f \( -name "*.py" -o -name "*.sh" -o -name "*.js" -o -name "*.cjs" -o -name "*.mjs" -o -name "*.ts" \) 2>/dev/null | grep -v node_modules | wc -l
# Total line count (-r prevents hang on no-match)
find {skill-path} \( -name "*.py" -o -name "*.sh" -o -name "*.js" -o -name "*.cjs" -o -name "*.mjs" -o -name "*.ts" \) | grep -v node_modules | xargs -r wc -l 2>/dev/null | tail -1
# Skill-on-skill dependency: precise extraction (see D7-W2 "three-step join" algorithm)
# ① List all suspicious import candidates (just module names; ownership is resolved later)
grep -rnE "^\s*(from [a-zA-Z_][a-zA-Z0-9_]* import|import [a-zA-Z_][a-zA-Z0-9_]*)" {skill-path}/scripts/ 2>/dev/null
# ① supplementary: look for sys.path injection / skill_root concatenation
# (this is the physical evidence of which skill an import belongs to)
grep -rnE "sys\.path\.insert.*skills/|_skill_root|skills/[a-z-]+/scripts" {skill-path}/scripts/ 2>/dev/null
# ② subprocess calls into other skills' scripts (by path)
grep -rnE "skills/[a-z-]+/scripts|_skill_root.*scripts" {skill-path} 2>/dev/null | grep -v __pycache__
# ③ Explicit declaration in SKILL.md
grep -nE "metadata.*requires|depends on .* skill|requires the .* skill|use .* skill" {skill-path}/SKILL.md 2>/dev/null
# → Agent then deduplicates, applies the three-step join to fix ownership, annotates purpose,
# runs the existence check (D7-W2), and writes the result into report section
# "VI. Skill Dependencies".
# → Stdlib and well-known PyPI packages (os/sys/json/re/requests/openpyxl …) are excluded
# from ownership judgement.
Pre-check: this step requires a skill-hub query tool (e.g.
clawhub). If unavailable → skip Hub checks; mark D7-W1 as "cannot verify (no hub tooling)", downgrade to WARN, do not abort.
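The pre-check amounts to asking whether the hub tool is on PATH before any Hub item runs. A minimal sketch, assuming a hypothetical helper `hub_check_mode` and two made-up mode labels (`run`, `skip-warn`); only `command -v` is standard here:

```bash
# hub_check_mode TOOL -> prints "run" when TOOL is on PATH, otherwise "skip-warn".
# Hypothetical helper; the mode labels are illustrative, not part of the skill.
hub_check_mode() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "run"
  else
    # No hub tooling: Hub checks are skipped and reported as WARN, never as an abort.
    echo "skip-warn"
  fi
}
```

Calling `hub_check_mode clawhub` once at the start of this step lets the rest of the audit branch on a single value instead of probing for the tool repeatedly.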
name field from frontmatter.
Total 115 points
| Dimension | Max |
|---|---|
| D1 Process closure & idempotency | 13 |
| D2 Tool & command conventions | 10 |
| D3 Portability & defense | 15 |
| D4 Skill usability conventions | 21 |
| D5 Security & op risk | 21 |
| D6 Code & doc quality | 31 |
| D7 Dependency & footprint health | 4 |
| Total | 115 |
📊 Scoring convention: ERR is uniformly 3 points (a hit means FAIL; the point value carries no real meaning). WARN uses three priority tiers (high 3 / mid 2 / low 1) — the difference is meant to guide fix order.
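As a worked illustration of the point values, the sketch below subtracts one deduction per finding from the depth's maximum. It assumes the score is "max minus summed deductions", which the text implies but does not spell out; the severity labels (`ERR`, `WARN-high`, `WARN-mid`, `WARN-low`) and both function names are hypothetical.

```bash
# deduction SEVERITY -> points lost for one finding (labels are assumptions).
deduction() {
  case "$1" in
    ERR|WARN-high) echo 3 ;;
    WARN-mid)      echo 2 ;;
    WARN-low)      echo 1 ;;
    *)             echo 0 ;;   # unknown label: no deduction
  esac
}

# total_score MAX SEVERITY... -> MAX minus the summed deductions, floored at 0.
total_score() {
  score=$1
  shift
  for sev in "$@"; do
    score=$((score - $(deduction "$sev")))
  done
  if [ "$score" -lt 0 ]; then score=0; fi
  echo "$score"
}
```

With one ERR, one mid WARN, and one low WARN at L2, `total_score 115 ERR WARN-mid WARN-low` gives 115 - 3 - 2 - 1 = 109; the ERR still forces FAIL regardless of the total.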
Dual-judgement (both conditions must hold for PASS): the total is at or above the pass line, and there are zero ERR findings.
Pass line is uniformly 90 at both depths (skipped items don't count toward the actual max but don't change the pass line):
| Depth | Actual max | Pass line |
|---|---|---|
| L1 static | 112 | ≥ 90 |
| L2 dryRun | 115 | ≥ 90 |
| Condition | Result |
|---|---|
| Total ≥ pass line AND zero ERR | ✅ PASS |
| Any ERR, OR total < pass line | ❌ FAIL |
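The verdict table reduces to a two-condition check. A minimal sketch, with `verdict` as a hypothetical function name and the pass line defaulting to 90 as stated above:

```bash
# verdict TOTAL ERR_COUNT [PASS_LINE] -> prints "PASS" or "FAIL".
# Hypothetical helper; PASS_LINE defaults to the uniform pass line of 90.
verdict() {
  total=$1
  err_count=$2
  pass_line=${3:-90}
  # Both conditions must hold: enough points AND no ERR findings.
  if [ "$total" -ge "$pass_line" ] && [ "$err_count" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}
```

Note that `verdict 109 1` prints FAIL even though 109 is well above the pass line, which is the point of the dual judgement.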
Generate the full report using references/output-template.md.
Write path: {skill-path}/AUDIT-{YYYY-MM-DD}.md
AUDIT-*.md should not be packaged with the skill (D4-E5 will detect this).
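For illustration, the write path and a scan for leftover reports could be expressed as below. Both helpers are hypothetical, and the listing is only an approximation of what a check like D4-E5 might look for; the actual rule lives in references/check-rules.md, which is not shown here.

```bash
# report_path SKILL_PATH -> prints the report path for today's audit.
report_path() {
  printf '%s/AUDIT-%s.md\n' "${1%/}" "$(date +%Y-%m-%d)"
}

# list_audit_reports SKILL_PATH -> prints existing AUDIT-*.md files at the top level.
# Read-only: it only lists files, it never deletes them.
list_audit_reports() {
  find "${1%/}" -maxdepth 1 -type f -name 'AUDIT-*.md' | sort
}
```

Running `list_audit_reports` before packaging gives the author a chance to remove old reports by hand.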
📋 Audit complete: {skill-name}
─────────────────────────────────────
Total score: {score}/{max} {PASS ✅ / FAIL ❌} (L1 max 112 / L2 dryRun max 115)
Pass line: ≥ 90 (uniform across L1 / L2 dryRun) AND zero ERR (dual-judgement)
Depth: {L1 static / L2 dryRun}
🔴 ERR: {n} | 🟡 WARN: {n}
Highest-priority fix: {ID and name of the highest-deduction ERR}
Estimated score after fixing all ERR: {estimated}/{max}
📁 Scorecard: {skill-path}/AUDIT-{date}.md
🔧 Fix: {N} items auto-fixable / {M} items need human confirmation
Reply "fix" to start auto-fix (the skill folder is backed up first).
Fix (`--fix`) behavior spec
⚠️ This is the only step in skill-deep-audit that is allowed to modify the audited skill's files, and only after explicit user authorization. The day-to-day audit (Steps 0–7) strictly observes the "audit-only, never fix" red line.
Triggered when the user replies "--fix", "fix 5.1", etc. after the report is delivered.
| Sub-section | Type | Auto-fix? |
|---|---|---|
| 5.1 Auto-fixable | Pure text / config / docs (add version, add prerequisites, edit wording, add dependency declaration, normalize reference prefixes — no business logic) | ✅ User says "fix" → batch apply |
| 5.2 Needs human confirmation | Business logic / script code (change control flow, change field matching, change HTTP call, change column mapping, remove over-privileged steps) | ⚠️ Must confirm each item with the user; user approves one → fix one |
Mandatory pre-fix backup:
{skill-path}.bak-{YYYYMMDD-HHMMSS}
BACKUP="{skill-path}.bak-$(date +%Y%m%d-%H%M%S)"
cp -r "{skill-path}" "$BACKUP" && echo "✅ Backed up to $BACKUP"
Apply fixes item by item:
✅ Fixed [ID].
Do not auto-re-audit:
🔧 Fixed {n} items. Re-run the audit now to verify? (reply "re-audit" to start)
Fix record: in the report or reply, list "which files / which items were changed + backup path" so the user can roll back.
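The mandatory backup can be wrapped so that no fix proceeds unless the copy succeeded. This is a sketch under assumptions: `backup_skill` is a hypothetical name, and refusing to overwrite an existing backup is a design choice made here, not something the spec states.

```bash
# backup_skill SKILL_PATH -> copies the folder to SKILL_PATH.bak-<timestamp>
# and prints the backup path; returns 1 if the copy could not be made.
backup_skill() {
  src=${1%/}
  if [ ! -d "$src" ]; then
    echo "Not a directory: $src" >&2
    return 1
  fi
  backup="$src.bak-$(date +%Y%m%d-%H%M%S)"
  if [ -e "$backup" ]; then
    # Assumption: never overwrite an earlier backup made in the same second.
    echo "Backup already exists: $backup" >&2
    return 1
  fi
  cp -R "$src" "$backup" || return 1
  printf '%s\n' "$backup"
}
```

A fix run could start with `BACKUP=$(backup_skill "$SKILL") || exit 1` and record `$BACKUP` in the fix record so the user knows where to roll back from.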
This skill is part of the build-better-skills suite — open-source skills that help you build better skills, end-to-end:
| Stage | Skill | Status | What it does |
|---|---|---|---|
| Creation | skill-creator | 🚧 Not yet released | Scaffold a new skill from intent |
| Audit | glic-check | ✅ v1.0.x | Fast, qualitative multi-dimension review (G/L/I/C + U) — run right after any edit |
| Audit | skill-deep-audit | ✅ v1.0.0 | Comprehensive dryRun-level exam — 7 dimensions, 115-pt score, --fix |
| Testing | skill-regression | 🚧 Not yet released | End-to-end regression testing |
| Sediment | skill-sediment | 🚧 Not yet released | Promote successful workflows into new skills |
Two complementary tools share the Audit stage:
- glic-check: lightweight, qualitative. Run it right after a change for a quick multi-dimension sanity review (no score). Best for tight edit loops.
- skill-deep-audit (this skill): heavyweight, quantitative. A full dryRun-level evaluation that grades the skill on a 115-point scale with ERR/WARN findings and a scorecard. Best as a pre-ship "final exam".
Only glic-check and skill-deep-audit ship today. The other entries are roadmap placeholders; they will appear in the suite repo as they are open-sourced.