Skill Audit

Audit and score OpenClaw AgentSkills against structural compliance, quality standards, and OpenClaw-specific architecture patterns. Produces a 0-100 score with A-F grade, dimensional breakdown, and actionable improvement recommendations. Use when asked to audit, score, validate, check, or evaluate a skill or SKILL.md. Keywords: skill audit, skill score, skill check, skill validate, 스킬 검증, 스킬 점수, 스킬 평가, 스킬 감사, 스킬 점검, スキル検証, スキル採点, スキル評価, スキル監査, 技能验证, 技能评分, 技能检查, 技能审计. validate skill, skill quality, 스킬 품질.

haseo-ai@haseo-ai

Install

openclaw skills install @haseo-ai/oc-skill-audit

Skill Audit

Audit and score OpenClaw AgentSkills with a multi-dimensional scoring system.

Quick Start

"audit this skill" or "score path/to/SKILL.md"

Language Policy

Audit reports: Always generate in two languages:
1. User's language — for accessibility (detect from request language or target SKILL.md body language)
2. English — for global shareability and comparability
Both versions should be complete (scores, dimension analysis, improvement recommendations).
Format: write the user-language version first, then the English version, separated by a clear divider.
If the user already requested English, a single English version suffices (no duplication).

Scoring Overview

text

Total (0–100) = Weighted Average

Dimension           Weight  Score Range  Description
─────────────────── ─────── ─────────── ───────────
A. Structure         20%    0–100       Spec compliance, frontmatter, file structure
B. Triggering        15%    0–100       description quality, keywords, triggering
C. Style Guide       20%    0–100       Judgment criteria, preservation/compression rules
D. Workflow          15%    0–100       Phase 0, sub-agent design
E. Sub-Agent Design  15%    0–100       Context injection, templates
F. Conciseness       15%    0–100       SKILL.md length, references separation

Grade: A(90+) | B(75+) | C(60+) | D(45+) | F(<45)

How to Audit

Read target SKILL.md and any references/
Score each dimension using the rubric below
Calculate weighted total and grade
Report score card + improvement recommendations

Dimension A: Structure (20%)

Frontmatter, file structure, agentskills spec compliance.

Check	Points	Criteria
Frontmatter exists	15	`---` opening and closing present
`name` valid	10	lowercase, hyphens only, 1-64 chars
`description` present	15	Non-empty, 1-1024 chars
`description` has keywords	10	Keywords in multiple languages
No auxiliary files	10	No README.md, CHANGELOG.md, etc.
references/ organized	10	If exists, 1-level deep, TOC for >100 lines
SKILL.md under 500 lines	10	Lean body, details in references/
No duplication	5	Info in SKILL.md OR references, not both
Folder name matches name	15	`skill-name/SKILL.md`

Max: 100

Dimension B: Triggering (15%)

Whether the description triggers the agent correctly.

Check	Points	Criteria
"What" described	20	Clear description of what it does
"When" in description	25	Trigger conditions in description (not body)
Keywords (Korean)	15	Korean keywords included
Keywords (English)	15	English keywords included
Not too verbose	10	description ≤ 300 words (metadata cost)
Not too vague	15	Not overly broad (e.g., "helpful skill")

Max: 100

Dimension C: Style Guide (20%)

Whether judgment criteria (preservation/compression) are specified.

Applicability:

Applies: Text processing/transformation/summarization skills (tasks that modify original content)
Partially applies: Code generation/modification skills (may need existing code preservation criteria)
N/A: Structured data management (JSON/DB CRUD), simple tool wrappers, configuration management

For N/A skills, exclude this dimension and redistribute weights among remaining dimensions:

Only C is N/A: A:24%, B:18%, D:18%, E:18%, F:22%
Only E is N/A: A:24%, B:18%, C:24%, D:18%, F:16%
Both C+E are N/A: A:30%, B:20%, D:20%, F:30%

Check	Points	Criteria
Preservation criteria	30	"What to preserve" with specific examples
Compression criteria	25	"What to compress" with specific examples
Task essence defined	20	Core task definition (e.g., "summarization is restructuring, not compression")
Style rules explicit	15	Specific rules for tone, style, length, etc.
Anti-patterns	10	"What not to do" specified (bonus)

Max: 100

Dimension D: Workflow (15%)

Whether Phase 0, execution order, and validation steps exist.

Check	Points	Criteria
Phase 0 (analysis) first	30	Full understanding step before task execution
Clear phase numbering	15	Phase 0, 1, 2... structured
Context injection described	25	Explicit statement that Phase 0 results are injected into sub-agent prompts
Validation step	20	Result quality validation step (length/style/omissions)
Error handling	10	Failure response instructions

Max: 100

Dimension E: Sub-Agent Design (15%)

Sub-agent prompt design quality.

Applicability:

Applies: Skills that use sub-agents (parallel/sequential chunk processing, etc.)
Partially applies: Skills that optionally mention sub-agent delegation (core is single-agent)
N/A: Skills that do not use sub-agents at all

For N/A skills, exclude this dimension and redistribute weights among remaining dimensions:

Only C is N/A: A:24%, B:18%, D:18%, E:18%, F:22%
Only E is N/A: A:24%, B:18%, C:24%, D:18%, F:16%
Both C+E are N/A: A:30%, B:20%, D:20%, F:30%

Check	Points	Criteria
Prompt template exists	20	Template file in references/
`[ ]` placeholders	20	Blank fields filled by Phase 0 explicitly marked
Context fields defined	20	Full context, chunk position, preservation list
Style rules in template	15	3-4 line style instructions included
Merge ≠ concatenate	15	Explicit prohibition of simple concatenation
Sequential pipeline	10	Sequential ordering for multi-stage output

Max: 100

Dimension F: Conciseness (15%)

Context window efficiency.

Check	Points	Criteria
SKILL.md body < 500 lines	40	Length compliance
SKILL.md body < 300 lines	10	Bonus (very concise)
No redundant instructions	20	No duplicate instructions
"Agent is smart" principle	15	Does not redundantly explain what the agent already knows
References linked	15	"When to read" description for each reference file

Max: 100

Output Format

Delivery

Always do both:

File: Save the full audit report (bilingual) to ~/.openclaw/workspace/skill-audit-reports/[skill-name]-audit-[YYYY-MM-DD-HHmm].md. If the same skill is audited again, a new timestamped file is created — never overwrite previous reports. Never save inside the target skill's directory — keep skill folders clean for distribution.
Response: Include a summary card in your reply — total score, grade, top 3 improvement recommendations, and file path. Do NOT paste the full report in the response.

Bilingual Output

Per the Language Policy above, generate two complete versions:

User's language version (first)
English version (second)

Separate them with a clear divider:

text

---
## English Version

If the user requested English, a single English version suffices.

Score Card

markdown

# Skill Audit: [skill-name]

**Version**: skill-audit v1.0
**Date**: YYYY-MM-DD
**Target**: path/to/SKILL.md

## Score: 78/100 (B)

| Dimension | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| A. Structure | 85 | 20% | 17.0 |
| B. Triggering | 90 | 15% | 13.5 |
| C. Style Guide | 60 | 20% | 12.0 |
| D. Workflow | 70 | 15% | 10.5 |
| E. Sub-Agent Design | 80 | 15% | 12.0 |
| F. Conciseness | 85 | 15% | 12.8 |
| **Total** | | | **77.8** |

## Grade: B

---

## Dimension Details

### A. Structure (85/100)
✅ Frontmatter exists (+15)
✅ name valid: "summarize" (+10)
✅ description present (+15)
...
❌ SKILL.md over 500 lines (-10)
→ Tip: Split detailed specs into references/

### B. Triggering (90/100)
...

## Improvement Recommendations

1. **[High]** Add preservation/compression criteria to Style Guide → C score +20 expected
2. **[Medium]** Specify Phase 0 → D score +15 expected
3. **[Low]** Add references/ TOC → A score +5 expected

Improvement Priority Labels

[Critical] — Skill may not function properly
[High] — Major impact on quality
[Medium] — Meaningful score improvement if addressed
[Low] — Minor improvement

Batch Mode

Audit multiple skills at once:

"Audit all skills in the skills/ folder"

→ Generate individual score cards per skill, then produce a comparison table:

text

| Skill | Total | Grade | A | B | C | D | E | F |
|-------|-------|-------|---|---|---|---|---|---|
| summarize | 82 | B+ | 93 | 88 | 88 | 77 | 78 | 65 |
| changelog | 85 | B+ | 90 | 85 | 90 | 85 | N/A | 80 |
| scaffold | 81 | B | 85 | 80 | 75 | 80 | N/A | 85 |

References

references/scoring-rubric.md — Detailed scoring rubric with examples per dimension
examples/ — Completed audit reports and batch comparisons for validated skills