Skill Quality Check

v1.0.4

Quality audit for AI Agent Skills. Use before installing or after writing any SKILL.md. Scores 5 dimensions with actionable improvements. Works for skills wr...

1 · 135 · 0 current · 0 all-time
by Denny@webkong

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for webkong/skill-quality-check.

Prompt preview (Install & Setup):
Install the skill "Skill Quality Check" (webkong/skill-quality-check) from ClawHub.
Skill page: https://clawhub.ai/webkong/skill-quality-check
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install skill-quality-check

ClawHub CLI


npx clawhub@latest install skill-quality-check
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name, description, and included files (SKILL.md, README, examples, references) align with a Skill-audit tool. There are no binaries, env vars, or config paths required that would be unrelated to auditing SKILL.md files.
Instruction Scope
Instructions explicitly tell the agent to read local SKILL.md files (common skill dirs are listed) and optionally fetch a remote SKILL.md via raw.githubusercontent.com using curl if no local copy exists — this is coherent for an audit tool. Note: the curl step implies network fetch of arbitrary public repo content; that is expected for remote audits but is a capability the user should be aware of.
Install Mechanism
No install spec or code files; instruction-only skills have low disk/write risk. Everything is documentation and examples; nothing will be downloaded or executed by default outside of optional user-invoked network fetch guidance.
Credentials
The skill requests no environment variables, no credentials, and no config paths. That is proportional for a documentation/audit skill.
Persistence & Privilege
always:false and no instructions to persist configuration, modify other skills, or change system-wide settings. The skill does not ask for elevated or permanent presence.
Assessment
This appears to be a coherent, documentation-only audit tool. Before using: (1) be aware the SKILL.md suggests fetching remote files with curl — only fetch SKILL.md from repositories you trust; network fetches can retrieve arbitrary content. (2) The audit reads SKILL.md and adjacent reference files in common skill directories (e.g., ~/.openclaw, ~/.claude); run it only if you consent to that file access. (3) Review the bundled references/examples yourself — the auditor's recommendations are only as good as its rulebook. If you plan to run automated CI checks, review and pin any scripts you add separately rather than relying solely on remote fetches.

Like a lobster shell, security has layers — review code before you run it.

Tags: audit, best-practices, claude, codex, cursor, framework, latest, openclaw, skill-quality
135 downloads
1 star
5 versions
Updated 4w ago
v1.0.4
MIT-0

Skill Quality Check 🔍

Universal quality assessment framework for AI Agent Skills. Evaluates any SKILL.md file across 5 dimensions, outputting a quantified score and actionable improvement suggestions. Designed to work with skills built for Claude, Cursor, Codex, OpenClaw, or any AI agent.

When to Use

  • Before installing a new Skill from any source
  • After writing your own Skill (self-check)
  • Comparing quality of similar Skills
  • Evaluating Skills for ClawHub/SkillHub submission
  • As companion to Skill Creator — learn to write, then learn to audit

Audit Protocol

Step 1: Locate and Read the Target Skill

Find the SKILL.md file:

# Path priority (in order):
1. User-specified path
2. <skills-dir>/<skill-name>/SKILL.md

   # Common locations by platform:
   #   OpenClaw:     ~/.openclaw/skills/<skill-name>/SKILL.md
   #   QClaw:        ~/.qclaw/skills/<skill-name>/SKILL.md
   #   Claude Code:  ~/.claude/skills/<skill-name>/SKILL.md
   #   Cursor:       ~/.cursor/skills/<skill-name>/SKILL.md
   #   Codex:        ~/.codex/skills/<skill-name>/SKILL.md
3. <repo>/skills/<skill-name>/SKILL.md
4. <repo>/<skill-name>/SKILL.md

# If installing from GitHub without a local copy, fetch via curl:
curl -s "https://raw.githubusercontent.com/<owner>/<repo>/main/skills/<skill>/SKILL.md"

Then scan the directory for supporting files:

skill-name/
├── SKILL.md       ✅ required
├── scripts/       ✅ optional (lazy-loaded)
├── references/    ✅ optional (lazy-loaded)
└── assets/        ✅ optional (lazy-loaded)
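
The presence check above can be scripted; a minimal sketch (the `inventory` helper name and the directory argument are illustrative):

```shell
# Inventory a skill directory against the layout above.
inventory() {
  if [ -f "$1/SKILL.md" ]; then echo "SKILL.md: present"; else echo "SKILL.md: MISSING"; fi
  for d in scripts references assets; do
    if [ -d "$1/$d" ]; then echo "$d/: present (lazy-loaded)"; fi
  done
}
# Usage: inventory ~/.openclaw/skills/<skill-name>
```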

Step 2: YAML Frontmatter Review

SKILL.md must have YAML frontmatter with only these fields:

---
name: <skill-name>      ✅ required
description: >          ✅ required
# Fields below are NOT recommended in frontmatter:
# ❌ version           → package metadata
# ❌ author            → non-standard
# ❌ license           → non-essential
# ❌ compatibility     → most Skills don't need it
# ❌ tags              → non-standard
---

Review checklist:

  • Do name and description exist?
  • Is description under 150 characters (trigger-level content must be concise)?
  • Does description include trigger keywords ("when to use")?
  • Are there extra fields wasting Level 1 tokens?
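
The extra-fields check can be automated; a minimal sketch, assuming the frontmatter sits between the file's first two `---` lines (the `check_frontmatter` name is illustrative):

```shell
# Flag frontmatter fields the checklist above discourages.
check_frontmatter() {
  if awk '/^---$/ {n++; next} n == 1' "$1" |
     grep -qE '^(version|author|license|compatibility|tags):'
  then echo "❌ non-standard fields found"
  else echo "✅ frontmatter fields OK"
  fi
}
# Usage: check_frontmatter path/to/SKILL.md
```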

Step 3: Description Quality Assessment

Description is Level 1 content — the AI uses it to decide whether to trigger the Skill. It is a trigger, not a manual.

✅ Good Description:

TDD test-driven development workflow. Use when writing new features,
adding tests, or debugging. Keywords: test-driven, TDD, red-green-refactor.

❌ Bad Description:

This is a comprehensive guide to Test-Driven Development using the
red-green-refactor cycle. First, write a failing test that describes
the behavior you want. Then write the minimum code to make it pass...

(Too long — contains Level 2 content that belongs in SKILL.md body)

Scoring rubric (each dimension 0-10):

| # | Dimension | Question |
|---|---|---|
| 1 | Trigger Accuracy | Does it clearly state when to use this Skill? |
| 2 | Conciseness | Under 150 chars? No explanatory filler? |
| 3 | Keyword Coverage | Does it include trigger keywords (e.g. TDD, debug, pdf)? |
| 4 | Non-Redundancy | Does it avoid restating what AI already knows? |
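
The Conciseness and Keyword Coverage dimensions lend themselves to mechanical spot checks; a minimal sketch (the sample description is illustrative):

```shell
# Spot-check Conciseness (<=150 chars) and Keyword Coverage (explicit keyword list).
desc="TDD test-driven development workflow. Use when writing tests. Keywords: TDD, pytest."
[ ${#desc} -le 150 ] && echo "conciseness: pass (${#desc} chars)"
case "$desc" in *Keywords:*) echo "keyword coverage: explicit keyword list present" ;; esac
```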

Step 4: SKILL.md Body Quality Assessment

Five assessment dimensions (0-10 each):

4.1 Progressive Disclosure

Does it follow the three-layer loading principle?

| Layer | Content | When Loaded |
|---|---|---|
| Level 1 | name + description | Always in context |
| Level 2 | SKILL.md body | On skill trigger |
| Level 3 | scripts/ + references/ + assets/ | On execution, never in context |

Review checklist:

  • Trigger conditions → should be in Description (Level 1)
  • Execution steps, tool instructions → SKILL.md body (Level 2)
  • Detailed docs, scripts, templates → references/scripts (Level 3)
  • SKILL.md body under 500 lines?
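
The 500-line check should count only the body, not the frontmatter; a minimal awk sketch (`body_lines` is an illustrative name):

```shell
# Count SKILL.md body lines (everything after the frontmatter's closing '---').
body_lines() {
  awk '/^---$/ {n++; next} n >= 2 {c++} END {print c + 0}' "$1"
}
# Usage: body_lines SKILL.md   # flag results over 500
```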

4.2 Role Setting

Does the Skill open with a clear role or context definition?

✅ Good example:

# PDF Processing Skill

You are a professional document preparation assistant specializing in
PDF creation and editing workflows...

4.3 Examples

Are there sufficient, relevant, and diverse examples?

Claude recommends 3-5 examples that are:

  • Relevant: tied to real use cases
  • Diverse: cover edge cases
  • Structured: wrapped in XML tags

Review checklist:

  • Input/output example pairs present?
  • Core use cases covered?
  • Edge cases shown?

4.4 Instruction Clarity

Are instructions clear, actionable, and unambiguous?

Review checklist:

  • Steps listed with numbered lists?
  • Conditional branches explained?
  • Error/exception handling covered?
  • Output format specified (e.g. JSON structure)?

Step 5: Resource Layer Assessment

Are bundled resources used appropriately?

| Resource | When to Use | Review Question |
|---|---|---|
| scripts/ | Deterministic/repeated code execution | Is there repetitive code that should be a script? |
| references/ | Detailed docs, API specs, domain knowledge | Is there >10k chars of docs not in references/? |
| assets/ | Templates, images, fonts for output | Are there files that should be assets, not inline content? |

Review checklist:

  • Long docs in SKILL.md body that should be in references/?
  • Repeated code snippets that should be scripts?
  • Scripts have correct paths and dependency notes?
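
The >10k-chars heuristic for references/ can be measured directly; a minimal sketch (`body_chars` is an illustrative name):

```shell
# Measure inline body bulk in characters (frontmatter excluded).
body_chars() {
  awk '/^---$/ {n++; next} n >= 2' "$1" | wc -c | tr -d ' \t'
}
# Usage: body_chars SKILL.md   # over ~10000 suggests moving docs to references/
```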

Step 6: Performance Impact Assessment

6.1 Level 1 Token Cost

Formula:

Level 1 cost ≈ len(description) / 4 tokens
(English: ~4 chars ≈ 1 token)

Benchmarks:

  • Excellent: < 50 tokens
  • Good: 50-100 tokens
  • Too long: > 150 tokens → needs trimming
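
The chars/4 formula as a one-line helper (a rough heuristic for English text; real tokenizers will differ):

```shell
# Estimate Level 1 token cost from description length.
est_tokens() { echo $(( ${#1} / 4 )); }
# Usage: est_tokens "$description"   # compare against the 50/100/150 bands above
```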

6.2 Level 2 Volume

Review checklist:

  • SKILL.md body over 500 lines (~5000 tokens)?
  • Repetitive content that can be trimmed?
  • AI-common-knowledge content that should be deleted?

6.3 Mis-trigger Risk

High-risk signals:

  • Multiple Skills with overlapping Description keywords
  • Vague Descriptions (e.g. "general-purpose assistant")
  • Too many installed Skills (>10) increases mis-trigger risk
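
Keyword overlap between installed Skills can be surfaced mechanically; a minimal sketch that prints description words shared by two or more SKILL.md files (`overlap_keywords` is illustrative; pass your platform's skill paths explicitly, e.g. `~/.openclaw/skills/*/SKILL.md`):

```shell
# Surface description words appearing in 2+ of the given SKILL.md files.
overlap_keywords() {
  for f in "$@"; do
    awk '/^---$/ {n++; next} n == 1' "$f" |              # frontmatter only
      sed -n -e 's/^description: *//p' -e 's/^  *//p' |  # description + continuation lines
      tr 'A-Z' 'a-z' | tr -cs 'a-z-' '\n' | grep . | sort -u
  done | sort | uniq -d
}
```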

Step 7: Comprehensive Scoring

Aggregate all dimension scores into the final report.

SKILL AUDIT REPORT
═══════════════════════════════════════════════════════════════
Skill: [skill-name]
Source: [local path / GitHub URL / ClawHub]
Audited: [date]
───────────────────────────────────────────────────────────────
I.   YAML FRONTMATTER COMPLIANCE       [X/10]
     ✅ [passed items]
     ❌ [issues]

II.  DESCRIPTION QUALITY               [X/40]
     Trigger Accuracy        [X/10]
     Conciseness             [X/10]
     Keyword Coverage        [X/10]
     Non-Redundancy          [X/10]

III. BODY QUALITY                      [X/40]
     Progressive Disclosure  [X/10]
     Role Setting            [X/10]
     Examples                [X/10]
     Instruction Clarity     [X/10]

IV.  RESOURCE LAYERING                 [X/10]
     scripts/ Usage          [X/5]
     references/ Usage       [X/5]

V.   PERFORMANCE IMPACT                [-5 to +2]
     Level 1 Cost            [penalty/bonus]
     Level 2 Volume          [penalty/bonus]
     Mis-trigger Risk        [penalty/bonus]
───────────────────────────────────────────────────────────────
OVERALL SCORE: X / 100
───────────────────────────────────────────────────────────────
Grade:
  🟢 Excellent (85-100)  — Worth installing, top quality
  🟡 Good (70-84)        — Usable, has room for improvement
  🔴 Acceptable (50-69) — Usable but needs optimization
  ⚫ Poor (<50)          — Not recommended
───────────────────────────────────────────────────────────────
VI.  IMPROVEMENT RECOMMENDATIONS (priority order)

  🔴 P0 (must fix):
     - [specific issue and fix]

  🟡 P1 (strongly recommended):
     - [specific issue and fix]

  🟢 P2 (optional):
     - [nice-to-have improvements]
═══════════════════════════════════════════════════════════════

Scoring Reference

| Score | Grade | Meaning | Action |
|---|---|---|---|
| 85-100 | 🟢 Excellent | Meets all best practices | Install directly |
| 70-84 | 🟡 Good | Meets most standards, minor issues | Install, address P1 items |
| 50-69 | 🔴 Acceptable | Functional but has obvious flaws | Fork and fix, or wait for update |
| <50 | ⚫ Poor | Fails best practices | Do not install, find alternatives |
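
The grade bands above as a small helper (labels copied from the scoring reference; `grade` is an illustrative name):

```shell
# Map an overall score to its grade label.
grade() {
  if   [ "$1" -ge 85 ]; then echo "🟢 Excellent"
  elif [ "$1" -ge 70 ]; then echo "🟡 Good"
  elif [ "$1" -ge 50 ]; then echo "🔴 Acceptable"
  else                       echo "⚫ Poor"
  fi
}
```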

Common Issue Diagnosis

| Symptom | Cause | Fix |
|---|---|---|
| Description too long | Frontmatter >150 tokens | Move details to body, keep only trigger keywords |
| Body too long | SKILL.md >500 lines | Split into references/ |
| No examples | Text-only instructions | Add 3-5 XML-wrapped example pairs |
| Vague role | No clear Skill boundary | Add role-setting paragraph |
| AI-common-knowledge filler | Explaining what AI already knows | Delete, keep only project-specific context |
| Not layered | Docs in body | Move to references/ |
| Mis-triggers | Overlapping or vague keywords | Differentiate Descriptions |

Skill Quality Check vs. Skill Vetter

| Dimension | Skill Vetter | Skill Quality Check |
|---|---|---|
| Goal | Security review | Quality review |
| Core question | Will this Skill harm me? | Is this Skill well-written? |
| Focus | Malicious code, permission abuse | Writing standards, performance |
| When | Before any install | When assessing quality |
| Output | Security report | Quality score + recommendations |

Use both in sequence: Vet for safety first, then audit for quality.

Quick Audit Commands

# Fetch SKILL.md from GitHub
curl -s "https://raw.githubusercontent.com/<owner>/<repo>/main/skills/<skill>/SKILL.md"

# Check frontmatter
grep -A 5 "^---" SKILL.md | head -10

# Estimate Level 2 volume (lines → ~10 tokens/line)
wc -l SKILL.md

Output Requirements

Every audit report must include:

  1. Overall score (X/100) with grade label
  2. Five dimension subscores (radar chart optional)
  3. Improvement recommendations (P0/P1/P2 priority)
  4. Clear "install or not" conclusion

Do not say "this Skill is pretty good" — deliver a specific score, specific issues, and specific fixes.


Good Skills deserve thorough auditing. Bad Skills deserve honest feedback. 🔍🦀


Examples

Example 1: Perfect Description (Score 10/10)

Input:

name: tdd-skill
description: >
  TDD test-driven development workflow. Use when writing new features,
  adding tests, or fixing bugs. Keywords: test-driven, TDD, red-green-refactor,
  pytest, unit test.

Audit Result:

  • Trigger Accuracy 10/10 — explicitly states when to use
  • Conciseness 10/10 — well under 150 chars
  • Keyword Coverage 10/10 — all key triggers present
  • Non-Redundancy 10/10 — no AI-common-knowledge filler
  • Description Score: 40/40

Example 2: Manual-Style Description (Score 3/10)

Input:

name: tdd-skill
description: >
  This is a comprehensive guide to Test-Driven Development using
  the red-green-refactor cycle. First, you write a failing test that
  describes the behavior you want. Then write the minimum code to make
  it pass. Then refactor while keeping tests green. This approach
  ensures high test coverage and better code quality...

Audit Result:

  • Trigger Accuracy 5/10 — mentions TDD but buried in explanation
  • Conciseness 1/10 — 280+ chars, reads like a manual
  • Keyword Coverage 5/10 — "TDD" present but no concise trigger list
  • Non-Redundancy 1/10 — explains the TDD cycle (Level 2 content in Level 1)
  • Description Score: 12/40

P0 Recommendation:

Rewrite Description to be under 150 chars. Move the cycle explanation to SKILL.md body.


Example 3: Good Role Setting (Score 9/10)

Input:

# PDF Processing Skill

You are a professional document preparation assistant specializing in
PDF creation, editing, and conversion workflows. You have deep knowledge
of PDF structure, reportlab, pypdf, and weasyprint.

Audit Result:

  • Role clarity 9/10 — clear persona and domain
  • Skill boundary 9/10 — clearly defined scope of responsibilities
  • Context specificity 9/10 — project-specific tools named

Minor improvement (P2): Could add one sentence about what this Skill does NOT cover (e.g. OCR, scanned PDFs).


Example 4: Poor Role Setting (Score 2/10)

Input:

# My Skill

This skill helps you get things done. Use it when you need help.
It provides instructions and guidelines for various tasks.

Audit Result:

  • Role clarity 2/10 — "assistant" is too generic
  • Skill boundary 1/10 — "various tasks" defines nothing
  • Context specificity 1/10 — no project-specific information

P0 Recommendation:

Replace generic language with specific domain context. Define what the Skill does and does not cover.


Example 5: Well-Layered Skill (Score 8/10)

Directory structure:

awesome-skill/
├── SKILL.md              80 lines  (Level 2: execution flow only)
├── references/
│   ├── api-spec.md       450 lines (Level 3: detailed API docs)
│   └── troubleshooting.md 120 lines (Level 3: edge cases)
└── scripts/
    └── validate.sh        (Level 3: deterministic execution)

Audit Result:

  • Progressive Disclosure 9/10 — clear layer separation
  • Body size 9/10 — 80 lines is ideal (not bloated)
  • Resource usage 8/10 — all heavy content in references/
  • Resource Layering Score: 8.5/10

Minor improvement (P2): Could add a brief Layer 1 summary in Description listing which references/ files are most relevant.


Example 6: Bloated SKILL.md (Score 2/10)

Symptom: SKILL.md has 620 lines including a 300-line API reference pasted directly in the body.

Audit Result:

  • Progressive Disclosure 1/10 — Level 3 content in Level 2
  • Body size 1/10 — 620 lines far exceeds 500-line guideline
  • Conciseness 1/10 — 300-line API doc belongs in references/

P0 Recommendation:

Move the API reference to references/api-spec.md. SKILL.md body should be execution flow only (under 500 lines).


Example 7: Mis-Trigger Risk (Score -3 Performance Impact)

Scenario: User has 12 Skills installed. Two of them have "debug" in their Description:

| Skill | Description trigger keywords |
|---|---|
| systematic-debugging | "debugging, error, bug" |
| general-helper | "debug, logs, errors, general assistance" |

Audit Result:

  • Mis-trigger Risk: -3 penalty
  • The overlap means "debug" alone can't reliably select the right Skill

P1 Recommendation:

Differentiate: systematic-debugging should use "systematic-debugging, root-cause" (more specific); general-helper should remove "debug" entirely or move it lower in priority.
