Skill Quality Check

v1.0.4

Quality audit for AI Agent Skills. Use before installing or after writing any SKILL.md. Scores 5 dimensions with actionable improvements. Works for skills wr...

1 · 135 · 0 current · 0 all-time
by Denny@webkong

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for webkong/skill-quality-check.

Prompt preview (Install & Setup):
Install the skill "Skill Quality Check" (webkong/skill-quality-check) from ClawHub.
Skill page: https://clawhub.ai/webkong/skill-quality-check
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install skill-quality-check

ClawHub CLI


npx clawhub@latest install skill-quality-check
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name, description, and included files (SKILL.md, README, examples, references) align with a Skill-audit tool. There are no binaries, env vars, or config paths required that would be unrelated to auditing SKILL.md files.
Instruction Scope
Instructions explicitly tell the agent to read local SKILL.md files (common skill dirs are listed) and optionally fetch a remote SKILL.md via raw.githubusercontent.com using curl if no local copy exists — this is coherent for an audit tool. Note: the curl step implies network fetch of arbitrary public repo content; that is expected for remote audits but is a capability the user should be aware of.
Install Mechanism
No install spec or code files; instruction-only skills have low disk/write risk. Everything is documentation and examples; nothing will be downloaded or executed by default outside of optional user-invoked network fetch guidance.
Credentials
The skill requests no environment variables, no credentials, and no config paths. That is proportional for a documentation/audit skill.
Persistence & Privilege
always:false and no instructions to persist configuration, modify other skills, or change system-wide settings. The skill does not ask for elevated or permanent presence.
Assessment
This appears to be a coherent, documentation-only audit tool. Before using: (1) be aware the SKILL.md suggests fetching remote files with curl — only fetch SKILL.md from repositories you trust; network fetches can retrieve arbitrary content. (2) The audit reads SKILL.md and adjacent reference files in common skill directories (e.g., ~/.openclaw, ~/.claude); run it only if you consent to that file access. (3) Review the bundled references/examples yourself — the auditor's recommendations are only as good as its rulebook. If you plan to run automated CI checks, review and pin any scripts you add separately rather than relying solely on remote fetches.

Like a lobster shell, security has layers — review code before you run it.

Tags: audit, best-practices, claude, codex, cursor, framework, latest, openclaw, skill-quality
135 downloads
1 star
5 versions
Updated 4w ago
v1.0.4
MIT-0

Skill Quality Check 🔍

Universal quality assessment framework for AI Agent Skills. Evaluates any SKILL.md file across 5 dimensions, outputting a quantified score and actionable improvement suggestions. Designed to work with skills built for Claude, Cursor, Codex, OpenClaw, or any AI agent.

When to Use

  • Before installing a new Skill from any source
  • After writing your own Skill (self-check)
  • Comparing quality of similar Skills
  • Evaluating Skills for ClawHub/SkillHub submission
  • As companion to Skill Creator — learn to write, then learn to audit

Audit Protocol

Step 1: Locate and Read the Target Skill

Find the SKILL.md file:

# Path priority (in order):
1. User-specified path
2. <skills-dir>/<skill-name>/SKILL.md

   # Common locations by platform:
   #   OpenClaw:     ~/.openclaw/skills/<skill-name>/SKILL.md
   #   QClaw:        ~/.qclaw/skills/<skill-name>/SKILL.md
   #   Claude Code:  ~/.claude/skills/<skill-name>/SKILL.md
   #   Cursor:       ~/.cursor/skills/<skill-name>/SKILL.md
   #   Codex:        ~/.codex/skills/<skill-name>/SKILL.md
3. <repo>/skills/<skill-name>/SKILL.md
4. <repo>/<skill-name>/SKILL.md

# If installing from GitHub without a local copy, fetch via curl:
curl -s "https://raw.githubusercontent.com/<owner>/<repo>/main/skills/<skill>/SKILL.md"

Then scan the directory for supporting files:

skill-name/
├── SKILL.md       ✅ required
├── scripts/       ✅ optional (lazy-loaded)
├── references/    ✅ optional (lazy-loaded)
└── assets/        ✅ optional (lazy-loaded)
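
The presence check above can be scripted; a minimal sketch (the `inventory` helper name and the directory argument are illustrative):

```shell
# Inventory a skill directory against the layout above.
inventory() {
  if [ -f "$1/SKILL.md" ]; then echo "SKILL.md: present"; else echo "SKILL.md: MISSING"; fi
  for d in scripts references assets; do
    if [ -d "$1/$d" ]; then echo "$d/: present (lazy-loaded)"; fi
  done
}
# Usage: inventory ~/.openclaw/skills/<skill-name>
```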

Step 2: YAML Frontmatter Review

SKILL.md must have YAML frontmatter with only these fields:

---
name: <skill-name>      ✅ required
description: >          ✅ required
# Fields below are NOT recommended in frontmatter:
# ❌ version           → package metadata
# ❌ author            → non-standard
# ❌ license           → non-essential
# ❌ compatibility     → most Skills don't need it
# ❌ tags              → non-standard
---

Review checklist:

  • Do name and description exist?
  • Is description under 150 characters (trigger-level content must be concise)?
  • Does description include trigger keywords ("when to use")?
  • Are there extra fields wasting Level 1 tokens?
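
The extra-fields check can be automated; a minimal sketch, assuming the frontmatter sits between the file's first two `---` lines (the `check_frontmatter` name is illustrative):

```shell
# Flag frontmatter fields the checklist above discourages.
check_frontmatter() {
  if awk '/^---$/ {n++; next} n == 1' "$1" |
     grep -qE '^(version|author|license|compatibility|tags):'
  then echo "❌ non-standard fields found"
  else echo "✅ frontmatter fields OK"
  fi
}
# Usage: check_frontmatter path/to/SKILL.md
```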

Step 3: Description Quality Assessment

Description is Level 1 content — the AI uses it to decide whether to trigger the Skill. It is a trigger, not a manual.

✅ Good Description:

TDD test-driven development workflow. Use when writing new features,
adding tests, or debugging. Keywords: test-driven, TDD, red-green-refactor.

❌ Bad Description:

This is a comprehensive guide to Test-Driven Development using the
red-green-refactor cycle. First, write a failing test that describes
the behavior you want. Then write the minimum code to make it pass...

(Too long — contains Level 2 content that belongs in SKILL.md body)

Scoring rubric (each dimension 0-10):

| # | Dimension | Question |
|---|---|---|
| 1 | Trigger Accuracy | Does it clearly state when to use this Skill? |
| 2 | Conciseness | Under 150 chars? No explanatory filler? |
| 3 | Keyword Coverage | Does it include trigger keywords (e.g. TDD, debug, pdf)? |
| 4 | Non-Redundancy | Does it avoid restating what AI already knows? |
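
The Conciseness and Keyword Coverage dimensions lend themselves to mechanical spot checks; a minimal sketch (the sample description is illustrative):

```shell
# Spot-check Conciseness (<=150 chars) and Keyword Coverage (explicit keyword list).
desc="TDD test-driven development workflow. Use when writing tests. Keywords: TDD, pytest."
[ ${#desc} -le 150 ] && echo "conciseness: pass (${#desc} chars)"
case "$desc" in *Keywords:*) echo "keyword coverage: explicit keyword list present" ;; esac
```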

Step 4: SKILL.md Body Quality Assessment

Five assessment dimensions (0-10 each):

4.1 Progressive Disclosure

Does it follow the three-layer loading principle?

| Layer | Content | When Loaded |
|---|---|---|
| Level 1 | name + description | Always in context |
| Level 2 | SKILL.md body | On skill trigger |
| Level 3 | scripts/ + references/ + assets/ | On execution, never in context |

Review checklist:

  • Trigger conditions → should be in Description (Level 1)
  • Execution steps, tool instructions → SKILL.md body (Level 2)
  • Detailed docs, scripts, templates → references/scripts (Level 3)
  • SKILL.md body under 500 lines?
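
The 500-line check should count only the body, not the frontmatter; a minimal awk sketch (`body_lines` is an illustrative name):

```shell
# Count SKILL.md body lines (everything after the frontmatter's closing '---').
body_lines() {
  awk '/^---$/ {n++; next} n >= 2 {c++} END {print c + 0}' "$1"
}
# Usage: body_lines SKILL.md   # flag results over 500
```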

4.2 Role Setting

Does the Skill open with a clear role or context definition?

✅ Good example:

# PDF Processing Skill

You are a professional document preparation assistant specializing in
PDF creation and editing workflows...

4.3 Examples

Are there sufficient, relevant, and diverse examples?

Claude recommends 3-5 examples that are:

  • Relevant: tied to real use cases
  • Diverse: cover edge cases
  • Structured: wrapped in XML tags

Review checklist:

  • Input/output example pairs present?
  • Core use cases covered?
  • Edge cases shown?

4.4 Instruction Clarity

Are instructions clear, actionable, and unambiguous?

Review checklist:

  • Steps listed with numbered lists?
  • Conditional branches explained?
  • Error/exception handling covered?
  • Output format specified (e.g. JSON structure)?

Step 5: Resource Layer Assessment

Are bundled resources used appropriately?

| Resource | When to Use | Review Question |
|---|---|---|
| scripts/ | Deterministic/repeated code execution | Is there repetitive code that should be a script? |
| references/ | Detailed docs, API specs, domain knowledge | Is there >10k chars of docs not in references/? |
| assets/ | Templates, images, fonts for output | Are there files that should be assets, not inline content? |

Review checklist:

  • Long docs in SKILL.md body that should be in references/?
  • Repeated code snippets that should be scripts?
  • Scripts have correct paths and dependency notes?
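
The >10k-chars heuristic for references/ can be measured directly; a minimal sketch (`body_chars` is an illustrative name):

```shell
# Measure inline body bulk in characters (frontmatter excluded).
body_chars() {
  awk '/^---$/ {n++; next} n >= 2' "$1" | wc -c | tr -d ' \t'
}
# Usage: body_chars SKILL.md   # over ~10000 suggests moving docs to references/
```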

Step 6: Performance Impact Assessment

6.1 Level 1 Token Cost

Formula:

Level 1 cost ≈ len(description) / 4 tokens
(English: ~4 chars ≈ 1 token)

Benchmarks:

  • Excellent: < 50 tokens
  • Good: 50-100 tokens
  • Too long: > 150 tokens → needs trimming
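
The chars/4 formula as a one-line helper (a rough heuristic for English text; real tokenizers will differ):

```shell
# Estimate Level 1 token cost from description length.
est_tokens() { echo $(( ${#1} / 4 )); }
# Usage: est_tokens "$description"   # compare against the 50/100/150 bands above
```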

6.2 Level 2 Volume

Review checklist:

  • SKILL.md body over 500 lines (~5000 tokens)?
  • Repetitive content that can be trimmed?
  • AI-common-knowledge content that should be deleted?

6.3 Mis-trigger Risk

High-risk signals:

  • Multiple Skills with overlapping Description keywords
  • Vague Descriptions (e.g. "general-purpose assistant")
  • Too many installed Skills (>10) increases mis-trigger risk
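
Keyword overlap between installed Skills can be surfaced mechanically; a minimal sketch that prints description words shared by two or more SKILL.md files (`overlap_keywords` is illustrative; pass your platform's skill paths explicitly, e.g. `~/.openclaw/skills/*/SKILL.md`):

```shell
# Surface description words appearing in 2+ of the given SKILL.md files.
overlap_keywords() {
  for f in "$@"; do
    awk '/^---$/ {n++; next} n == 1' "$f" |              # frontmatter only
      sed -n -e 's/^description: *//p' -e 's/^  *//p' |  # description + continuation lines
      tr 'A-Z' 'a-z' | tr -cs 'a-z-' '\n' | grep . | sort -u
  done | sort | uniq -d
}
```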

Step 7: Comprehensive Scoring

Aggregate all dimension scores into the final report.

SKILL AUDIT REPORT
═══════════════════════════════════════════════════════════════
Skill: [skill-name]
Source: [local path / GitHub URL / ClawHub]
Audited: [date]
───────────────────────────────────────────────────────────────
I.   YAML FRONTMATTER COMPLIANCE       [X/10]
     ✅ [passed items]
     ❌ [issues]

II.  DESCRIPTION QUALITY               [X/40]
     Trigger Accuracy        [X/10]
     Conciseness             [X/10]
     Keyword Coverage        [X/10]
     Non-Redundancy          [X/10]

III. BODY QUALITY                      [X/40]
     Progressive Disclosure  [X/10]
     Role Setting            [X/10]
     Examples                [X/10]
     Instruction Clarity     [X/10]

IV.  RESOURCE LAYERING                 [X/10]
     scripts/ Usage          [X/5]
     references/ Usage       [X/5]

V.   PERFORMANCE IMPACT                [-5 to +2]
     Level 1 Cost            [penalty/bonus]
     Level 2 Volume          [penalty/bonus]
     Mis-trigger Risk        [penalty/bonus]
───────────────────────────────────────────────────────────────
OVERALL SCORE: X / 100
───────────────────────────────────────────────────────────────
Grade:
  🟢 Excellent (85-100)  — Worth installing, top quality
  🟡 Good (70-84)        — Usable, has room for improvement
  🔴 Acceptable (50-69) — Usable but needs optimization
  ⚫ Poor (<50)          — Not recommended
───────────────────────────────────────────────────────────────
VI.  IMPROVEMENT RECOMMENDATIONS (priority order)

  🔴 P0 (must fix):
     - [specific issue and fix]

  🟡 P1 (strongly recommended):
     - [specific issue and fix]

  🟢 P2 (optional):
     - [nice-to-have improvements]
═══════════════════════════════════════════════════════════════

Scoring Reference

| Score | Grade | Meaning | Action |
|---|---|---|---|
| 85-100 | 🟢 Excellent | Meets all best practices | Install directly |
| 70-84 | 🟡 Good | Meets most standards, minor issues | Install, address P1 items |
| 50-69 | 🔴 Acceptable | Functional but has obvious flaws | Fork and fix, or wait for update |
| <50 | ⚫ Poor | Fails best practices | Do not install, find alternatives |
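
The grade bands above as a small helper (labels copied from the scoring reference; `grade` is an illustrative name):

```shell
# Map an overall score to its grade label.
grade() {
  if   [ "$1" -ge 85 ]; then echo "🟢 Excellent"
  elif [ "$1" -ge 70 ]; then echo "🟡 Good"
  elif [ "$1" -ge 50 ]; then echo "🔴 Acceptable"
  else                       echo "⚫ Poor"
  fi
}
```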

Common Issue Diagnosis

| Symptom | Cause | Fix |
|---|---|---|
| Description too long | Frontmatter >150 tokens | Move details to body, keep only trigger keywords |
| Body too long | SKILL.md >500 lines | Split into references/ |
| No examples | Text-only instructions | Add 3-5 XML-wrapped example pairs |
| Vague role | No clear Skill boundary | Add role-setting paragraph |
| AI-common-knowledge filler | Explaining what AI already knows | Delete, keep only project-specific context |
| Not layered | Docs in body | Move to references/ |
| Mis-triggers | Overlapping or vague keywords | Differentiate Descriptions |

Skill Quality Check vs. Skill Vetter

| Dimension | Skill Vetter | Skill Quality Check |
|---|---|---|
| Goal | Security review | Quality review |
| Core question | Will this Skill harm me? | Is this Skill well-written? |
| Focus | Malicious code, permission abuse | Writing standards, performance |
| When | Before any install | When assessing quality |
| Output | Security report | Quality score + recommendations |

Use both in sequence: Vet for safety first, then audit for quality.

Quick Audit Commands

# Fetch SKILL.md from GitHub
curl -s "https://raw.githubusercontent.com/<owner>/<repo>/main/skills/<skill>/SKILL.md"

# Check frontmatter
grep -A 5 "^---" SKILL.md | head -10

# Estimate Level 2 volume (lines → ~10 tokens/line)
wc -l SKILL.md

Output Requirements

Every audit report must include:

  1. Overall score (X/100) with grade label
  2. Five dimension subscores (radar chart optional)
  3. Improvement recommendations (P0/P1/P2 priority)
  4. Clear "install or not" conclusion

Do not say "this Skill is pretty good" — deliver a specific score, specific issues, and specific fixes.


Good Skills deserve thorough auditing. Bad Skills deserve honest feedback. 🔍🦀


Examples

Example 1: Perfect Description (Score 10/10)

Input:

name: tdd-skill
description: >
  TDD test-driven development workflow. Use when writing new features,
  adding tests, or fixing bugs. Keywords: test-driven, TDD, red-green-refactor,
  pytest, unit test.

Audit Result:

  • Trigger Accuracy 10/10 — explicitly states when to use
  • Conciseness 10/10 — well under 150 chars
  • Keyword Coverage 10/10 — all key triggers present
  • Non-Redundancy 10/10 — no AI-common-knowledge filler
  • Description Score: 40/40

Example 2: Manual-Style Description (Score 3/10)

Input:

name: tdd-skill
description: >
  This is a comprehensive guide to Test-Driven Development using
  the red-green-refactor cycle. First, you write a failing test that
  describes the behavior you want. Then write the minimum code to make
  it pass. Then refactor while keeping tests green. This approach
  ensures high test coverage and better code quality...

Audit Result:

  • Trigger Accuracy 5/10 — mentions TDD but buried in explanation
  • Conciseness 1/10 — 280+ chars, reads like a manual
  • Keyword Coverage 5/10 — "TDD" present but no concise trigger list
  • Non-Redundancy 1/10 — explains the TDD cycle (Level 2 content in Level 1)
  • Description Score: 12/40

P0 Recommendation:

Rewrite Description to be under 150 chars. Move the cycle explanation to SKILL.md body.


Example 3: Good Role Setting (Score 9/10)

Input:

# PDF Processing Skill

You are a professional document preparation assistant specializing in
PDF creation, editing, and conversion workflows. You have deep knowledge
of PDF structure, reportlab, pypdf, and weasyprint.

Audit Result:

  • Role clarity 9/10 — clear persona and domain
  • Skill boundary 9/10 — clearly defined scope of responsibilities
  • Context specificity 9/10 — project-specific tools named

Minor improvement (P2): Could add one sentence about what this Skill does NOT cover (e.g. OCR, scanned PDFs).


Example 4: Poor Role Setting (Score 2/10)

Input:

# My Skill

This skill helps you get things done. Use it when you need help.
It provides instructions and guidelines for various tasks.

Audit Result:

  • Role clarity 2/10 — "assistant" is too generic
  • Skill boundary 1/10 — "various tasks" defines nothing
  • Context specificity 1/10 — no project-specific information

P0 Recommendation:

Replace generic language with specific domain context. Define what the Skill does and does not cover.


Example 5: Well-Layered Skill (Score 8/10)

Directory structure:

awesome-skill/
├── SKILL.md              80 lines  (Level 2: execution flow only)
├── references/
│   ├── api-spec.md       450 lines (Level 3: detailed API docs)
│   └── troubleshooting.md 120 lines (Level 3: edge cases)
└── scripts/
    └── validate.sh        (Level 3: deterministic execution)

Audit Result:

  • Progressive Disclosure 9/10 — clear layer separation
  • Body size 9/10 — 80 lines is ideal (not bloated)
  • Resource usage 8/10 — all heavy content in references/
  • Resource Layering Score: 8.5/10

Minor improvement (P2): Could add a brief Layer 1 summary in Description listing which references/ files are most relevant.


Example 6: Bloated SKILL.md (Score 2/10)

Symptom: SKILL.md has 620 lines including a 300-line API reference pasted directly in the body.

Audit Result:

  • Progressive Disclosure 1/10 — Level 3 content in Level 2
  • Body size 1/10 — 620 lines far exceeds 500-line guideline
  • Conciseness 1/10 — 300-line API doc belongs in references/

P0 Recommendation:

Move the API reference to references/api-spec.md. SKILL.md body should be execution flow only (under 500 lines).


Example 7: Mis-Trigger Risk (Score -3 Performance Impact)

Scenario: User has 12 Skills installed. Two of them have "debug" in their Description:

| Skill | Description trigger keywords |
|---|---|
| systematic-debugging | "debugging, error, bug" |
| general-helper | "debug, logs, errors, general assistance" |

Audit Result:

  • Mis-trigger Risk: -3 penalty
  • The overlap means "debug" alone can't reliably select the right Skill

P1 Recommendation:

Differentiate: systematic-debugging should use "systematic-debugging, root-cause" (more specific); general-helper should remove "debug" entirely or move it lower in priority.
