Skill Evaluator

Evaluate Clawdbot skills for quality, reliability, and publish-readiness using a multi-framework rubric (ISO 25010, OpenSSF, Shneiderman, agent-specific heuristics). Use when asked to review, audit, evaluate, score, or assess a skill before publishing, or when checking skill quality. Runs automated structural checks and guides manual assessment across 25 criteria.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
3 · 2.1k · 7 current installs · 7 all-time installs
Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the delivered artifacts: SKILL.md describes running scripts/evaluations and the repo contains scripts/eval-skill.py, a rubric (references/rubric.md), and an evaluation template. The checks the script implements (frontmatter, file structure, docs, simple script analysis) are coherent with the stated evaluator purpose.
Instruction Scope
SKILL.md explicitly instructs the agent to run the local script (python3 scripts/eval-skill.py /path/to/skill) and to read/skim code and docs — this necessarily requires reading files in the target skill directory, which is intended. Manual scoring steps are required and the evaluator recommends an optional external scanner (SkillLens) — that recommendation is optional and not required for operation.
Install Mechanism
No install spec is provided (instruction-only skill). The included Python script requires Python 3.6+ and PyYAML (documented in SKILL.md). No network downloads, external archives, or package installs are required by the skill itself.
Credentials
The skill requests no environment variables, no credentials, and no config paths. The evaluator script scans files for issues (including credential-like patterns) when run, which is appropriate for its purpose but means you should not run it against directories containing secrets you don't want inspected.
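For context, a credential scan of this kind typically checks file text against a small list of regular expressions. The sketch below is illustrative only; the patterns and function name are assumptions, not the script's actual rules:

```python
import re

# Illustrative credential-like patterns; the real script's rules may differ.
CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(?:api[_-]?key|token|secret)\s*[=:]\s*['\"][^'\"]{8,}['\"]"),
]

def scan_text_for_credentials(text):
    """Return a list of (line_number, matched_text) findings."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in CREDENTIAL_PATTERNS:
            match = pattern.search(line)
            if match:
                findings.append((lineno, match.group(0)))
    return findings
```

Because such a scanner reads every file it is pointed at, the caution above applies: run it only on directories you are willing to have inspected.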
Persistence & Privilege
always:false and user-invocable:true. The skill does not request persistent agent presence or attempt to modify other skills or system-wide settings. It performs local read-only analysis of a provided skill directory (writes only when you copy the EVAL_TEMPLATE to create EVAL.md, which is an intended publishing artifact).
Scan Findings in Context
No pre-scan or injection signals were detected. This is expected for a local, instruction-driven evaluator that contains no network or downloader behavior.
Assessment
This skill appears internally consistent and appropriate for reviewing other skills. Before running:

  1. Inspect scripts/eval-skill.py yourself (it only reads files and parses YAML/AST; it does not spawn subprocesses or make network calls).
  2. Ensure you run it on the intended skill directory (don't point it at system directories or private repos containing secrets).
  3. Install Python 3.6+ and PyYAML (pip install pyyaml) if you plan to run the automated checks.
  4. Remember the automated script covers only structural/heuristic checks; manual judgment is required for many rubric items.
  5. SKILL.md recommends an optional external tool (SkillLens via npm); it is not required by this skill. Treat external tool recommendations as separate dependencies and review them before use.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
latest: vk972pjdst0p8pw0eqhbk13rg8x8095vk

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

Skill Evaluator

Evaluate skills across 25 criteria using a hybrid automated + manual approach.

Quick Start

1. Run automated checks

python3 scripts/eval-skill.py /path/to/skill
python3 scripts/eval-skill.py /path/to/skill --json    # machine-readable
python3 scripts/eval-skill.py /path/to/skill --verbose  # show all details

Checks: file structure, frontmatter, description quality, script syntax, dependency audit, credential scan, env var documentation.
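The frontmatter portion of these checks can be sketched with the standard library alone (the real script parses the extracted block with PyYAML; the function names and required-key set below are illustrative assumptions):

```python
REQUIRED_KEYS = {"name", "description"}  # assumed minimum; the real check may require more

def extract_frontmatter(skill_md):
    """Split a SKILL.md string into (frontmatter_text, body).
    The real script parses the frontmatter with PyYAML; here we only locate the block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, skill_md
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None, skill_md  # unterminated frontmatter

def missing_keys(frontmatter_text):
    """Naive top-level key check (real parsing should use yaml.safe_load)."""
    present = {line.split(":", 1)[0].strip()
               for line in frontmatter_text.splitlines() if ":" in line}
    return REQUIRED_KEYS - present
```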

2. Manual assessment

Use the rubric at references/rubric.md to score 25 criteria across 8 categories (0–4 each, 100 total). Each criterion has concrete descriptions per score level.
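Aggregation is simple arithmetic: 25 criteria at 0–4 points each yields a 0–100 total. A minimal helper (illustrative, not part of the skill):

```python
def total_score(criterion_scores):
    """Sum 25 per-criterion rubric scores (each 0-4) into a 0-100 total."""
    if len(criterion_scores) != 25:
        raise ValueError("expected exactly 25 criterion scores")
    if any(not 0 <= s <= 4 for s in criterion_scores):
        raise ValueError("each criterion is scored 0-4")
    return sum(criterion_scores)
```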

3. Write the evaluation

Copy assets/EVAL-TEMPLATE.md to the skill directory as EVAL.md. Fill in automated results + manual scores.

Evaluation Process

  1. Run eval-skill.py — get the automated structural score
  2. Read the skill's SKILL.md — understand what it does
  3. Read/skim the scripts — assess code quality, error handling, testability
  4. Score each manual criterion using references/rubric.md — concrete criteria per level
  5. Prioritize findings as P0 (blocks publishing) / P1 (should fix) / P2 (nice to have)
  6. Write EVAL.md in the skill directory with scores + findings

Categories (8 categories, 25 criteria)

#   Category                 Source Framework              Criteria
1   Functional Suitability   ISO 25010                     Completeness, Correctness, Appropriateness
2   Reliability              ISO 25010                     Fault Tolerance, Error Reporting, Recoverability
3   Performance / Context    ISO 25010 + Agent             Token Cost, Execution Efficiency
4   Usability — AI Agent     Shneiderman, Gerhardt-Powals  Learnability, Consistency, Feedback, Error Prevention
5   Usability — Human        Tognazzini, Norman            Discoverability, Forgiveness
6   Security                 ISO 25010 + OpenSSF           Credentials, Input Validation, Data Safety
7   Maintainability          ISO 25010                     Modularity, Modifiability, Testability
8   Agent-Specific           Novel                         Trigger Precision, Progressive Disclosure, Composability, Idempotency, Escape Hatches
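For illustration, the categories can be mirrored as a data structure (the names come from the table above; the dict layout itself is an assumption, not the rubric file's format):

```python
# Names copied from the category table; the layout is illustrative only.
RUBRIC = {
    "Functional Suitability": ["Completeness", "Correctness", "Appropriateness"],
    "Reliability": ["Fault Tolerance", "Error Reporting", "Recoverability"],
    "Performance / Context": ["Token Cost", "Execution Efficiency"],
    "Usability — AI Agent": ["Learnability", "Consistency", "Feedback", "Error Prevention"],
    "Usability — Human": ["Discoverability", "Forgiveness"],
    "Security": ["Credentials", "Input Validation", "Data Safety"],
    "Maintainability": ["Modularity", "Modifiability", "Testability"],
    "Agent-Specific": ["Trigger Precision", "Progressive Disclosure",
                       "Composability", "Idempotency", "Escape Hatches"],
}

# 8 categories, 25 criteria in total
assert len(RUBRIC) == 8
assert sum(len(criteria) for criteria in RUBRIC.values()) == 25
```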

Interpreting Scores

Range    Verdict      Action
90–100   Excellent    Publish confidently
80–89    Good         Publishable, note known issues
70–79    Acceptable   Fix P0s before publishing
60–69    Needs Work   Fix P0+P1 before publishing
<60      Not Ready    Significant rework needed
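The bands map mechanically to a verdict. A hedged sketch (the strings paraphrase the table; this function is not part of the skill's API):

```python
def verdict(score):
    """Map a 0-100 total to the verdict bands in the table above."""
    if score >= 90:
        return "Excellent: publish confidently"
    if score >= 80:
        return "Good: publishable, note known issues"
    if score >= 70:
        return "Acceptable: fix P0s before publishing"
    if score >= 60:
        return "Needs Work: fix P0+P1 before publishing"
    return "Not Ready: significant rework needed"
```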

Deeper Security Scanning

This evaluator covers security basics (credentials, input validation, data safety), but for thorough security audits of skills under development, consider SkillLens (npx skilllens scan <path>). It checks for exfiltration, code execution, persistence, privilege bypass, and prompt injection, complementing the quality focus here.

Dependencies

  • Python 3.6+ (for eval-skill.py)
  • PyYAML (pip install pyyaml) — for frontmatter parsing in automated checks

Files

4 total
