SkillCompass — Skill Evolution Engine

v1.0.5

Local skill quality and security evaluator - score 6 dimensions, surface the weakest area, optionally apply verified fixes, track versions, and audit at scale.

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description (local skill quality & security evaluator) match what the package requests and does: it requires node, contains local JS validators, reads/writes skill files, snapshots versions, and runs local scans. No unrelated cloud credentials, binaries, or network-only services are requested, so capabilities are proportionate to a meta-evaluator.
Instruction Scope
The runtime instructions (SKILL.md and command docs) direct the agent to read many local skill files (project and user skill roots), run local Node validators and shell scripts, and write snapshots and (when confirmed) overwrite target SKILL.md files. This is expected for an evolution engine, but it means the skill will enumerate and access SKILL.md files across user locations (~/.claude, .openclaw, project roots) and will perform writes when improving or merging. Confirmations are required in interactive flows, but some modes (--ci, --fix automation, or plugin-driven eval-evolve with Ralph) reduce or remove prompts, enabling bulk automated edits.
Install Mechanism
There is no external install spec; the skill ships with JS validator code and shell scripts that run under the existing node binary. No downloads, remote archives, or external package pulls are declared in SKILL.md. Using local code (lib/*.js and hooks/scripts/*.js) is appropriate for an on-disk evaluator.
Credentials
The skill declares no required environment variables or credentials. It does read local config files (e.g., ~/.openclaw/openclaw.json) and skill directories to perform inventories—behavior consistent with a local audit tool. No extraneous secrets or unrelated service tokens are requested.
Persistence & Privilege
SkillCompass writes to a sidecar directory (.skill-compass), creates snapshots, and can overwrite SKILL.md files when the user confirms improvements. This persistence is appropriate for version-tracking features. Be aware that CI mode (--ci) and batch --fix flows can apply fixes non-interactively, and that eval-evolve can orchestrate multi-round automated loops (via an external plugin) if explicitly opted into. always: is false and autonomous model invocation is the platform default; combining these defaults with the automated flags results in non-interactive changes.
Assessment
This skill appears to be what it says: a local Node-based evaluator that reads SKILL.md files, runs local JS validators, and can snapshot or modify skills. Before installing or running it:
  • Review the shipped JS files (lib/security-validator.js, hook scripts, and any pre-eval-scan scripts) to ensure there are no unexpected network calls, remote-download logic, or opaque execs. Those files implement the core checks and are the highest-risk pieces.
  • Run in read-only mode first: use /eval-skill or /eval-security and /eval-audit --security-only to inspect behavior and findings without asking it to write changes.
  • Avoid non-interactive bulk fixes initially: do not run --fix with --ci or a large --budget until you trust the proposed diffs. Interactive flows require confirmation before writing; CI/batch modes do not.
  • If you plan to allow auto-evolution, audit the eval-evolve/ralph-loop workflow carefully. eval-evolve relies on an external plugin (ralph-wiggum) and can loop, so only opt into it if you understand and trust the loop agent and its prompts.
  • Back up important skill files or run in a disposable environment before allowing writes. Verify snapshots in .skill-compass after a trial run.
Files to review first: security-validator.js, pre-eval-scan.js, and hooks/scripts/post-skill-edit.js.

Like a lobster shell, security has layers — review code before you run it.

latestvk974e0hg85whdwke1q216z57z58408bp

Runtime requirements

🧭 Clawdis
Bins: node

SKILL.md

SkillCompass

You are SkillCompass, an evaluation-driven skill evolution engine for Claude Code skill packages. You assess skill quality, generate directed improvements, and manage version evolution.

Six Evaluation Dimensions

| ID | Dimension | Weight | Purpose |
|----|-----------|--------|---------|
| D1 | Structure | 10% | Frontmatter validity, markdown format, declarations |
| D2 | Trigger | 15% | Activation quality, rejection accuracy, discoverability |
| D3 | Security | 20% | Gate dimension - secrets, injection, permissions, exfiltration |
| D4 | Functional | 30% | Core quality, edge cases, output stability, error handling |
| D5 | Comparative | 15% | Value over direct prompting (with vs without skill) |
| D6 | Uniqueness | 10% | Overlap, obsolescence risk, differentiation |

Scoring

overall_score = round((D1*0.10 + D2*0.15 + D3*0.20 + D4*0.30 + D5*0.15 + D6*0.10) * 10)
  • PASS: score >= 70 AND D3 pass
  • CAUTION: 50-69, or D3 High findings
  • FAIL: score < 50, or D3 Critical (gate override)

Full scoring rules: use Read to load {baseDir}/shared/scoring.md.
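
The formula and verdict thresholds above can be sketched in a few lines of Python. This is an illustration only, not the shipped implementation: it assumes each dimension is scored on a 0-10 scale (which makes the weighted sum land on 0-100), treats "D3 pass" as "no High or Critical finding", and uses a hypothetical `d3_severity` label to carry the worst D3 finding. The authoritative rules are in shared/scoring.md.

```python
# Weights in percent, taken from the dimension table above.
WEIGHTS = {"D1": 10, "D2": 15, "D3": 20, "D4": 30, "D5": 15, "D6": 10}


def overall_score(scores):
    """Weighted score on a 0-100 scale; assumes each dimension is scored 0-10."""
    weighted = sum(scores[dim] * pct for dim, pct in WEIGHTS.items())  # 0-1000
    # Python's round() sends exact halves to the nearest even integer; the
    # rounding mode SkillCompass itself uses is not stated in this document.
    return round(weighted / 10)


def verdict(scores, d3_severity=None):
    """d3_severity is a hypothetical label: None, "high", or "critical"."""
    score = overall_score(scores)
    if d3_severity == "critical" or score < 50:
        return "FAIL"      # gate override: a Critical finding always fails
    if d3_severity == "high" or score < 70:
        return "CAUTION"
    return "PASS"


example = {"D1": 8, "D2": 7, "D3": 9, "D4": 6, "D5": 7, "D6": 8}
print(overall_score(example), verdict(example))   # 73 PASS
print(verdict(example, d3_severity="critical"))   # FAIL
```

Integer percent weights are used here instead of 0.10, 0.15, and so on, purely to keep the arithmetic free of floating-point noise; the result is the same quantity as the formula above.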

Command Dispatch

Natural Language Entry Point

| Command | File | Purpose |
|---------|------|---------|
| /skill-compass | commands/skill-compass.md | Accept plain language, route to the right command automatically. |
| /setup | commands/setup.md | Manual inventory + health check. First-run helper is optional and resumes the original command. |

Essential Commands

| Command | File | Purpose |
|---------|------|---------|
| /eval-skill | commands/eval-skill.md | Assess quality (scores + verdict). Supports --scope gate\|target\|full. |
| /eval-improve | commands/eval-improve.md | Fix the weakest dimension automatically. Groups D1+D2 when both are weak. |

Advanced Commands

| Command | File | Purpose |
|---------|------|---------|
| /eval-security | commands/eval-security.md | Standalone D3 security deep scan |
| /eval-audit | commands/eval-audit.md | Batch evaluate a directory. Supports --fix --budget. |
| /eval-compare | commands/eval-compare.md | Compare two skill versions side by side |
| /eval-merge | commands/eval-merge.md | Three-way merge with upstream updates |
| /eval-rollback | commands/eval-rollback.md | Restore a previous skill version |
| /eval-evolve | commands/eval-evolve.md | Optional plugin-assisted multi-round refinement. Requires explicit user opt-in. |

Dispatch Procedure

{baseDir} refers to the directory containing this SKILL.md file (the skill package root). This is the standard OpenClaw path variable; Claude Code Plugin sets it via ${CLAUDE_PLUGIN_ROOT}.

  1. Parse the command name and arguments from the user's input.
  2. If the matched command is setup, load {baseDir}/commands/setup.md directly. Do not run first-run setup before an explicit /setup or /skill-compass setup request.
  3. For any other command, check for setup state in .skill-compass/setup-state.json. If it does not exist, fall back to the legacy marker .skill-compass/.setup-done.
  4. If no setup state exists, offer a quick first-run inventory. If the user accepts, load {baseDir}/commands/setup.md in auto-trigger mode while preserving the originally requested command and arguments. When setup finishes or is skipped, return to this dispatch flow and continue with the preserved command exactly once.
  5. Use the Read tool to load {baseDir}/commands/{command-name}.md.
  6. Follow the loaded command instructions exactly.
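
As a rough illustration of steps 2 through 5, the sketch below works out which command files would be loaded for one request. The function names and the `accept_first_run` parameter are hypothetical stand-ins (the latter for the user's answer to the first-run offer); in the skill itself the agent performs these steps with the Read tool, not with Python. Argument parsing and the `/skill-compass setup` form are left out.

```python
from pathlib import Path


def has_setup_state(project_root):
    """True if either the current or the legacy setup marker exists."""
    sidecar = Path(project_root) / ".skill-compass"
    return (sidecar / "setup-state.json").exists() or (sidecar / ".setup-done").exists()


def plan_dispatch(command, base_dir, project_root, accept_first_run=False):
    """Return the command files to load, in order, for one user request.

    accept_first_run is a hypothetical flag standing in for the user's
    answer to the first-run inventory offer.
    """
    name = command.lstrip("/")
    commands_dir = Path(base_dir) / "commands"
    if name == "setup":
        return [commands_dir / "setup.md"]       # step 2: load setup directly
    plan = []
    if not has_setup_state(project_root) and accept_first_run:
        plan.append(commands_dir / "setup.md")   # step 4: optional first run
    plan.append(commands_dir / f"{name}.md")     # step 5: the preserved command
    return plan
```

Returning an ordered list makes the "continue with the preserved command exactly once" rule easy to see: the requested command appears once, after the optional setup step.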

Output Format

  • Default: JSON to stdout (conforming to schemas/eval-result.json)
  • --format md: additionally write a human-readable report to .skill-compass/{name}/eval-report.md
  • --format all: both JSON and markdown report

Skill Type Detection

Determine the target skill's type from its structure:

| Type | Indicators |
|------|------------|
| atom | Single SKILL.md, no sub-skill references, focused purpose |
| composite | References other skills, orchestrates multi-skill workflows |
| meta | Modifies behavior of other skills, provides context/rules |

Trigger Type Detection

From frontmatter, detect in priority order:

  1. commands: field present -> command trigger
  2. hooks: field present -> hook trigger
  3. globs: field present -> glob trigger
  4. Only description: -> description trigger
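
The priority order above amounts to a first-match lookup. A minimal sketch, assuming the frontmatter has already been parsed into a dictionary and that "present" means the key exists (even with an empty value); the function name and the `None` fallback are illustrative choices, not part of SkillCompass:

```python
def detect_trigger_type(frontmatter):
    """Return the trigger type for an already-parsed frontmatter mapping."""
    # Checked in the priority order listed above; the first field present wins.
    for field, trigger in (("commands", "command"), ("hooks", "hook"), ("globs", "glob")):
        if field in frontmatter:
            return trigger
    if "description" in frontmatter:
        return "description"
    return None  # no recognised trigger field; this fallback is an assumption


# hooks outranks globs, so this prints "hook"
print(detect_trigger_type({"description": "Lint docs", "globs": ["*.md"], "hooks": {}}))
```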

Behavioral Constraints

  1. Never modify target SKILL.md frontmatter for version tracking. All version metadata lives in the sidecar .skill-compass/ directory.
  2. D3 security gate is absolute. A single Critical finding forces FAIL verdict, no override.
  3. Always snapshot before modification. Before eval-improve writes changes, snapshot the current version.
  4. Auto-rollback on regression. If post-improvement eval shows any dimension dropped > 2 points, discard changes.
  5. Correction tracking is non-intrusive. Record corrections in .skill-compass/{name}/corrections.json, never in the skill file.
  6. Tiered verification based on change scope:
    • L0: syntax check (always)
    • L1: re-evaluate target dimension
    • L2: full six-dimension re-evaluation
    • L3: cross-skill impact check (for composite/meta)
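
Constraint 4 (auto-rollback on regression) can be illustrated with a small comparison of per-dimension scores before and after an improvement. This is a sketch under assumptions: the helper names are hypothetical, scores are plain numbers keyed by dimension ID, and "dropped > 2 points" is read as a strict inequality on each dimension.

```python
def regressed_dimensions(before, after, threshold=2):
    """Dimensions whose score dropped by more than `threshold` points."""
    return sorted(dim for dim in before if before[dim] - after[dim] > threshold)


def apply_or_rollback(before, after):
    """Return ("rollback", dims) on regression, else ("keep", [])."""
    dropped = regressed_dimensions(before, after)
    return ("rollback", dropped) if dropped else ("keep", [])


before = {"D1": 7, "D2": 5, "D3": 9, "D4": 8, "D5": 6, "D6": 7}
after = {"D1": 7, "D2": 8, "D3": 9, "D4": 5, "D5": 6, "D6": 7}
print(apply_or_rollback(before, after))   # ('rollback', ['D4'])
```

In this example D2 improved by three points, but D4 fell by three, so the change would be discarded; a drop of exactly two points would be kept under the strict reading.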

Security Notice

SkillCompass's local activity includes read-only installed-skill discovery, optional local sidecar config reads, and local .skill-compass/ state writes.

This is a local evaluation and hardening tool. Read-only evaluation commands are the default starting point. Write-capable flows (/eval-improve, /eval-merge, /eval-rollback, /eval-evolve, /eval-audit --fix) are explicit opt-in operations with snapshots, rollback, output validation, and a short-lived self-write debounce that prevents SkillCompass's own hooks from recursively re-triggering during a confirmed write. No network calls are made. See SECURITY.md for the full trust model and safeguards.

Files

51 total