Install
openclaw skills install skill-eval-preflight
Validate OpenClaw skills during authoring. Use when creating, revising, or preparing a skill for release and you need to scaffold `evals/` files, check readiness for a first eval pass, review whether the frontmatter description has clear trigger coverage, or generate static comparison artifacts before deeper runtime evaluation.
Use this skill as an authoring-side preflight for OpenClaw skills.
It is not a full runtime evaluator. It helps a skill author move from "this skill exists" to "this skill is structured well enough for first-pass evaluation and later regression work."
This skill is a good fit for requests like:
Do not rely on this skill alone for requests like:
Use a deeper evaluator after this step when you need those capabilities.
Use this skill when you need to:
- scaffold `evals/` files for a new or existing skill

Use a deeper evaluator after this step when you need live runtime experiments, tool-call quality checks, or richer output scoring.
Recommended sequence:
skill-vetter -> install/review -> skill-eval -> deeper runtime eval
- skill-vetter answers: "Is this skill safe enough to inspect or install?"
- skill-eval answers: "Is this skill structured well enough to evaluate seriously?"

`SKILL.md`.
If `evals/` does not exist, initialize it with:
- `evals/evals.json`
- `evals/triggers.json`
- `evals/README.md`

Initialize eval files:
python3 scripts/init_eval.py /path/to/skill
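For intuition about what the initialization step produces, here is a minimal sketch of a scaffolder that creates the three files named above. It is not the code of `init_eval.py`. The function name `scaffold_evals` and the placeholder JSON shapes (`evals`, `positive`, `negative` keys) are assumptions made for illustration; the actual formats are described in references/eval_format.md.

```python
import json
from pathlib import Path

# Hypothetical starter payloads. The real schema lives in
# references/eval_format.md and may differ from these shapes.
STARTER_JSON = {
    "evals.json": {"evals": []},
    "triggers.json": {"positive": [], "negative": []},
}


def scaffold_evals(skill_dir: str) -> list[str]:
    """Create evals/ with three starter files, leaving existing files untouched.

    Returns the names of the files that were newly created.
    """
    evals_dir = Path(skill_dir) / "evals"
    evals_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for name, payload in STARTER_JSON.items():
        target = evals_dir / name
        if not target.exists():
            target.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
            created.append(name)

    readme = evals_dir / "README.md"
    if not readme.exists():
        readme.write_text("Eval cases for this skill.\n", encoding="utf-8")
        created.append("README.md")
    return created
```

Skipping files that already exist keeps the sketch safe to rerun on a skill that has partially written evals.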
Check readiness:
python3 scripts/check_eval_readiness.py /path/to/skill
Run static eval checks:
python3 scripts/run_eval.py /path/to/skill
python3 scripts/run_eval.py /path/to/skill --check readiness
python3 scripts/run_eval.py /path/to/skill --check triggers
python3 scripts/run_eval.py /path/to/skill --check artifacts
python3 scripts/run_eval.py /path/to/skill --check files
python3 scripts/run_eval.py /path/to/skill --mode with-skill
python3 scripts/run_eval.py /path/to/skill --mode without-skill --run-group demo-baseline
python3 scripts/compare_runs.py /path/to/skill --run-group demo-baseline
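The commands above can also be chained from a small driver. The sketch below only assembles and runs the invocations already listed; the helper names `build_preflight_commands` and `run_preflight` are hypothetical. It assumes that the scripts are reachable at `scripts/` from the working directory, that they exit non-zero on failure, and that `--run-group` can be combined with either `--mode` value.

```python
import subprocess

# The four --check values shown in the commands above.
CHECKS = ("readiness", "triggers", "artifacts", "files")
MODES = ("with-skill", "without-skill")


def build_preflight_commands(skill_dir: str, run_group: str) -> list[list[str]]:
    """Return the preflight commands, in order, as argv lists."""
    commands = [["python3", "scripts/check_eval_readiness.py", skill_dir]]
    commands += [
        ["python3", "scripts/run_eval.py", skill_dir, "--check", check]
        for check in CHECKS
    ]
    # Assumption: both modes accept --run-group so they land in the same group.
    commands += [
        ["python3", "scripts/run_eval.py", skill_dir, "--mode", mode,
         "--run-group", run_group]
        for mode in MODES
    ]
    commands.append(
        ["python3", "scripts/compare_runs.py", skill_dir, "--run-group", run_group]
    )
    return commands


def run_preflight(skill_dir: str, run_group: str = "demo-baseline") -> None:
    """Run each step in order, stopping at the first one that fails."""
    for command in build_preflight_commands(skill_dir, run_group):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Separating command construction from execution makes it possible to print or review the planned steps before running anything.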
A skill is ready for first-pass evaluation only when:
- `SKILL.md` exists
- `description` is real and not a placeholder
- `evals/evals.json` has at least one non-placeholder eval case
- `evals/triggers.json` has at least one positive and one negative non-placeholder trigger case
- `expected_artifacts` and `files` declarations are well-formed

`run_eval.py` does not perform live trigger experiments against the OpenClaw runtime.
It does not score real outputs for quality, factuality, or tool correctness.
Today it performs static validation passes that:
- check that `expected_artifacts` and `files` entries are well-formed

This skill is for authors who do not yet need a full eval lab, but do need a clean starting point. It is most useful as a lightweight preflight and scaffolding step before deeper evaluation.
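To make the idea of a static readiness pass concrete, the sketch below checks the first four readiness criteria listed earlier without touching the runtime. It is an illustration, not the logic of `check_eval_readiness.py` or `run_eval.py`. Everything about the file shapes is assumed: a `description:` line inside `---` frontmatter, an `evals` list of objects with a `prompt` field, `positive` and `negative` trigger lists, and the placeholder markers `todo` and `placeholder`. Checks on `expected_artifacts` and `files` are left out because their format is not shown here.

```python
import json
from pathlib import Path

# Assumption: strings that mark unfinished scaffold content.
PLACEHOLDER_MARKERS = ("todo", "placeholder")


def is_placeholder(text) -> bool:
    """Treat empty text or text containing a marker as placeholder content."""
    text = str(text).strip().lower()
    return not text or any(marker in text for marker in PLACEHOLDER_MARKERS)


def read_description(skill_md: Path) -> str:
    """Return the single-line frontmatter description, or '' if absent."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith("description:"):
            return line.split(":", 1)[1].strip().strip("\"'")
    return ""


def load_json_object(path: Path) -> dict:
    """Load a JSON object; a missing, invalid, or non-object file yields {}."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def as_list(value) -> list:
    return value if isinstance(value, list) else []


def readiness_problems(skill_dir: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    root = Path(skill_dir)
    problems = []

    skill_md = root / "SKILL.md"
    if not skill_md.is_file():
        problems.append("SKILL.md is missing")
    elif is_placeholder(read_description(skill_md)):
        problems.append("description is missing or a placeholder")

    # Hypothetical schema: {"evals": [{"prompt": "..."}]}
    cases = as_list(load_json_object(root / "evals" / "evals.json").get("evals"))
    real_cases = [
        case for case in cases
        if isinstance(case, dict) and not is_placeholder(case.get("prompt", ""))
    ]
    if not real_cases:
        problems.append("evals.json has no non-placeholder eval case")

    # Hypothetical schema: {"positive": ["..."], "negative": ["..."]}
    triggers = load_json_object(root / "evals" / "triggers.json")
    for kind in ("positive", "negative"):
        if not any(not is_placeholder(t) for t in as_list(triggers.get(kind))):
            problems.append(f"triggers.json has no non-placeholder {kind} trigger")
    return problems
```

Returning a list of problems rather than a single boolean lets an author see every gap in one pass instead of fixing them one at a time.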
Before calling a skill "ready for release," aim for all of the following:
Use `compare_runs.py` after both modes (`with-skill` and `without-skill`) exist in the same run-group.
It compares:
It writes comparison artifacts under the run-group root.
Read references/eval_format.md when you need the expected file formats and field meanings.