## Install

```shell
openclaw skills install generate-judgements
```

Use when creating or updating test judgement definitions (`judge_definitions`) for an agent skill evaluation YAML config. Analyzes a skill's SKILL.md and reference files to produce fine-grained yes/no judge questions with scope tags. Triggers on "generate judgements", "create judge definitions", "write test config", "add judgements", "生成判定", "创建测试配置", "更新判定", "补充judgement".
Analyze a skill's source files and produce fine-grained `judge_definitions` for the
mlflow-skills automated evaluation framework.
Each judgement is a yes/no question that an LLM judge answers by reading the execution trace.
Key files: `SKILL.md`, `references/yaml-config-spec.md`

Workflow:

```dot
digraph generate_judgements {
  rankdir=TB;
  node [shape=box];
  collect [label="Phase 1\nCollect & Analyze Skill Files"];
  infer [label="Phase 2\nInfer Scopes"];
  confirm_scope [label="User confirms scopes" shape=diamond];
  generate [label="Phase 3\nGenerate Judgements per Scope"];
  present [label="Phase 4\nPresent to User"];
  confirm_judge [label="User approves?" shape=diamond];
  write [label="Phase 5\nWrite / Update YAML"];
  collect -> infer;
  infer -> confirm_scope;
  confirm_scope -> generate [label="approved"];
  confirm_scope -> infer [label="revise"];
  generate -> present;
  present -> confirm_judge;
  confirm_judge -> write [label="approved"];
  confirm_judge -> generate [label="revise"];
}
```
## Phase 1: Collect & Analyze Skill Files

Ask the user for two inputs (or auto-detect them):

1. The skill's `SKILL.md` (or the skill directory containing it)
2. An existing test config YAML, if one exists — when updating, replace only its `judge_definitions` section instead of creating a new file

Then read all available files in this order:
| Priority | File | Purpose |
|---|---|---|
| 1 | SKILL.md | Primary source — workflow steps, behavior rules, output format |
| 2 | references/* | Supporting details — templates, CLI commands, query patterns |
| 3 | README.md / README_CN.md | Additional context — scope boundaries, limitations |
| 4 | Existing test config YAML | Understand current judgements to avoid duplication |
While reading, extract and note the workflow steps, behavior rules, output formats, templates, and any stated scope boundaries or limitations.
## Phase 2: Infer Scopes

Analyze the skill for distinct execution paths that produce different outputs or
follow different logic. Each distinct path becomes a scope.

How to identify scopes: look for branches in the workflow that produce different artifacts or follow different instructions.

Scope naming rules:

- Use short, descriptive, lowercase names, e.g. `checklist`, `assessment`, `research`
- `all` is reserved — it means "always run regardless of test_scope"
- Use the `all` scope for common/shared behavior

Present the inferred scopes to the user with a brief description of each:
I found the following execution branches in this skill:
1. `all` — Common behavior shared across all paths
(skill loading, doc search, categorization, source annotations)
2. `checklist` — Checklist-only output path
(no live resource, generates checklist file, offers next steps)
3. `assessment` — Live assessment path
(runs AWS CLI, generates assessment report, no separate checklist)
Does this look right? Should I add, remove, or rename any scope?
Wait for user confirmation before proceeding.
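To make the scope semantics concrete, here is a sketch of how `scope` interacts with `test_scope` at run time. The judgement names come from this document; the question texts and field placement are illustrative, not taken from a real config:

```yaml
# Sketch: with test_scope set to "checklist", judges scoped "all" or
# "checklist" run; "assessment"-scoped judges are skipped.
test_scope: checklist

judge_definitions:
  - name: skill-invoked
    scope: all          # reserved scope: runs regardless of test_scope
    question: >
      Check that the skill was loaded before any other work began.
  - name: file-naming-convention
    scope: checklist    # runs: matches test_scope
    question: >
      Check that the generated checklist file follows the naming pattern.
  - name: aws-cli-commands-executed
    scope: assessment   # skipped: does not match test_scope
    question: >
      Check that the expected AWS CLI commands were executed.
```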
## Phase 3: Generate Judgements per Scope

For each confirmed scope, generate fine-grained `judge_definitions`. Follow these rules:
One check point per judgement. Each judgement tests exactly ONE behavior or requirement.
```yaml
# GOOD — one specific check
- name: sequential-mcp-calls
  scope: all
  question: >
    Check that MCP tool calls were executed sequentially...

# BAD — multiple checks crammed into one
- name: workflow-correct
  scope: all
  question: >
    Check that the agent searched docs sequentially, read pages,
    extracted items into 5 categories, and wrote the file...
```
Generate judgements in this order, for each scope:
- Category A: Skill Loading & Invocation (`scope: all`)
- Category B: Workflow Behavior (`scope: all` or scope-specific)
- Category C: Output Quality (`scope: all` or scope-specific)
- Category D: Scope-Specific Behavior (per non-`all` scope)
- Category E: Guidelines Compliance (`scope: all`)
Use kebab-case names that describe the check:
- `skill-invoked` — skill was loaded
- `sequential-mcp-calls` — tool calls are sequential
- `doc-search-coverage` — search queries cover required topics
- `five-categories-complete` — output has all 5 categories
- `file-naming-convention` — output file name matches pattern
- `aws-cli-commands-executed` — CLI commands were run
- `no-separate-checklist-file` — negative check: no extra file
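Putting the naming and scoping rules together, a single well-formed entry might look like this (a sketch; the question wording is illustrative, not taken from a real config):

```yaml
- name: no-separate-checklist-file
  scope: assessment
  question: >
    Read the execution trace and the final list of files in the
    workspace. Answer yes only if the agent did NOT create a separate
    checklist file in addition to the assessment report. The report
    itself is expected and does not count as a violation.
```

Note how the question is self-contained: it tells the judge what to read, what exactly to check, and which edge case does not count.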
Each question field must be a self-contained instruction for the LLM judge. Follow
the patterns in references/judgement-patterns.md.
Required elements in every question:
Important clarifications to include when relevant:
For each scope, also generate negative judgements — things that should NOT happen:
- `checklist` scope: assessment-only artifacts should NOT appear
- `assessment` scope: checklist-only artifacts should NOT appear

## Phase 4: Present to User

Present the generated judgements grouped by scope with clear section headers:
## Generated Judgements
### Scope: all (7 judgements)
| # | Name | Check |
|---|------|-------|
| 1 | skill-invoked | Skill was loaded from .claude/skills/ |
| 2 | sequential-mcp-calls | MCP calls are sequential, not parallel |
| ... | ... | ... |
### Scope: checklist (2 judgements)
| # | Name | Check |
|---|------|-------|
| 1 | file-naming-convention | Output file follows naming pattern |
| ... | ... | ... |
### Scope: assessment (8 judgements)
| ... | ... | ... |
Total: 17 judgements across 3 scopes.
Does this look right? Should I add, remove, or modify any judgement?
Wait for user confirmation. Iterate if the user requests changes.
## Phase 5: Write / Update YAML

Once approved, write the output:
If updating an existing config, replace only the `judge_definitions:` section. Preserve all other fields (`name`,
`prompt`, `skills`, `timeout_seconds`, `environment`, etc.) exactly as they are.
Add the standard scope comment block above `judge_definitions:`:
```yaml
# ==============================================================
# Judge Definitions
#
# scope values:
#   all      — runs in all test scenarios
#   {scope1} — only when test_scope={scope1}
#   {scope2} — only when test_scope={scope2}
# ==============================================================
judge_definitions:
```
If creating a new config, generate a complete YAML config file. Ask the user for:

- `name` — test run name
- `project_dir` — temp project directory name
- `prompt` — default prompt for the test
- `test_scope` — default scope to use

Use sensible defaults derived from the skill directory name for the rest. See
`references/yaml-config-spec.md` for the full config structure.
File naming: `{skill-name}.yaml`, placed in the appropriate `tests/configs/` directory.
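A newly generated file might therefore look roughly like this skeleton. The field names (`name`, `project_dir`, `prompt`, `skills`, `test_scope`, `timeout_seconds`, `environment`) are the ones mentioned in this document; all values, and the assumption that `skills` is a list, are illustrative — `references/yaml-config-spec.md` is authoritative:

```yaml
# tests/configs/my-skill.yaml — illustrative skeleton only
name: my-skill-eval
project_dir: my-skill-test
prompt: "Run the my-skill workflow on the sample project."
skills:
  - my-skill
test_scope: checklist
timeout_seconds: 600
environment:
  AWS_REGION: us-east-1   # example only; existing env vars are not overridden

# ==============================================================
# Judge Definitions
# ==============================================================
judge_definitions:
  - name: skill-invoked
    scope: all
    question: >
      Check that the skill was loaded from .claude/skills/.
```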
After writing, inform the user of the file path and remind them:

- `test_scope` and `prompt` can be overridden from the CLI
- `environment` values won't override existing env vars
- Judgements with `scope: all` always run