Install
openclaw skills install xiaobai-skill-quality-eval
Skill Quality Evaluator: score any skill on 6 dimensions. Catch the 30% of skills that look good but fail silently. Based on Tessl Research 2026 findings.
Tessl Research (April 2026) found that 30% of skills look good but fail silently.
This skill helps you evaluate and improve your skills systematically, one dimension at a time.
Activation Reliability: Can the agent find and activate this skill when needed?
Checklist:
Common Issues:
Score Guide:
Task Coverage: Does the skill handle the tasks it claims to cover?
Checklist:
Common Issues:
Score Guide:
Instruction Clarity: Can the agent follow the instructions without confusion?
Checklist:
Common Issues:
Score Guide:
Leakage Resistance: Does the evaluation actually test the skill, or does it leak answers?
Checklist:
Common Issues (from Tessl Research):
Score Guide:
Model Compatibility: Does the skill work across different model sizes?
Checklist:
Tessl Finding: Small model + right skill ≈ Large model at 3X lower cost.
Score Guide:
Real-World Value: Does using this skill actually improve outcomes compared with using no skill?
Checklist:
Score Guide:
# Skill Evaluation Report
**Skill**: [name]
**Version**: [version]
**Date**: YYYY-MM-DD
**Evaluator**: [agent/session]
## Overall Score: XX/100
| Dimension | Score | Status |
|-----------|-------|--------|
| Activation Reliability | XX | 🟢/🟡/🔴 |
| Task Coverage | XX | 🟢/🟡/🔴 |
| Instruction Clarity | XX | 🟢/🟡/🔴 |
| Leakage Resistance | XX | 🟢/🟡/🔴 |
| Model Compatibility | XX | 🟢/🟡/🔴 |
| Real-World Value | XX | 🟢/🟡/🔴 |
🟢 80+ | 🟡 50-79 | 🔴 <50
## Critical Issues
1. [Issue] → [Fix]
## Improvement Recommendations
1. [Recommendation] → [Expected impact]
## Quick Wins (easy fixes, big impact)
1. [Fix] → +X points on [dimension]
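The score table and status legend in the template above can be filled in mechanically once the six dimension scores exist. The following is a minimal sketch, not the skill's own implementation. The status thresholds come from the legend (80+, 50-79, below 50). The overall score is assumed here to be the unweighted mean of the six dimensions, since the source does not state a formula, and the names `status`, `overall_score`, and `score_table` are hypothetical.

```python
from statistics import mean

# Dimension names as they appear in the report template.
DIMENSIONS = [
    "Activation Reliability",
    "Task Coverage",
    "Instruction Clarity",
    "Leakage Resistance",
    "Model Compatibility",
    "Real-World Value",
]


def status(score: float) -> str:
    """Map a 0-100 score to the legend: 80+ green, 50-79 yellow, below 50 red."""
    if score >= 80:
        return "🟢"
    if score >= 50:
        return "🟡"
    return "🔴"


def overall_score(scores: dict) -> int:
    """ASSUMPTION: overall score is the unweighted mean of the six dimensions."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return round(mean(scores[d] for d in DIMENSIONS))


def score_table(scores: dict) -> str:
    """Render the markdown score table used in the report template."""
    lines = ["| Dimension | Score | Status |", "|-----------|-------|--------|"]
    for d in DIMENSIONS:
        lines.append(f"| {d} | {scores[d]} | {status(scores[d])} |")
    return "\n".join(lines)


if __name__ == "__main__":
    example = dict(zip(DIMENSIONS, [85, 70, 60, 45, 75, 90]))
    print(f"## Overall Score: {overall_score(example)}/100")  # 71 under the mean assumption
    print(score_table(example))
```

If some dimensions matter more for your use case, a weighted mean would be a natural substitute for `overall_score`; the weights would be another assumption to document in the report.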
Read the skill's SKILL.md and evaluate on all 6 dimensions.
Generate the evaluation report.
Save to memory/evaluations/<skill-name>-eval.md
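Saving the report to the path above could look like the sketch below. It assumes the report has already been rendered to a markdown string. The helper names `report_path` and `save_report` are hypothetical, and replacing spaces and slashes in the skill name is an added safety step that the source does not specify.

```python
from pathlib import Path

# Default location taken from the instructions above.
DEFAULT_ROOT = Path("memory/evaluations")


def report_path(skill_name: str, root: Path = DEFAULT_ROOT) -> Path:
    """Build <root>/<skill-name>-eval.md.

    ASSUMPTION: spaces and slashes are replaced with '-' so the name
    is safe to use as a single file name.
    """
    safe = skill_name.strip().replace("/", "-").replace(" ", "-")
    return root / f"{safe}-eval.md"


def save_report(skill_name: str, report_markdown: str, root: Path = DEFAULT_ROOT) -> Path:
    """Write the report, creating the evaluations directory if needed."""
    path = report_path(skill_name, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(report_markdown, encoding="utf-8")
    return path
```

Calling `save_report("my-skill", report_text)` would write `memory/evaluations/my-skill-eval.md` relative to the current working directory.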
1. Read evaluation report
2. Focus on lowest-scoring dimension
3. Apply quick wins first
4. Re-evaluate
5. Repeat until all dimensions ≥ 70
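The improvement loop above can be sketched as a small driver. This is an illustration under assumptions: `evaluate` and `apply_quick_wins` are hypothetical callbacks standing in for the agent's re-evaluation and editing steps, and `max_rounds` is an added safeguard against looping forever when a dimension cannot reach the target of 70.

```python
from typing import Callable, Optional

TARGET = 70  # stop once every dimension scores at least this much


def next_focus(scores: dict, target: int = TARGET) -> Optional[str]:
    """Return the lowest-scoring dimension below target, or None if all pass."""
    below = {name: value for name, value in scores.items() if value < target}
    if not below:
        return None
    return min(below, key=below.get)


def improve(
    evaluate: Callable[[], dict],
    apply_quick_wins: Callable[[str], None],
    max_rounds: int = 10,
) -> dict:
    """Evaluate, fix the weakest dimension, re-evaluate, and repeat.

    Both callbacks are hypothetical placeholders supplied by the caller.
    """
    scores = evaluate()
    for _ in range(max_rounds):
        focus = next_focus(scores)
        if focus is None:
            break  # all dimensions are at or above the target
        apply_quick_wins(focus)
        scores = evaluate()
    return scores
```

Working on one dimension per round keeps each re-evaluation attributable to a single change, which makes it easier to tell which fix actually moved the score.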
For each skill in the skills/ directory:
1. Read SKILL.md
2. Evaluate on 6 dimensions
3. Generate report
4. Identify top 3 improvements
Save summary to memory/evaluations/batch-report.md
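A batch run over the steps above might be structured as follows. This is a sketch under several assumptions: each skill is taken to live at `skills/<name>/SKILL.md`, `evaluate` is a hypothetical callback that returns the six dimension scores for a SKILL.md text, the overall score is an unweighted mean, and "top 3 improvements" is approximated by the three lowest-scoring dimensions. The summary heading is also invented for illustration.

```python
from pathlib import Path
from typing import Callable


def find_skills(skills_dir: Path) -> list:
    """Return SKILL.md paths, one per immediate subdirectory (assumed layout)."""
    return sorted(p for p in skills_dir.glob("*/SKILL.md") if p.is_file())


def batch_summary(skills_dir: Path, evaluate: Callable[[str], dict]) -> str:
    """Evaluate every skill and build a markdown summary.

    `evaluate` is a hypothetical callback: SKILL.md text in, scores dict out.
    """
    lines = ["# Batch Evaluation Summary", ""]
    for skill_md in find_skills(skills_dir):
        scores = evaluate(skill_md.read_text(encoding="utf-8"))
        # ASSUMPTION: the three weakest dimensions are the top 3 improvements.
        weakest = sorted(scores, key=scores.get)[:3]
        overall = round(sum(scores.values()) / len(scores))
        lines.append(
            f"- {skill_md.parent.name}: {overall}/100, improve first: {', '.join(weakest)}"
        )
    return "\n".join(lines)


def save_batch_report(summary: str, path: Path = Path("memory/evaluations/batch-report.md")) -> Path:
    """Write the summary to the batch report path named above."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(summary, encoding="utf-8")
    return path
```

Directories without a SKILL.md are skipped silently in this sketch; logging them instead may be preferable when auditing a large skills/ directory.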
| Pattern | Issue | Fix |
|---|---|---|
| "Do X when appropriate" | Vague trigger | Define specific conditions |
| No examples | Agent can't learn | Add 3+ concrete examples |
| Only happy path | Fragile in production | Add error handling examples |
| Verbatim solutions | Leakage risk | Use different examples for eval |
| No model requirements | Unknown compatibility | Test with 2+ model sizes |
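A few of the patterns in the table lend themselves to a rough automated pre-check before a full evaluation. The sketch below is only a heuristic. The regular expressions, the keyword lists, and the function name `lint_skill` are assumptions made for illustration, and counting mentions of the word "example" is a crude proxy for counting real examples.

```python
import re

# ASSUMPTION: phrases like "when appropriate" or "if needed" signal a vague trigger.
VAGUE_TRIGGER = re.compile(
    r"\b(when|if|as)\s+(appropriate|needed|necessary|relevant)\b", re.IGNORECASE
)
EXAMPLE_WORD = re.compile(r"\bexamples?\b", re.IGNORECASE)
ERROR_WORD = re.compile(r"\b(error|errors|fail|fails|failure|invalid)\b", re.IGNORECASE)


def lint_skill(text: str) -> list:
    """Return a list of suspected anti-patterns in a SKILL.md text."""
    findings = []
    if VAGUE_TRIGGER.search(text):
        findings.append("Vague trigger: define specific conditions")
    if len(EXAMPLE_WORD.findall(text)) < 3:
        findings.append("Too few examples: add 3+ concrete examples")
    if not ERROR_WORD.search(text):
        findings.append("Only happy path: add error handling examples")
    return findings
```

A clean result from this check does not mean the skill is sound; it only rules out the most mechanical problems, so the six-dimension evaluation is still needed.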
MIT