## Install

`openclaw skills install autoagent`

Automatically improve agent guidance through iterative testing and scoring. Use when you want to optimize prompts, AGENTS.md entries, or skill definitions through automated testing and iterative improvement.
/autoagent
Every invocation of /autoagent starts fresh with interactive setup questions.
Ask the user:
Where should I create the sandbox folder? Default: `../../autoagent-sandbox/` (resolves to `/clawd/autoagent-sandbox/`)

You can respond with:

- `../../autoagent-sandbox/` → `/clawd/autoagent-sandbox/` (default)
- `../../autoagent-news/` → `/clawd/autoagent-news/`
- `../../agentDev/optimize/` → `/clawd/agentDev/optimize/`
- `/some/other/path/optimize/` → exact path

Wait for their response (or empty for default).
Ask the user:
Let's define how we'll measure success. What does a "good" result look like for this task?
Follow up one question at a time based on their response.
Once you have enough information, propose a draft scoring.md:
## Proposed Scoring Criteria
**Score Components:**
- [Component 1]: [X] points - [description]
- [Component 2]: [Y] points - [description]
- ...
**Total:** 100 points
**[Any additional notes]**
Wait for user approval or modifications.
Ask the user:
Does the guidance rely on any scripts, tools, or external software?
- If yes: Note each script/tool path and what functionality it provides
- The autoagent should analyze these to recommend improvements
Ask the user:
Run optimization every 5 minutes (default), or different interval?
After all questions answered, create the sandbox folder at the user-specified path:
```
sandbox/
├── guidance-under-test.md   # Copy of original guidance
├── current-guidance.md      # Same as guidance-under-test initially
├── fixtures/
│   └── test-cases.json      # {"cases": [{"input": "...", "expected": "..."}]}
├── scoring.md               # Scoring criteria document (user-approved)
├── scores.md                # Score history table
└── scripts/                 # (optional) Copy of referenced scripts/tools
```
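The layout above can be sketched in Python. This is a minimal illustration of what the agent does with its file tools; the `create_sandbox` helper name and the seed values are assumptions, not part of the skill itself:

```python
from pathlib import Path

def create_sandbox(path, guidance_text):
    """Illustrative sketch: create the sandbox layout described above."""
    sandbox = Path(path)
    (sandbox / "fixtures").mkdir(parents=True, exist_ok=True)
    (sandbox / "scripts").mkdir(exist_ok=True)
    # Original copy stays untouched; the working copy starts identical.
    (sandbox / "guidance-under-test.md").write_text(guidance_text)
    (sandbox / "current-guidance.md").write_text(guidance_text)
    (sandbox / "fixtures" / "test-cases.json").write_text('{"cases": []}\n')
    (sandbox / "scoring.md").write_text("")  # filled with user-approved criteria
    (sandbox / "scores.md").write_text("")   # score history appended each run
    return sandbox
```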
Use OpenClaw cron syntax to schedule the iteration agent (e.g. `*/5 * * * *`).

Return a confirmation message showing the resolved path:

"Optimization started at `/clawd/autoagent-news/`. I'll check back every 5 minutes. Monitor progress in `scores.md`."
Each time the cron triggers, do the following:
Read from the sandbox:

- `current-guidance.md` — the guidance being optimized
- `scores.md` — history of scores and changes
- `scoring.md` — how to measure success
- `fixtures/test-cases.json` — test inputs (MUST read this to understand what the guidance is being tested against)

Review the score history (the last 10 runs, or all available runs if fewer than 10 exist), identify patterns, and note the current score. When fewer than 10 runs exist, treat all available scores as the set for plateau detection.

Important: Load the test cases from `fixtures/test-cases.json` to understand what specific outputs/behaviors are expected. The edit should address gaps revealed by test case failures or missing criteria.
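The plateau rule above can be sketched as follows. This is a minimal illustration only; the skill performs this reasoning in-agent, and the `is_plateaued` helper is hypothetical (the 10-run window and 5-point spread come from the text):

```python
def is_plateaued(scores, window=10, spread=5):
    """True when the most recent scores sit in a narrow band.

    Uses the last `window` scores, or all of them when fewer exist,
    matching the plateau-detection rule described above.
    """
    recent = scores[-window:]  # falls back to all scores when fewer than `window`
    return len(recent) > 1 and max(recent) - min(recent) <= spread

# Ten runs hovering between 70 and 74 count as a plateau.
print(is_plateaued([70, 72, 71, 74, 73, 72, 70, 71, 72, 73]))  # prints True
```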
If the guidance references any scripts, tools, or external software, analyze the copies in `scripts/` to understand their behavior and example outputs before proposing edits.
Generate ONE specific edit to the guidance that might improve the score. Analyze the score history first, then select the edit by priority order. Present it in this format:
## Proposed Edit
**Rationale:** Why this change might help
**Change:**
[Show exact diff or new text]
Write the edited guidance to `current-guidance.md`.
Use a subagent to run the task with the new guidance:

- Inputs: `current-guidance.md` and `fixtures/test-cases.json`
- Call `sessions_spawn` with a task containing the full contents of `current-guidance.md`, include the test-cases JSON inline in the task prompt, set `timeoutSeconds` to 120, and request that the subagent return the raw output (not just pass/fail)

Evaluate the output against the `scoring.md` criteria.
Generate a score 0-100.
Append to `scores.md`:

| N | Description of change | SCORE | keep/discard |

Where N is the run number (incremented from the last row). Then:

- If the score improved: keep the edit (`current-guidance.md` is already updated)
- If the score dropped: revert `current-guidance.md` to the previous version
- If the last 10 scores are within 5 points of each other, treat the optimization as plateaued
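The append step can be sketched as below. This is an illustrative stand-in for what the agent does with its file tools; the `append_score` helper name is hypothetical, while the file name and row format come from the text:

```python
from pathlib import Path

def append_score(sandbox, change, score, decision):
    """Append one run row to scores.md and return the new run number (sketch)."""
    log = Path(sandbox) / "scores.md"
    text = (log.read_text() if log.exists()
            else "| Run | Change | Score | Decision |\n|---|---|---|---|\n")
    # Find the highest run number among existing data rows.
    last = 0
    for line in text.splitlines():
        cell = line.split("|")[1].strip() if line.count("|") >= 2 else ""
        if cell.isdigit():
            last = max(last, int(cell))
    run = last + 1  # increment from the last logged run
    log.parent.mkdir(parents=True, exist_ok=True)
    log.write_text(text + f"| {run} | {change} | {score} | {decision} |\n")
    return run
```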
| File | Description |
|---|---|
| `guidance-under-test.md` | Original copy (read-only reference) |
| `current-guidance.md` | Working version (edited each iteration) |
| `fixtures/test-cases.json` | Input → expected output pairs |
| `scoring.md` | Scoring methodology |
| `scores.md` | Score history log |
| `scripts/` | (optional) Copies of referenced scripts/tools for analysis |
Run `/autoagent` to start, monitor `scores.md` for progress, and compare against the original in `guidance-under-test.md`.