Agent Cost Eval Kit
Quickly check whether an agent looks unusually expensive, then decide whether to ignore, watch, investigate one path, run a deeper audit, or evaluate a confirmed change. Confirmed cost-control changes are evaluated only when comparable evidence exists.
Install
openclaw skills install agent-cost-eval-kit
Fallback URL install:
openclaw skills install https://clawhub.ai/choosenobody/agent-cost-eval-kit
Install for all local agents:
openclaw skills install agent-cost-eval-kit --global
Force update:
openclaw skills install agent-cost-eval-kit --global --force
Primary:
eval agent cost change
Also triggers:
You do not need to know what changed.
Start with one sentence:
I suspect My_Agent got more expensive
or
quick check agent cost for My_Agent
The skill will first return:
Full before/after evaluation is optional and only used when comparable evidence exists.
A Keep / Revert / Narrow decision is given only when you provide comparable before/after evidence.
| Status | When to use |
|---|---|
| No Action Needed | No meaningful cost anomaly is visible from the provided evidence |
| Watch | Possible cost increase, but not enough evidence to act. Observe one path only. |
| Investigate One Path | Suspicious pattern in one agent/kind/task path. Inspect that path only. |
| Run Routing Audit | Evidence suggests a possible model, routing, retry, or fallback issue. Recommend running: audit agent routing waste. |
| Unsafe to Judge | High-risk workflow or missing quality/safety evidence prevents a safe conclusion. |
Use Keep / Revert / Narrow only when you have a confirmed cost-control change with comparable before/after evidence.
When the user says:
I suspect My_Agent got more expensive
The skill:
Preferred output:
Status: Watch
Likely interpretation:
One high-token direct chat is not enough to prove My_Agent became more expensive.
It may be a long conversation or accumulated session token count, not a routing regression.
Do this now:
Do not change routing yet. Check only recent direct sessions for My_Agent.
Copy-paste:
openclaw sessions --agent My_Agent --kind direct --limit 10 --json
If --kind is not supported, use:
openclaw sessions --agent My_Agent --limit 20 --json
and group by kind manually.
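The manual grouping step can be sketched in Python. This is a minimal sketch, assuming the `--json` output is a JSON array of session objects that each carry a `kind` field and a token count. The field name `total_tokens`, the `cron` kind, and the sample values are hypothetical and not confirmed from openclaw's output, so adjust them to match what the command actually prints.

```python
import json
from collections import defaultdict


def group_by_kind(sessions):
    """Count sessions and sum tokens per session kind."""
    groups = defaultdict(lambda: {"sessions": 0, "tokens": 0})
    for session in sessions:
        # "kind" and "total_tokens" are assumed field names (hypothetical).
        kind = session.get("kind", "unknown")
        groups[kind]["sessions"] += 1
        groups[kind]["tokens"] += session.get("total_tokens", 0)
    return dict(groups)


# Hypothetical sample standing in for the saved command output.
sample_json = """
[
  {"kind": "direct", "total_tokens": 52000},
  {"kind": "direct", "total_tokens": 3000},
  {"kind": "cron", "total_tokens": 1500}
]
"""

summary = group_by_kind(json.loads(sample_json))
for kind, stats in sorted(summary.items()):
    print(kind, stats["sessions"], stats["tokens"])
```

With real data, save the command output to a file, load it with `json.load`, and pass the parsed list to `group_by_kind`. Keeping the kinds separate stops one long direct chat from being read as a change across every workload.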
Do NOT say only "Not Comparable Yet."
Use:
Status: Watch
Likely interpretation:
The data shows mixed workloads, not a clean cost regression.
The high direct-chat token count may be caused by a long active conversation.
No routing change should be made from this evidence alone.
Do this now:
Compare only direct sessions first.
Copy-paste:
openclaw sessions --agent My_Agent --limit 20 --json
Then select only rows where kind = direct.
Use this only after you have a confirmed change and comparable evidence.
Trigger:
eval cost change after reducing retries from 4 to 2
Provide:
Before:
<paste summary>
After:
<paste summary>
Output:
Decision: Keep Change / Revert Change / Narrow Change / Watch / Unsafe to Judge
Before / After:
Cost signal:
Quality / reliability signal:
Recommendation:
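The cost signal in a before/after comparison comes down to simple arithmetic on comparable summaries. A minimal sketch follows, assuming each pasted summary reports a task count and a token total. The field names and numbers are hypothetical, and the skill's actual decision criteria are not stated here, so the sketch computes only the relative change and makes no Keep / Revert / Narrow call.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("before value must be non-zero to compute a relative change")
    return (after - before) / before * 100.0


# Hypothetical summaries for a retry reduction from 4 to 2.
before = {"tasks": 50, "total_tokens": 400_000}
after = {"tasks": 50, "total_tokens": 300_000}

# Normalize per task so the two periods are comparable.
before_per_task = before["total_tokens"] / before["tasks"]
after_per_task = after["total_tokens"] / after["tasks"]

change = percent_change(before_per_task, after_per_task)
print(f"tokens per task: {before_per_task:.0f} -> {after_per_task:.0f} ({change:+.1f}%)")
```

A drop in tokens per task is only half of the picture: the quality and reliability signal still has to be checked against the same before and after periods.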
Conditions to enter Full Eval Mode:
This skill is read-only.
It will not:
Users should redact sensitive data before pasting.