OpenClaw Smartness Eval
Pass
Audited by ClawScan on May 10, 2026.
Overview
This is a disclosed evaluator that runs local OpenClaw tests, reads bounded agent state, and only calls an external LLM judge when explicitly enabled.
Before installing, confirm you trust the local OpenClaw workspace scripts that the task suite will run. Keep `--llm-judge` off unless you are comfortable sending evaluation summaries to an external provider, and review generated reports before sharing because they may summarize local agent logs or reasoning history.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running an evaluation can execute local OpenClaw scripts in the workspace.
The skill intentionally executes local test commands. This is central to the evaluation purpose and the documentation describes validation and timeouts, but users should still understand that local workspace scripts will run.
This skill uses `subprocess` to run the test commands defined in `task-suite.json`
Review `config/task-suite.json` before running standard or deep evaluations, and run it only in a workspace whose scripts you trust.
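As one way to do that review, the sketch below lists every string value in the task suite file together with its JSON path, so that any command lines stand out. The file's schema is not described in this report, so the walker makes no assumption about key names. The helper names `iter_strings` and `list_suite_strings` are hypothetical and not part of the skill.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_strings(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (json_path, value) for every string found in a parsed JSON tree."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")


def list_suite_strings(suite_path: str) -> list[tuple[str, str]]:
    """Load a task suite file and return all of its string values for manual review."""
    data = json.loads(Path(suite_path).read_text(encoding="utf-8"))
    return list(iter_strings(data))


if __name__ == "__main__":
    # Path taken from the recommendation above; run from the skill directory.
    suite = Path("config/task-suite.json")
    if suite.exists():
        for json_path, value in list_suite_strings(str(suite)):
            print(f"{json_path}: {value}")
    else:
        print(f"{suite} not found in the current directory")
```

This only prints the file's contents; it does not execute anything or judge whether a command is safe.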
Evaluation results and command behavior depend on unbundled local OpenClaw components.
The skill's tests depend on external OpenClaw core scripts that are not shipped in this package, so actual runtime behavior also depends on the local workspace's copy of those scripts.
These scripts belong to the OpenClaw core and are not distributed with the skill. Users who install this skill need a complete OpenClaw V5 environment.
Use the skill with a trusted, up-to-date OpenClaw installation and inspect local core scripts if you are in a sensitive environment.
Reports may summarize metrics derived from prior interactions, logs, alerts, or reasoning-store contents.
The evaluator reads local runtime logs and the reasoning knowledge store. This is purpose-aligned for scoring intelligence and trends, but these sources may contain sensitive or interaction-derived context.
`state/message-analyzer-log.json` (sampled from real logs) ... `.reasoning/reasoning-store.sqlite` (reasoning knowledge store)
Review generated reports before sharing them, and avoid running the skill on workspaces containing sensitive logs unless that data use is acceptable.
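Before running the evaluator, it can help to see which of the two data sources quoted above exist in a workspace. The sketch below checks only those two paths, relative to a workspace root that you supply. The function name `present_sources` is hypothetical, and the skill may read other files that this report's excerpt elides.

```python
import sys
from pathlib import Path

# The two sources named in the quoted SKILL.md excerpt; the list may be incomplete.
KNOWN_SOURCES = (
    "state/message-analyzer-log.json",
    ".reasoning/reasoning-store.sqlite",
)


def present_sources(workspace: str) -> list[str]:
    """Return the known source paths that exist as files under the workspace root."""
    root = Path(workspace)
    return [rel for rel in KNOWN_SOURCES if (root / rel).is_file()]


if __name__ == "__main__":
    workspace = sys.argv[1] if len(sys.argv) > 1 else "."
    found = present_sources(workspace)
    if found:
        print("The evaluator may read these files:")
        for rel in found:
            print(f"  {rel}")
    else:
        print("Neither known source file was found in this workspace.")
```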
If enabled, the skill can use your DeepSeek or OpenAI API account and may incur provider-side logging or cost.
The optional LLM judge uses provider API credentials. This is expected for the feature and explicitly opt-in, though the registry metadata does not declare these optional environment variables.
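Requires the `DEEPSEEK_API_KEY` or `OPENAI_API_KEY` environment variable to be set. This feature makes external API requests, is off by default, and is enabled only when `--llm-judge` is passed explicitly.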
Only set the API key and pass `--llm-judge` if you are comfortable using that provider for evaluation.
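A quick way to confirm that no provider key is exposed to a fully local run is to check the two environment variables named above. This is a minimal sketch; the function name `judge_keys_present` is hypothetical and not part of the skill, and it reports only the variable names, never their values.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Environment variables named in the quoted SKILL.md excerpt.
JUDGE_KEY_VARS = ("DEEPSEEK_API_KEY", "OPENAI_API_KEY")


def judge_keys_present(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of the judge API key variables that are set and non-empty."""
    source = os.environ if env is None else env
    return [name for name in JUDGE_KEY_VARS if source.get(name)]


if __name__ == "__main__":
    found = judge_keys_present()
    if found:
        print("Provider keys set in this shell: " + ", ".join(found))
    else:
        print("No judge API keys are set in this shell.")
```

According to the quoted documentation, a key alone does not enable the judge; the `--llm-judge` flag must also be passed.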
When LLM judging is enabled, evaluation summaries are sent to an external model provider.
The skill documents an optional external provider data flow for LLM judging. It claims not to send raw logs, and it is disabled by default, but summaries and evidence still leave the local workspace when enabled.
`--llm-judge` ... Sends dimension summary to LLM API (no raw logs or user data)
Keep `--llm-judge` disabled for fully local evaluation, or review what summaries are sent before enabling it.
