Skill v0.3.2
ClawScan security
OpenClaw Smartness Eval · ClawHub's context-aware review of the artifact, metadata, and declared behavior.
Scanner verdict
Reviewed Mar 20, 2026, 1:10 PM
- Verdict: Review
- Confidence: medium
- Model: gpt-5-mini
- Summary: The skill's stated purpose (offline intelligence evaluation) matches most of what it does, but it executes workspace scripts and reads internal state (including a reasoning SQLite DB) — behaviors that are plausible for an eval tool but broaden its access surface and warrant manual review before installing.
- Guidance: This skill is broadly coherent with its stated purpose (a workspace-centered evaluation tool), but before installing:
  1. Review eval.py, especially validate_command(), its subprocess.run usage, and any code paths that could enable network access or writes outside its stated output directory.
  2. Confirm you trust the other workspace scripts it will invoke; many test commands call scripts that are not bundled, and those external scripts could read secrets or make network calls.
  3. Treat the .reasoning/reasoning-store.sqlite and state logs as sensitive — if you don't want those inspected, do not install or run the skill.
  4. If you enable --llm-judge, confirm exactly which summary fields are sent and test in an isolated environment first.
  5. As a safe practice, run python3 scripts/check.py, then run eval.py in a sandboxed workspace (or with --no-probes / dry-run options) to observe its behavior before granting it unrestricted access or allowing autonomous invocation.
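The eval.py review in step 1 can be partly mechanized before the manual read. Below is a reviewer-side sketch (not part of the skill; the set of flagged modules is an assumption) that statically lists subprocess/network call sites in a Python file:

```python
import ast
import sys

# Reviewer-side triage helper (not part of the skill): statically list
# every subprocess/network call site in a Python file, so the risky
# code paths in eval.py are easy to locate before a manual read.
# The module set below is an assumption; extend it as needed.
RISKY_MODULES = {"subprocess", "socket", "urllib", "requests", "http"}

def risky_calls(path: str) -> list[tuple[int, str]]:
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = ast.unparse(node.func)  # e.g. "subprocess.run"
            if name.split(".")[0] in RISKY_MODULES:
                hits.append((node.lineno, name))
    return hits

if __name__ == "__main__":
    for lineno, name in risky_calls(sys.argv[1]):
        print(f"{sys.argv[1]}:{lineno}: {name}")
```

A clean report from a scan like this is not a safety proof (dynamic dispatch and indirect imports evade it), but it points the manual review at the right lines first.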
Review Dimensions
- Purpose & Capability
- note: The skill claims to produce a 14-dimension evaluation and to read runtime state/logs; the commands and listed state files align with that purpose. However, many task commands reference other workspace scripts (message-analyzer-v5.py, security-config-audit.py, etc.) that are not bundled with the skill and thus require a full OpenClaw environment. This dependency on host scripts is plausible, but installers should be aware of it.
- Instruction Scope
- concern: SKILL.md and the docs state the tool is read-only (it reads many state/*.json files and .reasoning/reasoning-store.sqlite) and writes only to state/smartness-eval/. The runtime also spawns subprocesses to run tests. While the manifest claims a validate_command() gate, executing other workspace scripts (via allowed prefixes like 'scripts/') can let those scripts access the network, read secrets, or modify state — the skill's safety depends on both its validate_command implementation and the trustworthiness of the other workspace scripts. Verify validate_command and inspect eval.py before granting execution privileges.
- Install Mechanism
- ok: No external install spec or remote downloads; the package is instruction/code-only and uses only bundled Python scripts. This is low risk from a supply-chain/download perspective.
- Credentials
- note: The skill declares no required env vars; the optional LLM judge requires DEEPSEEK_API_KEY or OPENAI_API_KEY only when explicitly enabled. However, it reads potentially sensitive local artifacts (.reasoning/reasoning-store.sqlite, message-analyzer logs, etc.). Those reads are coherent for an evaluator but the data is sensitive — make sure you are comfortable exposing the reasoning DB and logs to the skill runtime.
- Persistence & Privilege
- note: always: false, and the docs state it writes only to its own state/smartness-eval/ directory. Autonomous invocation is enabled by default (platform normal). Combined with the skill's read access to internal logs and its ability to run workspace scripts, autonomous invocation increases the blast radius — consider whether you want the agent to be able to run this skill without manual approval each time.
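The validate_command() gate flagged under Instruction Scope is worth picturing concretely. A hypothetical sketch of a prefix-allowlist gate follows; the function name matches the manifest's claim, but the allowlist and metacharacter set here are assumptions for illustration (the real implementation lives in eval.py and should be read directly):

```python
import shlex

# Hypothetical prefix-allowlist gate of the kind the manifest claims.
# ALLOWED_PREFIXES and SHELL_META are assumptions for illustration;
# the real gate lives in eval.py and must be reviewed there.
ALLOWED_PREFIXES = ("python3 scripts/", "scripts/")
SHELL_META = (";", "|", "&", "$", "`", ">", "<", "\n")

def validate_command(command: str) -> bool:
    """Accept only commands that launch a workspace script and contain
    no shell metacharacters that could chain in extra commands."""
    if any(ch in command for ch in SHELL_META):
        return False
    # Normalize whitespace so "python3   scripts/x.py" still matches.
    normalized = " ".join(shlex.split(command))
    return normalized.startswith(ALLOWED_PREFIXES)
```

Note the limitation the review raises: even a correct gate of this shape only constrains which script is launched, not what that script does once running, which is why the trustworthiness of the workspace scripts themselves still matters.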
