Skill flagged — suspicious patterns detected
ClawHub Security flagged this skill as suspicious. Review the scan results before using.
Improvement Evaluator
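v1.0.0
Use this skill when you need to verify whether a Skill improvement actually improves AI execution results. It runs AI tasks from a predefined task suite (YAML), judges each task pass/fail, and outputs execution_pass_rate. Not for document-structure scoring (use improvement-learner) or candidate scoring (use improvement-discriminator).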
by @lanyasheng
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The skill's code and SKILL.md align with the stated purpose (run task suites and compute pass rates). However, the registry metadata claims no required binaries, while the runtime requires the external 'claude' CLI unless --mock is used. The package also contains multiple executable scripts and test harnesses (it is not an instruction-only skill), which is consistent with its function but more powerful than the 'no binaries / no env' metadata implies.
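The dependency gap described above could be caught with a preflight check before any task runs. The following is a minimal sketch, not the skill's actual code: the function name `missing_binaries` and its parameters are hypothetical, and only the 'claude' binary name and the mock mode come from the scan report.

```python
import shutil


def missing_binaries(mock, required=("claude",), which=shutil.which):
    """Return the required executables that cannot be found on PATH.

    Hypothetical helper: in mock mode no external CLI is invoked,
    so nothing is required and the result is always empty.
    The `which` parameter exists only so the lookup can be stubbed in tests.
    """
    if mock:
        return []
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries(mock=False)
    if missing:
        raise SystemExit(f"Missing required binaries: {', '.join(missing)}")
```

Failing fast this way makes the undeclared requirement visible at startup instead of partway through a task suite.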
Instruction Scope
Runtime behavior prepends SKILL.md (or candidate content) into prompts and sends them to an external LLM via the claude CLI. The runner reads SKILL.md from an arbitrary 'target' path if provided, writes AI outputs to temp files, and can run pytest tests against those outputs. That means arbitrary file contents can be read and included in prompts, and arbitrary test code (fixtures) can be executed. The PytestJudge implements path checks that restrict test_file to tests/fixtures in the skill, but the task suite semantics in the docs and examples suggest fixtures may live in suite folders. This mismatch could cause unexpected behavior or missing fixtures.
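A path check of the kind attributed to PytestJudge can be sketched as follows. This is an illustration under assumptions, not the skill's implementation: the function name `resolve_fixture` and the `skill_root` parameter are hypothetical, and the only rules taken from the report are that test_file must start with 'fixtures/' and must resolve under the skill's tests/fixtures directory.

```python
from pathlib import Path


def resolve_fixture(skill_root, test_file):
    """Resolve test_file under <skill_root>/tests/fixtures, or raise ValueError.

    Hypothetical helper illustrating a prefix check plus a containment check.
    """
    if not test_file.startswith("fixtures/"):
        raise ValueError(f"test_file must start with 'fixtures/': {test_file!r}")

    allowed = (Path(skill_root) / "tests" / "fixtures").resolve()
    candidate = (Path(skill_root) / "tests" / test_file).resolve()

    # resolve() collapses ".." segments and follows symlinks, so a value such
    # as "fixtures/../../secret.py" is caught here even though it passes the
    # prefix check above. Path.is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(allowed):
        raise ValueError(f"test_file escapes {allowed}: {test_file!r}")
    return candidate
```

Under a check like this, a fixture stored in a suite folder (for example a hypothetical `suites/demo/test_a.py`) would be rejected rather than run, which is the layout mismatch the report warns about.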
Install Mechanism
No install spec is provided (instruction-only style with included scripts). This is low-risk from arbitrary downloads or install-time code execution. The code will run subprocesses at runtime (claude, pytest) but nothing in the package downloads or extracts external archives.
Credentials
The skill declares no required credentials or env vars, yet it invokes an external LLM (claude CLI) and forwards the process environment into pytest runs. It will read SKILL.md or candidate content from disk and send it to the LLM — this can leak secrets or sensitive files if a 'target' path points to sensitive locations. There is also no explicit declaration that network access (via claude) will be used.
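One way to limit what the pytest subprocess can see is to pass an allowlisted environment instead of forwarding the full one. The sketch below is an assumption about how that could be done, not what the skill does: `SAFE_VARS`, `scrubbed_env`, and `run_pytest` are hypothetical names, and the allowlist is illustrative (Windows, for instance, would likely need extra variables such as SYSTEMROOT).

```python
import os
import subprocess
import sys

# Illustrative allowlist; adjust for the platform and the tests being run.
SAFE_VARS = ("PATH", "HOME", "LANG", "TMPDIR")


def scrubbed_env(source=None, allow=SAFE_VARS):
    """Copy only allowlisted variables from source (default: os.environ)."""
    source = os.environ if source is None else source
    return {key: source[key] for key in allow if key in source}


def run_pytest(test_path, timeout=60.0):
    """Run pytest on one file with a scrubbed environment; True if it passes."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-q", test_path],
        env=scrubbed_env(),   # tokens and keys in the parent env are not passed on
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

This narrows only the environment-variable exposure. It does not stop test code from reading files or using the network, so it complements rather than replaces a restricted environment.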
Persistence & Privilege
The skill does not request 'always: true' and does not modify other skills or global agent config. It runs as an on-demand evaluator. Autonomous invocation is allowed by default, but that is normal for the platform and is not flagged here on its own.
What to consider before installing
This evaluator appears to implement the stated function, but take these precautions before installing or running it:
- Expect the evaluator to call the external claude CLI by default (unless you pass --mock). The registry metadata did not declare this binary dependency — ensure you only run it where 'claude' is intentionally available.
- The runner reads SKILL.md (candidate or baseline) from disk and prepends it to prompts sent to the LLM. Do NOT run this on machines containing secrets or private files you wouldn't want to send to an external LLM; validate the 'target' paths the orchestrator will provide.
- Pytest-based judges execute test code (via pytest) against AI outputs. Review any fixture/test files included in task suites to ensure they don't run malicious code or perform unwanted side effects. The PytestJudge checks that test_file starts with 'fixtures/' and resolves under the skill's tests/fixtures directory, so your task suites' fixture layout must match the implementation; otherwise tests may fail or be ignored.
- If you need to run evaluations in CI or on sensitive data, use the --mock flag to avoid calling the claude CLI and to get deterministic, safe behavior.
- Consider asking the skill publisher to: (1) declare the 'claude' dependency in metadata; (2) document explicitly that SKILL.md and candidate skill contents will be sent to an external LLM; (3) clarify where fixtures should live so pytest judges operate on the intended test files.
If you can, audit the task suites you intend to run and run the evaluator in a restricted environment (no sensitive files, network-restricted or using --mock) until you are comfortable with its behavior.
Like a lobster shell, security has layers: review code before you run it.
