Improvement Discriminator
Security checks across static analysis, malware telemetry, and agentic risk
Overview
The skill mostly matches its scoring purpose, but it includes under-disclosed dynamic Python skill loading plus optional external LLM judging that can send candidate content to providers and be influenced by that content.
Install only if you are comfortable reviewing or restricting the real-skill execution path. Use `--llm-judge mock` for local-only operation, avoid feeding untrusted Python modules to the evaluator, and treat LLM judge scores as advisory rather than as an automatic approval gate.
Static analysis
No static analysis findings were reported for this release.
VirusTotal
VirusTotal findings are pending for this skill version.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If this evaluator is pointed at an untrusted skill file, that file may run code on the user’s machine during evaluation.
The critic engine explicitly supports loading Python skill modules from file paths and calling their evaluation functions. That can execute arbitrary top-level/module code from the loaded file, but SKILL.md mainly presents a candidate-scoring CLI and does not clearly describe sandboxing or trusted-source limits.
支持真实 Skill 调用 (通过 Python 模块加载) ... 从文件路径加载 Skill 模块并执行评估 ... Python 模块:包含 evaluate() 或 execute() 函数
Only use real-skill/module evaluation on trusted local code, and add clear documentation or safeguards such as sandboxing, path allowlists, and explicit user approval before loading Python modules.
A malicious or prompt-like improvement candidate could try to manipulate the LLM judge into approving it or producing misleading reasoning.
The LLM judge prompt directly incorporates target skill text and proposed candidate content. The code builds these fields from candidate data, but the provided prompt does not clearly tell the judge to treat embedded content as untrusted data and ignore any instructions inside it.
## Target Skill
{target_content}
## Proposed Change ... Proposed Content:
{proposed_content}Wrap candidate text in strong untrusted-data boundaries, instruct the judge to ignore instructions inside candidate content, validate the JSON response strictly, and do not use LLM-judge output as a sole gate.
Selected skill and candidate text may leave the local environment when non-mock LLM judging is used.
When the Claude or OpenAI backend is selected, the constructed prompt containing target/candidate content is sent to an external provider. This is expected for an LLM-as-judge mode and is disclosed, but users should be aware of the data flow.
client.messages.create(... messages=[{"role": "user", "content": prompt}]) ... client.chat.completions.create(... messages=[{"role": "user", "content": prompt}])Use the mock backend for local-only evaluation, and avoid sending private or sensitive candidate/skill content to external LLM providers unless that is acceptable.
Using provider backends may consume API quota and relies on the user’s configured provider credentials.
The skill can use Anthropic or OpenAI credentials for optional LLM judging. This is purpose-aligned and disclosed in SKILL.md, though the registry metadata does not declare required environment variables because these credentials are optional.
claude | claude-sonnet-4-20250514 | `ANTHROPIC_API_KEY` (supports `ANTHROPIC_BASE_URL`) ... openai | gpt-4o-mini | `OPENAI_API_KEY`
Confirm which backend is selected before running `--llm-judge`, and prefer least-privilege or project-scoped API keys where possible.
The skill’s behavior may depend on code from a separately installed local `benchmark-store` skill.
The code prepends a sibling skill’s interface directory to Python’s import path and then imports benchmark-related modules if present. This appears related to the stated benchmarking purpose, but it is an implicit local dependency that can change behavior based on another installed skill.
_BENCHMARK_STORE = Path(__file__).resolve().parents[2] / "benchmark-store" / "interfaces"
if str(_BENCHMARK_STORE) not in sys.path:
sys.path.insert(0, str(_BENCHMARK_STORE))Document this dependency clearly and ensure the sibling benchmark-store code is trusted and version-compatible before enabling benchmark-based evaluation.
