Improvement Discriminator

Security checks across static analysis, malware telemetry, and agentic risk

Overview

The skill mostly matches its scoring purpose, but it includes under-disclosed dynamic Python skill loading plus optional external LLM judging that can send candidate content to providers and be influenced by that content.

Install only if you are comfortable reviewing or restricting the real-skill execution path. Use `--llm-judge mock` for local-only operation, avoid feeding untrusted Python modules to the evaluator, and treat LLM judge scores as advisory rather than as an automatic approval gate.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.

View on VirusTotal

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

ASI05: Unexpected Code Execution

Medium

What this means

If this evaluator is pointed at an untrusted skill file, that file may run code on the user’s machine during evaluation.

Why it was flagged

The critic engine explicitly supports loading Python skill modules from file paths and calling their evaluation functions. That can execute arbitrary top-level/module code from the loaded file, but SKILL.md mainly presents a candidate-scoring CLI and does not clearly describe sandboxing or trusted-source limits.

Skill content

支持真实 Skill 调用 (通过 Python 模块加载) ... 从文件路径加载 Skill 模块并执行评估 ... Python 模块：包含 evaluate() 或 execute() 函数

Recommendation

Only use real-skill/module evaluation on trusted local code, and add clear documentation or safeguards such as sandboxing, path allowlists, and explicit user approval before loading Python modules.

ASI01: Agent Goal Hijack

Medium

What this means

A malicious or prompt-like improvement candidate could try to manipulate the LLM judge into approving it or producing misleading reasoning.

Why it was flagged

The LLM judge prompt directly incorporates target skill text and proposed candidate content. The code builds these fields from candidate data, but the provided prompt does not clearly tell the judge to treat embedded content as untrusted data and ignore any instructions inside it.

Skill content

## Target Skill
{target_content}

## Proposed Change ... Proposed Content:
{proposed_content}

Recommendation

Wrap candidate text in strong untrusted-data boundaries, instruct the judge to ignore instructions inside candidate content, validate the JSON response strictly, and do not use LLM-judge output as a sole gate.

ASI07: Insecure Inter-Agent Communication

Low

What this means

Selected skill and candidate text may leave the local environment when non-mock LLM judging is used.

Why it was flagged

When the Claude or OpenAI backend is selected, the constructed prompt containing target/candidate content is sent to an external provider. This is expected for an LLM-as-judge mode and is disclosed, but users should be aware of the data flow.

Skill content

client.messages.create(... messages=[{"role": "user", "content": prompt}]) ... client.chat.completions.create(... messages=[{"role": "user", "content": prompt}])

Recommendation

Use the mock backend for local-only evaluation, and avoid sending private or sensitive candidate/skill content to external LLM providers unless that is acceptable.

ASI03: Identity and Privilege Abuse

Low

What this means

Using provider backends may consume API quota and relies on the user’s configured provider credentials.

Why it was flagged

The skill can use Anthropic or OpenAI credentials for optional LLM judging. This is purpose-aligned and disclosed in SKILL.md, though the registry metadata does not declare required environment variables because these credentials are optional.

Skill content

claude | claude-sonnet-4-20250514 | `ANTHROPIC_API_KEY` (supports `ANTHROPIC_BASE_URL`) ... openai | gpt-4o-mini | `OPENAI_API_KEY`

Recommendation

Confirm which backend is selected before running `--llm-judge`, and prefer least-privilege or project-scoped API keys where possible.

ASI04: Agentic Supply Chain Vulnerabilities

Low

What this means

The skill’s behavior may depend on code from a separately installed local `benchmark-store` skill.

Why it was flagged

The code prepends a sibling skill’s interface directory to Python’s import path and then imports benchmark-related modules if present. This appears related to the stated benchmarking purpose, but it is an implicit local dependency that can change behavior based on another installed skill.

Skill content

_BENCHMARK_STORE = Path(__file__).resolve().parents[2] / "benchmark-store" / "interfaces"
if str(_BENCHMARK_STORE) not in sys.path:
    sys.path.insert(0, str(_BENCHMARK_STORE))

Recommendation

Document this dependency clearly and ensure the sibling benchmark-store code is trusted and version-compatible before enabling benchmark-based evaluation.