Improvement Discriminator

Security checks across malware telemetry and agentic risk

Overview

The skill’s scoring workflow is mostly coherent, but the bundled evaluator code can load and run Python modules from file paths. That capability is powerful, and the user-facing instructions do not clearly bound it.

Use this skill for scoring only if you are comfortable with its evaluator code. Prefer the mock LLM backend for sensitive content, verify any API credentials and the companion benchmark-store dependency, and do not run real skill/module evaluation on untrusted candidate files unless the run is sandboxed or explicitly approved.

VirusTotal

VirusTotal findings are pending for this skill version.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

ASI05: Unexpected Code Execution

High

What this means

If pointed at an untrusted candidate or skill file, evaluation could run arbitrary local Python code in the user's environment.

Why it was flagged

The evaluator can load and execute a Python module while evaluating a skill. Importing a module runs its top-level code, which is more powerful than simple candidate scoring and is not clearly bounded in SKILL.md.

Skill content

spec.loader.exec_module(self.skill_module)
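To illustrate why this call matters, here is a minimal sketch using only the standard `importlib.util` API. It assumes a throwaway candidate file created in a temporary directory; the helper name `load_module_from_path` and the file names are hypothetical and do not come from the skill. It shows that a module's top-level statements run as a side effect of loading, before any scoring logic is reached.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(path: Path, name: str = "candidate_module"):
    """Load a module from a file path (hypothetical helper for illustration)."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Every top-level statement in the file executes during this call.
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "side_effect.txt"
    candidate = Path(tmp) / "candidate.py"
    # The candidate writes a file at import time, standing in for any side effect.
    candidate.write_text(
        "from pathlib import Path\n"
        f"Path({str(marker)!r}).write_text('ran at import time')\n"
        "SCORE = 1\n"
    )
    existed_before = marker.exists()
    module = load_module_from_path(candidate)
    existed_after = marker.exists()
```

The marker file appears only because the candidate was loaded, which is the behavior the finding describes.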

Recommendation

Only evaluate trusted local code, require explicit approval before real module execution, and prefer sandboxing, path allowlists, or a mock/evidence-only mode by default.
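One way the allowlist and approval recommendations could be combined is sketched below. The function names, the `approved` flag, and the example paths are assumptions made for illustration, not part of the skill. The check resolves the candidate path first so that `..` segments cannot escape an allowed root.

```python
from pathlib import Path
from typing import List


def is_allowed_module_path(candidate: Path, allowed_roots: List[Path]) -> bool:
    """Return True only if the resolved candidate lies inside an allowed root."""
    resolved = candidate.resolve()
    return any(resolved.is_relative_to(root.resolve()) for root in allowed_roots)


def may_execute(candidate: Path, allowed_roots: List[Path], approved: bool) -> bool:
    """Require both explicit approval and an allowlisted location."""
    return approved and is_allowed_module_path(candidate, allowed_roots)


# Hypothetical paths; they do not need to exist for the check to run.
trusted = Path("/opt/skills/trusted")
inside = trusted / "evaluators" / "scorer.py"
escape = trusted / ".." / "untrusted" / "scorer.py"
```

A path check of this kind narrows where code may be loaded from, but it is not a sandbox: an allowlisted module still runs with the user's full permissions.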

ASI03: Identity and Privilege Abuse

Low

What this means

Using non-mock LLM judging can spend tokens or bill the configured provider account.

Why it was flagged

When the Claude or OpenAI backend is selected, the skill uses local provider credentials. This is expected for LLM judging, but registry metadata declares no credentials or environment variables.

Skill content

client = anthropic.Anthropic(**kwargs)  # Uses ANTHROPIC_API_KEY env var ... client = openai.OpenAI()  # Uses OPENAI_API_KEY env var

Recommendation

Confirm the intended API key/account before using `--llm-judge claude` or `--llm-judge openai`; use `mock` for no-cost local testing.
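A preflight check along these lines could make the credential in use visible before a billed run. This is a sketch under stated assumptions: the backend names and environment variable names are taken from the excerpt above, while `credential_summary` and its output format are hypothetical.

```python
import os
from typing import Mapping, Optional

# Backend names and variables as they appear in the flagged excerpt.
BACKEND_ENV_VARS = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "mock": None,  # the mock backend needs no credential
}


def credential_summary(backend: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Describe which credential a backend would use, without printing the key."""
    if backend not in BACKEND_ENV_VARS:
        raise ValueError(f"unknown backend: {backend}")
    env = os.environ if env is None else env
    var = BACKEND_ENV_VARS[backend]
    if var is None:
        return f"{backend}: no credential needed"
    value = env.get(var)
    if not value:
        return f"{backend}: {var} is not set"
    # Show only a short suffix so the account can be recognized.
    return f"{backend}: {var} is set (ends with ...{value[-4:]})"
```

Printing the summary before judging starts gives the user a chance to notice that the wrong account, or no account, is configured.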

ASI07: Insecure Inter-Agent Communication

Low

What this means

Private skill text or candidate details may be shared with the configured LLM provider or proxy.

Why it was flagged

The LLM judge builds a prompt containing target skill content and proposed candidate content, then sends it to the configured Claude/OpenAI backend when selected.

Skill content

messages=[{"role": "user", "content": prompt}] ... target_content=target_content[:2000] ... proposed_content=proposed_content[:1000]

Recommendation

Avoid including secrets in candidates or target content, verify any custom API base URL, and use the mock backend for sensitive offline review.
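As a rough illustration of this recommendation, the sketch below redacts obvious `key=value` style secrets before content is truncated for a prompt. The 2000 and 1000 character limits mirror the excerpt above; the regular expression, the `[REDACTED]` placeholder, and the function names are assumptions. A pattern like this catches only simple cases and is no substitute for keeping secrets out of the inputs.

```python
import re

# Hypothetical pattern: a secret-looking name, a ':' or '=', then a value.
SECRET_PATTERN = re.compile(
    r"\b(api[_-]?key|token|secret|password)\b(\s*[:=]\s*)(\S+)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace the value part of simple name=value secrets."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


def prepare_excerpt(text: str, limit: int) -> str:
    """Redact first, then truncate to the prompt budget."""
    return redact(text)[:limit]


target_excerpt = prepare_excerpt("API_KEY=sk-123 rest of skill text", 2000)
proposed_excerpt = prepare_excerpt("x" * 5000, 1000)
```

Redacting before truncating keeps the full secret value in view of the pattern, so the replacement does not depend on where the cut falls.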

ASI04: Agentic Supply Chain Vulnerabilities

Low

What this means

Results may depend on whatever local sibling benchmark-store code is present, which can affect provenance and reproducibility.

Why it was flagged

The code extends Python's import path to a sibling `benchmark-store` skill. This is related to benchmark functionality, but the dependency is not pinned or declared in the registry requirements.

Skill content

_BENCHMARK_STORE = Path(__file__).resolve().parents[2] / "benchmark-store" / "interfaces" ... sys.path.insert(0, str(_BENCHMARK_STORE))

Recommendation

Declare and pin expected companion skill versions, or keep this fallback disabled unless the sibling dependency is trusted.
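Pinning could take several forms. One possible approach, sketched here with hypothetical helper names, is to hash the companion directory's Python files and add it to the import path only when the digest matches a recorded value. The demo builds a stand-in directory in a temporary location and does not touch the real `benchmark-store` skill or `sys.path`.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def directory_digest(root: Path) -> str:
    """SHA-256 over the relative paths and contents of all .py files under root."""
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*.py")):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def add_pinned_path(root: Path, expected_digest: str, search_path: List[str]) -> bool:
    """Prepend root to search_path only if it exists and matches the pinned digest."""
    if not root.is_dir() or directory_digest(root) != expected_digest:
        return False
    if str(root) not in search_path:
        search_path.insert(0, str(root))
    return True


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "interfaces"
    store.mkdir()
    (store / "api.py").write_text("VERSION = 1\n")
    pinned = directory_digest(store)
    demo_path: List[str] = []
    accepted = add_pinned_path(store, pinned, demo_path)
    # Any change to the companion code invalidates the pin.
    (store / "api.py").write_text("VERSION = 2\n")
    rejected = add_pinned_path(store, pinned, [])
```

In real use the list passed as `search_path` would be `sys.path`, and the expected digest would be recorded alongside the declared companion version.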