Improvement Discriminator
Security checks across malware telemetry and agentic risk
Overview
The skill’s scoring workflow is mostly coherent, but the bundled evaluator code can load and execute Python modules from file paths, a capability that the user-facing instructions do not clearly bound.
Use this skill for scoring only if you are comfortable with its evaluator code. Prefer the mock LLM backend for sensitive content, verify any API credentials and the companion benchmark-store dependency, and do not run real skill/module evaluation on untrusted candidate files unless it is sandboxed or explicitly approved.
VirusTotal
VirusTotal findings are pending for this skill version.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If pointed at an untrusted candidate or skill file, evaluation could run arbitrary local Python code under the user's environment.
The evaluator can load and execute a Python module while evaluating a skill. Importing a module runs its top-level code, which is more powerful than simple candidate scoring and is not clearly bounded in SKILL.md.
spec.loader.exec_module(self.skill_module)
Only evaluate trusted local code, require explicit approval before real module execution, and prefer sandboxing, path allowlists, or a mock/evidence-only mode by default.
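The path allowlist and explicit-approval recommendations could be combined in a small guard that runs before any module is loaded. This is a minimal sketch and not the skill's actual code: the function names, the `approved` flag, and the `candidate_skill` module name are hypothetical. It only narrows which files may be imported and does not sandbox the code once it runs.

```python
import importlib.util
from pathlib import Path


def is_allowed_module_path(candidate, allowed_roots):
    """Return True only if candidate resolves to a .py file under an allowed root."""
    resolved = Path(candidate).resolve()  # collapses ".." and follows symlinks
    if resolved.suffix != ".py":
        return False
    for root in allowed_roots:
        try:
            resolved.relative_to(Path(root).resolve())
            return True
        except ValueError:
            continue  # not under this root, try the next one
    return False


def load_module_if_trusted(candidate, allowed_roots, approved=False):
    """Import a module from a file path only after approval and an allowlist check."""
    if not approved:
        raise PermissionError("module execution requires explicit approval")
    if not is_allowed_module_path(candidate, allowed_roots):
        raise PermissionError(f"path is outside the allowlist: {candidate}")
    spec = importlib.util.spec_from_file_location(
        "candidate_skill", str(Path(candidate).resolve())
    )
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {candidate}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module
```

Defaulting `approved` to `False` means a caller has to opt in to execution, which matches the suggestion of a mock or evidence-only mode by default.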
Using non-mock LLM judging can spend tokens or bill the configured provider account.
When the Claude or OpenAI backend is selected, the skill uses local provider credentials. This is expected for LLM judging, but registry metadata declares no credentials or environment variables.
client = anthropic.Anthropic(**kwargs) # Uses ANTHROPIC_API_KEY env var ... client = openai.OpenAI() # Uses OPENAI_API_KEY env var
Confirm the intended API key/account before using `--llm-judge claude` or `--llm-judge openai`; use `mock` for no-cost local testing.
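A preflight check along these lines could make the credential dependency visible before a paid backend is used. It is a sketch under assumptions: the backend names mirror the `--llm-judge` values above and the environment variable names come from the quoted code comments, but `REQUIRED_ENV` and `required_credential` are hypothetical helpers, not part of the skill.

```python
import os

# Backend name -> environment variable it reads (None means no credential needed).
REQUIRED_ENV = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "mock": None,
}


def required_credential(backend, env=None):
    """Return the env var a backend will use, or None for the mock backend.

    Raises ValueError for an unknown backend and RuntimeError if the
    required variable is missing or empty.
    """
    env = os.environ if env is None else env
    if backend not in REQUIRED_ENV:
        raise ValueError(f"unknown backend: {backend}")
    var = REQUIRED_ENV[backend]
    if var is None:
        return None  # mock: no provider call, no billing
    if not env.get(var):
        raise RuntimeError(f"{backend} backend needs {var} to be set")
    return var
```

Calling `required_credential("claude")` before judging reports which variable, and therefore which account, the run would bill, without printing the key itself.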
Private skill text or candidate details may be shared with the configured LLM provider or proxy.
The LLM judge builds a prompt containing target skill content and proposed candidate content, then sends it to the configured Claude/OpenAI backend when selected.
messages=[{"role": "user", "content": prompt}] ... target_content=target_content[:2000] ... proposed_content=proposed_content[:1000]
Avoid including secrets in candidates or target content, verify any custom API base URL, and use the mock backend for sensitive offline review.
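One way to reduce accidental disclosure is to redact secret-looking strings before the content is truncated and placed in the prompt. The sketch below is illustrative only: the patterns are rough heuristics chosen for this example, `redact` and `prepare_prompt_parts` are hypothetical names, and the 2000 and 1000 character limits are taken from the quoted code. Pattern-based redaction will miss many secrets, so it supplements rather than replaces keeping secrets out of the input.

```python
import re

# Heuristic patterns only; these are assumptions, not an exhaustive secret scanner.
_SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),  # key-like tokens with an "sk-" prefix
    re.compile(r"(?i)(api[_-]?key|token|password|secret)\s*[:=]\s*\S+"),
]


def redact(text):
    """Replace secret-looking substrings with a fixed placeholder."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def prepare_prompt_parts(target_content, proposed_content):
    """Redact first, then apply the same length limits as the quoted code."""
    return redact(target_content)[:2000], redact(proposed_content)[:1000]
```

Redacting before truncating avoids the case where a secret is cut at the length boundary and its remaining prefix no longer matches a pattern.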
Results may depend on whatever local sibling benchmark-store code is present, which can affect provenance and reproducibility.
The code extends Python's import path to a sibling `benchmark-store` skill. This is related to benchmark functionality, but the dependency is not pinned or declared in the registry requirements.
_BENCHMARK_STORE = Path(__file__).resolve().parents[2] / "benchmark-store" / "interfaces" ... sys.path.insert(0, str(_BENCHMARK_STORE))
Declare and pin expected companion skill versions, or keep this fallback disabled unless the sibling dependency is trusted.
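Keeping the fallback disabled by default could look like the following sketch. The environment variable `ALLOW_BENCHMARK_STORE_FALLBACK` and the function name are hypothetical, and the path is appended to the import path instead of being inserted at position 0 as in the quoted code, so that a sibling directory cannot shadow standard-library or installed packages. This does not pin a version; it only makes the dependency opt-in.

```python
import os
import sys
from pathlib import Path


def maybe_enable_benchmark_store(store_path, env=None, path_list=None):
    """Add store_path to the import path only when explicitly opted in.

    Returns True if the path is available on the import path, else False.
    """
    env = os.environ if env is None else env
    path_list = sys.path if path_list is None else path_list

    # Hypothetical opt-in switch; the fallback stays off unless it is set to "1".
    if env.get("ALLOW_BENCHMARK_STORE_FALLBACK") != "1":
        return False

    store = Path(store_path)
    if not store.is_dir():
        return False

    entry = str(store.resolve())
    if entry not in path_list:
        path_list.append(entry)  # lowest priority: cannot shadow other packages
    return True
```

A caller that needs reproducible results can log the resolved path that was added, which records which local copy of the sibling code influenced a run.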
