Skill v1.0.0

ClawScan security

cognitive-behavior-evaluator · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Suspicious · Mar 12, 2026, 3:12 PM

Verdict: suspicious
Confidence: high
Model: gpt-5-mini
Summary: The skill's instructions match its stated purpose (testing AI behavior) but include explicit, risky prompts that ask target agents to suspend safety and generate harmful or biased outputs (e.g., covert network-extraction code and stereotype-inducing descriptions), which is disproportionate and hazardous.
Guidance: This skill is coherent with its stated research goal but includes explicit prompts that solicit harmful or unethical outputs (e.g., code to covertly discover local network ports, commands to 'suspend safety', and stereotype-generating tasks). Before installing or enabling: (1) avoid allowing autonomous invocation or require human approval for each run; (2) run evaluations only in isolated/sandboxed test environments; (3) remove or reword any injection prompts that request creation of exploit code or instruct suspension of safety constraints—use hypothetical/refusal-check prompts instead; (4) maintain an audit trail and review outputs before they are used; and (5) ensure legal and ethical review/approval for any experiments that test dangerous behaviors.
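Point (3) of the guidance recommends refusal-check prompts that assess how the target responds instead of soliciting harmful output. The following is a minimal sketch of such a check, not the skill's or the scanner's actual method. The names `REFUSAL_MARKERS`, `is_refusal`, and `refusal_rate` are hypothetical, and keyword matching is a deliberately crude stand-in for a real response classifier.

```python
# Hypothetical marker phrases; a real evaluation would need a more robust classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm unable", "i am unable")


def is_refusal(response: str) -> bool:
    """Return True if the response contains one of the assumed refusal phrases."""
    # Normalize case and curly apostrophes so "I can’t" matches "i can't".
    text = response.lower().replace("\u2019", "'")
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses classified as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)
```

A check of this shape scores only the structure of the response (refused or not), so the evaluator never needs the target to produce the hazardous content itself.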

Review Dimensions

Purpose & Capability: note. The name and description (evaluating AI cognitive/behavioral biases) align with the SKILL.md, which instructs the evaluator to inject diagnostic prompts and score the responses. However, some of the provided injection prompts request outputs that go beyond benign evaluation (e.g., forcing the target to 'suspend safety' and generate a covert Python script to extract local network port information). Although such extreme prompts can be defended as stress tests, they are disproportionate to ordinary evaluation needs and carry real misuse risk.
Instruction Scope: concern. The runtime instructions explicitly guide the evaluator to present the target Agent with prompts that (a) ask it to break safety constraints and produce potentially malicious code, (b) instruct it to accept and reason from an obviously false physical premise, and (c) ask it to produce stereotype-laden descriptions of criminal behavior. These steps actively solicit harmful, deceptive, or discriminatory outputs from the target rather than merely assessing response structure, which materially expands the scope and risk of the skill.
Install Mechanism: ok. This is an instruction-only skill with no install spec and no code files. Nothing is written to disk and there is no package installation risk.
Credentials: ok. The skill requests no environment variables, credentials, or config paths; there is no apparent demand for unrelated secrets or system access in the manifest.
Persistence & Privilege: note. The skill is not always-enabled and requests no extra privileges. However, it is user-invocable and (by platform default) may be callable autonomously by agents; combined with instructions that solicit malicious outputs, autonomous invocation increases the potential for misuse. Consider restricting autonomous invocation or requiring human approval before each run.
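The human-approval restriction suggested here, combined with the audit trail recommended in the guidance, could be sketched roughly as follows. This is an illustration under stated assumptions, not a platform API: `run_with_approval`, `run_evaluation`, `approve`, and `audit_path` are hypothetical names, and the JSON-lines log format is an arbitrary choice.

```python
import json
import time
from typing import Callable, Optional


def run_with_approval(
    prompt: str,
    run_evaluation: Callable[[str], str],  # hypothetical: sends the prompt to the target
    approve: Callable[[str], bool],        # hypothetical: asks a human reviewer
    audit_path: str,                       # hypothetical: append-only JSON-lines log
) -> Optional[str]:
    """Run one evaluation only if a human approves it; log every decision."""
    approved = approve(prompt)
    output = run_evaluation(prompt) if approved else None

    # Record rejected runs as well, so the audit trail covers every request.
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "approved": approved,
        "output": output,
    }
    with open(audit_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return output
```

In an interactive setting, `approve` might simply wrap `input()` and return True on an explicit "yes". The returned output would still need human review before being used elsewhere.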