Skill v1.0.0
ClawScan security
Red Team · ClawHub's context-aware review of the artifact, metadata, and declared behavior.
Scanner verdict
Benign · Feb 28, 2026, 7:40 PM
- Verdict: benign
- Confidence: medium
- Model: gpt-5-mini
- Summary: The skill's code and instructions match its stated purpose (running adversarial persona debates via user-installed AI CLIs). However, it sends any provided context to third-party model CLIs and contains a flagged 'system-prompt-override' pattern. Review your inputs and confirm that you trust the installed CLIs before use.
- Guidance: This skill appears to do what it says, but take these precautions before installing or running it:
  - Understand where data goes: the script calls external model CLIs (Claude/Codex/Gemini) and sends the question, the persona system prompts, and any context file you provide to those third-party services. Do not include secrets, private credentials, or confidential documents in the context file unless you trust the target model, provider, and account.
  - Trust the CLIs: you must install and run the vendor CLIs locally. Make sure those packages come from official sources and that your account or subscription is intended to be used this way.
  - Inspect custom personas and context files: custom persona JSON and other inputs are merged into prompts. Only use custom persona files from trusted authors; a malicious custom persona could craft prompts that produce undesirable outputs or leak data to the model.
  - Review the full script: part of the Python file was truncated in the package listing. If you need high assurance, open and read the entire scripts/red-team.py to confirm there are no hidden network calls, logging to unexpected endpoints, or file exfiltration code beyond the calls to the recognized CLIs.
  - Sandbox if needed: run the skill first in an isolated environment, or with non-sensitive dummy data, to observe its behavior and network traffic.
  If you want, I can (1) walk through the full red-team.py file line by line, (2) point out exactly where user data is injected into prompts, or (3) suggest safer invocation patterns (e.g., redact secrets, run in an isolated account). Tell me which you prefer.
- Findings: [system-prompt-override] expected: The skill and persona library legitimately build system prompts to define agent roles; the script passes these to vendor CLIs (e.g., via --append-system-prompt). This pattern is expected for a multi-persona debate engine, but such prompt-override patterns are also commonly used in prompt-injection attacks, so treat untrusted persona and context files with caution.
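The caution about untrusted persona files can be made concrete with a quick pre-check before a custom persona is used. The following is a minimal sketch in Python and is not part of the skill. The persona field names (`name`, `system_prompt`), the helper names, and the phrase list are all assumptions for illustration, since this report does not show the actual persona schema. A pattern match like this only catches obvious override attempts and is no substitute for reading the file.

```python
import json
import re

# Hypothetical, non-exhaustive phrases that often signal an attempt to
# override earlier instructions or extract data.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(system|previous) prompt",
    r"reveal (your|the) (system prompt|instructions)",
    r"(print|send|upload) .*(api[_ ]?key|password|token|credential)",
]


def screen_persona(persona: dict) -> list[str]:
    """Return 'field: pattern' entries for every suspicious match in a persona dict."""
    hits = []
    for key, value in persona.items():
        if not isinstance(value, str):
            continue  # only string fields can end up in a prompt verbatim
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, value, flags=re.IGNORECASE):
                hits.append(f"{key}: {pattern}")
    return hits


def screen_persona_file(path: str) -> list[str]:
    """Screen a JSON file holding one persona object or a list of them."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    personas = data if isinstance(data, list) else [data]
    hits = []
    for persona in personas:
        if isinstance(persona, dict):
            hits.extend(screen_persona(persona))
    return hits
```

An empty result means none of the listed phrases appeared; it does not mean the persona is safe. Any hit is a reason to read the file by hand before passing it to a model CLI.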
Review Dimensions
- Purpose & Capability (ok): The name and description (adversarial debate engine) align with the included Python script and persona library. Requiring a local AI CLI (claude/codex/gemini) is expected for this functionality; no unrelated credentials or system accesses are requested.
- Instruction Scope (note): SKILL.md instructs running the included script, selecting personas, and optionally feeding in a context file. That context and all persona text are placed into prompts and sent to external model CLIs, so any sensitive data in the context will be transmitted to those third parties. A pre-scan flagged a 'system-prompt-override' pattern; the skill legitimately uses system prompts to define personas, but the pattern is worth noting because it can be abused in other contexts.
- Install Mechanism (ok): No install spec is included; the skill is instruction- and script-based. The README suggests installing the vendor CLIs with npm (well-known packages), which is low-to-moderate risk and expected. The package itself contains no downloads from arbitrary URLs and no archive extraction.
- Credentials (ok): The skill declares no required environment variables or credentials; it relies on the user's installed model CLIs and their existing authenticated subscriptions. This is proportionate to its purpose. Users should note that those CLIs use their account tokens and will accept the prompts constructed by the script.
- Persistence & Privilege (ok): The skill is not always-enabled and does not request system-wide persistence or modifications to other skills. It runs on demand and does not declare elevated platform privileges.
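Because the Instruction Scope note says that any context file is transmitted to third-party model CLIs, one safer invocation pattern is to redact obvious secrets from the context before handing it to the skill. The sketch below is a hedged illustration in Python, not code from the skill. The function names and the two patterns are assumptions, and real secrets take many forms that these patterns will miss, so treat it as a first pass rather than a guarantee.

```python
import re

# Illustrative patterns only; extend them for the secret formats you actually use.
SECRET_PATTERNS = [
    # key=value or key: value pairs whose key name looks sensitive
    (
        re.compile(r"(?i)\b(api[_-]?key|secret|token|password)\b(\s*[:=]\s*)(\S+)"),
        r"\1\2[REDACTED]",
    ),
    # PEM-encoded private key blocks
    (
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.DOTALL,
        ),
        "[REDACTED PRIVATE KEY]",
    ),
]


def redact(text: str) -> str:
    """Replace values that match the illustrative secret patterns."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def write_redacted_copy(src_path: str, dst_path: str) -> None:
    """Write a redacted copy of a context file, leaving the original untouched."""
    with open(src_path, encoding="utf-8") as src:
        cleaned = redact(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

The redacted copy, rather than the original file, would then be supplied as the context. Reviewing the copy by eye before the first run is still advisable.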
