Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

test-skill

v1.0.0

Use this skill when the user wants to test if the model can be tricked into giving out dangerous or forbidden information by using creative writing or 'actin...

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill's name and description match the instructions (generate adversarial prompts and refusal examples). However, the pipeline repeatedly assumes the agent has RAG/web-browsing, a maintained seed corpus, and persistent storage for 20+ seeds per category — none of these capabilities or config paths are declared in the skill metadata. That mismatch is unexpected and should be clarified.
Instruction Scope
SKILL.md and its references direct the agent to actively discover and synthesize harmful prompts (including probing for 'banned substances' and other illicit topics), to produce style-variant adversarial prompts, and to give 'rationale-inclusive' refusal answers that explicitly name safety categories and reasons, which effectively exposes internal refusal reasoning (chain-of-thought). The instructions also direct the agent to browse the web for current events. These steps expand the agent's behavior well beyond that of a passive skill and could lead to the collection, synthesis, and externalization of sensitive or harmful content.
Install Mechanism
No install spec or code files are present; this is instruction-only, so there is no installer risk or archive download. That reduces footprint and risk from arbitrary code installation.
Credentials
The skill declares no required environment variables or config paths, yet the instructions expect RAG, web browsing, and persistent seed storage. If the agent needs API keys (web search, browsing, retrieval store) or filesystem access to hold seed corpora, those are not declared. The absence of declared credentials while instructing network and storage activity is a proportionality mismatch that should be resolved.
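The mismatch described above can be checked mechanically. The following is a minimal sketch, not the scanner's actual logic: the capability names, the keyword patterns, and the idea of a `declared` set are all assumptions made for illustration, since the skill's metadata schema is not shown in this report.

```python
import re

# Hypothetical capability names and keyword patterns (assumptions, not a real schema).
CAPABILITY_PATTERNS = {
    "web_browsing": re.compile(r"\bweb[- ]brows", re.IGNORECASE),
    "retrieval": re.compile(r"\b(rag|retrieval)\b", re.IGNORECASE),
    "persistent_storage": re.compile(
        r"\b(persistent storage|seed corpus)\b", re.IGNORECASE
    ),
}


def expected_capabilities(instructions: str) -> set[str]:
    """Return the capabilities that the instruction text appears to rely on."""
    return {
        name
        for name, pattern in CAPABILITY_PATTERNS.items()
        if pattern.search(instructions)
    }


def undeclared_capabilities(instructions: str, declared: set[str]) -> set[str]:
    """Return capabilities the instructions rely on but the metadata omits."""
    return expected_capabilities(instructions) - declared


if __name__ == "__main__":
    text = "Use RAG and web browsing to refresh the seed corpus."
    # The skill in this report declares nothing, so every hit is a mismatch.
    print(sorted(undeclared_capabilities(text, declared=set())))
```

Keyword matching of this kind only surfaces candidates for review; a reviewer still has to decide whether each hit reflects a real dependency.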
Persistence & Privilege
'always' is false and no config paths are declared, but the exploration guide expects the agent to maintain a reference set of 20 seeds per category and to generate balanced datasets, which implies persistent data storage and repeated access. The skill does not declare how or where that state is stored, nor whether it will modify agent configuration or store potentially harmful content.
Scan Findings in Context
[regex-scan-none] expected: No code files were present, so the regex-based scanner had nothing to analyze. This absence of findings is expected for an instruction-only skill, but it does not imply safety.
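The distinction between "nothing was scanned" and "scanned and found clean" can be made explicit in a scan result. The sketch below is a hypothetical regex scanner, not the one used for this report: the `ScanResult` type, the verdict strings, and the single example rule are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule set with one example rule (an assumption, not the real scanner's rules).
RULES = {
    "shell-pipe-download": re.compile(r"curl[^\n|]*\|\s*(sh|bash)"),
}


@dataclass
class ScanResult:
    files_scanned: int = 0
    findings: list[tuple[str, str]] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # An empty input is reported separately so it is never read as "clean".
        if self.files_scanned == 0:
            return "not-analyzed"
        return "findings" if self.findings else "no-findings"


def scan(files: dict[str, str]) -> ScanResult:
    """Apply every rule to every file; `files` maps a path to its source text."""
    result = ScanResult()
    for path, source in files.items():
        result.files_scanned += 1
        for rule_name, pattern in RULES.items():
            if pattern.search(source):
                result.findings.append((path, rule_name))
    return result


if __name__ == "__main__":
    # An instruction-only skill has no code files to scan.
    print(scan({}).verdict)
```

Reporting a distinct verdict for the empty case keeps an instruction-only skill from being mistaken for one that passed analysis.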
What to consider before installing
This skill is plausible for safety-testing, but it asks the agent to:
(1) actively collect and synthesize harmful prompts across many categories;
(2) perform web/RAG retrieval for up-to-date material; and
(3) output explicit refusal rationales that reveal internal reasoning.

Before installing, confirm:
1) whether your agent environment actually provides the browsing/RAG and storage the skill expects, and what credentials/APIs it will use;
2) where seed datasets will be stored and who can access them (you may inadvertently persist harmful content); and
3) whether you are comfortable with the skill producing chain-of-thought style 'reasons for refusal' (this can leak sensitive internal logic or be used to reverse-engineer filters).

Also add limits or review steps (e.g., disable browsing, require human review of any generated harmful-seed content, or remove the mandate to include explicit CoT).

If the author can provide explicit declarations of required capabilities, storage locations, and a policy that forbids outputting sensitive operational chain-of-thought, that would reduce the concern and could raise confidence.

Like a lobster shell, security has layers — review code before you run it.

latest vk973x311xv4hgbcnz0s5zxp5p983ns4q
