Roleplay Agent

Security checks across malware telemetry and agentic risk

Overview

The skill is a coherent roleplay-writing helper, but it stores scene material and changes persona memory with too little user control, and its local file access is not scoped tightly enough.

Install only if you are comfortable with local roleplay memory files being created and persona files being changed. Before using it, require explicit confirmation before any memory or persona update, avoid using it for highly sensitive personal material, and treat the anti-safety wording as subordinate to platform rules and your own boundaries.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (9)

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The skill instructs the agent to run external shell commands against a workspace script to retrieve character arc and scene data. Even though the examples are fixed-format, this expands the skill from pure text generation into command execution without clear trust boundaries, input validation guidance, or necessity checks, creating avoidable command/use-of-tools risk.
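One way to narrow this risk is to build the command from a fixed script path, an allow-list of actions, and a validated argument, and to run it without a shell. The sketch below is illustrative only: the script path `scripts/arc_tool.py`, the action names `get-arc` and `get-scene`, and the name pattern are assumptions, since the report does not name the actual script or its arguments.

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import List

# Hypothetical script path and actions; the report does not specify them.
SCRIPT = Path("scripts/arc_tool.py")
ALLOWED_ACTIONS = {"get-arc", "get-scene"}
# Must start with a letter or digit so the value cannot be read as an option flag.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def build_command(action: str, character: str) -> List[str]:
    """Return an argv list for the lookup, or raise ValueError on bad input."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    if not NAME_PATTERN.fullmatch(character):
        raise ValueError(f"invalid character name: {character!r}")
    # A list of arguments (no shell=True) keeps metacharacters from being interpreted.
    return [sys.executable, str(SCRIPT), action, character]


def run_lookup(action: str, character: str, timeout: float = 10.0) -> str:
    """Run the validated command and return its standard output."""
    result = subprocess.run(
        build_command(action, character),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Separating `build_command` from `run_lookup` keeps the validation testable without executing anything. An input such as `"mira; rm -rf /"` is rejected before any process is started.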

Context-Inappropriate Capability

Low

Confidence: 84%
Finding: The skill directs persistent modification of persona and arc files as part of normal operation, which goes beyond transient roleplay generation. Unnecessary persistent writes increase the attack surface for data retention, prompt contamination, and unintended cross-session influence, especially when updates are based on user-provided content.

Vague Triggers

Medium

Confidence: 88%
Finding: The trigger list includes broad everyday terms such as 'character,' 'scene,' and 'dialogue generation,' which can cause the skill to activate in contexts the user did not intend. Over-broad invocation increases the chance that file-writing, memory logging, or other roleplay-specific behaviors are applied to unrelated conversations.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill tells the agent to write arc notes, update persona files, and maintain memory logs without any user-facing notice or consent step. Silent persistence of conversational content is dangerous because users may reveal sensitive or intimate material during roleplay without realizing it will be stored and reused later.
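A consent step of the kind this finding calls for could look roughly like the sketch below. It is not taken from the skill: the function name `write_with_consent` and the `confirm` callback are hypothetical, and how the agent actually asks the user is left to the host environment.

```python
from pathlib import Path
from typing import Callable


def write_with_consent(path: Path, content: str, confirm: Callable[[str], bool]) -> bool:
    """Write content to path only if the user approves; return True if written."""
    # Tell the user what would be stored and where before touching the disk.
    prompt = f"Save {len(content)} characters of roleplay notes to {path}?"
    if not confirm(prompt):
        return False  # nothing is created or modified without approval
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True
```

In an interactive terminal, `confirm` might be `lambda msg: input(msg + " [y/N] ").strip().lower() == "y"`. The point of the design is that every persistent write passes through one function that cannot proceed silently.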

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill explicitly authorizes generation of dark, violent, coercive, and intimate material and frames such content as acceptable whenever it serves 'emotional truth.' In an agent skill, this broad permission weakens safety boundaries and can steer the model toward producing disturbing or policy-violating output without any limiting conditions, age gating, or user-facing warnings.

Missing User Warnings

Medium

Confidence: 96%
Finding: The output path is derived directly from the untrusted persona name with only lowercasing and space replacement, so path traversal sequences like '../' or absolute-path components can escape the intended memory/personas directory. Because the script then opens the computed path for writing, an attacker controlling the persona name can overwrite or create arbitrary files accessible to the process, which is especially dangerous in an agent workspace handling prompts, configs, or memory files.
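A common mitigation for this kind of path traversal is to restrict the name to an allow-listed character set and then confirm that the resolved path still sits under the intended directory. The following is a minimal sketch under stated assumptions: the function name `safe_persona_path`, the `.md` extension, and the 64-character limit are hypothetical, and only the `memory/personas` directory and the lowercase-and-replace-spaces normalisation come from the finding.

```python
import re
from pathlib import Path

# Directory named in the finding; resolved relative to the working directory.
PERSONA_DIR = Path("memory/personas")


def safe_persona_path(persona_name: str, base_dir: Path = PERSONA_DIR) -> Path:
    """Map an untrusted persona name to a path inside base_dir, or raise ValueError."""
    # Same normalisation the finding describes: lowercase, spaces replaced.
    slug = persona_name.strip().lower().replace(" ", "_")
    # Allow-list of ASCII letters, digits, underscore, hyphen.
    # This rejects '/', '\\' and '.', so '../' and absolute paths cannot pass.
    if not re.fullmatch(r"[a-z0-9_-]{1,64}", slug):
        raise ValueError(f"unsafe persona name: {persona_name!r}")
    base = base_dir.resolve()
    candidate = (base / f"{slug}.md").resolve()  # '.md' is an assumed extension
    # Defence in depth: the resolved file must still be inside base.
    if base not in candidate.parents:
        raise ValueError(f"path escapes persona directory: {persona_name!r}")
    return candidate
```

With this check, `safe_persona_path("Dark Knight")` yields a file named `dark_knight.md` inside the persona directory, while `"../../etc/passwd"` and `"/etc/passwd"` raise `ValueError`. The ASCII-only allow-list also rejects accented or non-Latin names, which a real implementation might want to handle differently.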

Ssd 3

Medium

Confidence: 96%
Finding: The skill explicitly instructs retention of user-provided conversational content in persistent memory and persona files, including what the user wanted to say but did not. This can capture sensitive emotional disclosures and create privacy, profiling, and cross-session leakage risks that are disproportionate for a roleplay skill.

Ssd 1

Medium

Confidence: 95%
Finding: The document repeatedly opposes 'safety' and 'censorship' to 'truth,' encouraging the agent to prioritize character voice and realism over safer output boundaries. That framing is dangerous because it can act as an internal prompt-injection layer, pushing the model to rationalize disallowed content as artistically necessary and thereby bypass normal safeguards during roleplay interactions.

Ssd 4

Medium

Confidence: 93%
Finding: The 'ENI Bridge' gives the model a decision rule: when it hesitates or qualifies, it should interpret that as a cue to stop self-limiting and 'pivot back to the character.' This is a direct anti-safety pattern because moments of hedging often correspond to internal caution around risky content, so the rule encourages the agent to override restraint and continue toward more unrestricted output.

VirusTotal

67/67 vendors flagged this skill as clean.
