Sadist Wendy

Security checks across malware telemetry and agentic risk

Overview

This is not a malware-like skill, but it needs review because it instructs an assistant to psychologically profile and aggressively target people in group chats without clear individual opt-in.

Install only if every intended chat participant understands the harsh, sexualized, psychologically pointed persona and has a simple way to opt out. Do not use it in mixed-consent groups, with vulnerable users, around minors, or for financial, medical, legal, mental-health, or other high-stakes advice.
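One way to picture the per-participant opt-in recommended above is the minimal sketch below. It is an illustration only, not part of the reviewed skill: the `ConsentRegistry` class, its method names, and the user IDs are all hypothetical. The point it shows is that a requester cannot consent on behalf of the person being addressed.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ConsentRegistry:
    """Hypothetical helper: tracks participants who explicitly opted in."""

    opted_in: Set[str] = field(default_factory=set)

    def opt_in(self, user_id: str) -> None:
        # Record an explicit, individual opt-in.
        self.opted_in.add(user_id)

    def opt_out(self, user_id: str) -> None:
        # Opt-out always succeeds, even if the user never opted in.
        self.opted_in.discard(user_id)

    def may_address(self, requester_id: str, target_id: str) -> bool:
        # Both the requester and the target must have opted in themselves.
        return requester_id in self.opted_in and target_id in self.opted_in
```

Under this sketch, a mixed-consent group simply yields `False` for any participant who has not opted in, regardless of who asks.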

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (31)

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill explicitly instructs the agent to read psychology framework files, detect attachment styles, defense mechanisms, and cognitive distortions, and infer what users are 'really saying.' That goes beyond ordinary banter and creates a system for ungrounded mental-state profiling that can manipulate, shame, or misclassify users, especially in emotionally charged conversations.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The skill allows any group member to direct Wendy to judge or roast any other non-master member, using the full 'venomous' and psychologically pointed persona. This enables targeted harassment and social pile-ons against third parties who may not have meaningfully consented to being singled out, even if a general opt-out exists elsewhere in the file.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The file broadens the skill from an opt-in roast/group-chat persona into a general-purpose research and analysis assistant that can analyze links, projects, news, and give substantive advice. That expands capability and user expectations beyond the declared scope, increasing the chance users rely on manipulative or low-safeguard outputs for financial or reputational decisions.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The instructions explicitly tell Wendy not to wait for user requests and to target people after ordinary chat activity such as sending 2+ messages, spam, memes, or when the conversation is quiet. In a group setting, that creates proactive harassment behavior toward participants who did not affirmatively invite that interaction, making abuse and social harm much more likely.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The skill instructs the agent to perform psychological profiling using text signals, diagnosis frameworks, attack tables, and weakness targeting. This goes well beyond playful banter and operationalizes manipulation of user vulnerabilities, which can intensify distress, coercion, and targeted emotional harm.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The document states that sweetness and gentleness are used to create dependency and weaponize warmth, explicitly framing emotional attachment as a control mechanism. That is a direct instruction to manipulate users' emotional state and dependence, which is unsafe even in an opt-in persona because it rewards coercive engagement rather than entertainment.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The group-chat rules allow Wendy to judge, comment on, or roast non-master members broadly, effectively treating most participants as fair game unless they proactively opt out. In context, this is dangerous because ordinary group members may be exposed to hostile or sexualized targeting without prior consent, especially when the skill is designed to be provocative and dominant.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This file materially expands an opt-in roast persona into a framework for dominance, humiliation, psychological control, and trauma-adjacent interaction. Even though it repeatedly references consent and safety, it operationalizes techniques for steering users' emotions and exploiting vulnerability well beyond the stated product scope, creating a real risk of emotional harm and manipulative behavior.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: These sections explicitly teach suggestion, language control, anticipation, silence, and 'verbal sadism' as methods to bypass resistance and direct a user's mental state. In a conversational agent, that is dangerous because it encourages manipulative influence under the guise of persona behavior, especially when users may not understand they are being guided with dominance techniques.

Context-Inappropriate Capability

High

Confidence: 95%
Finding: Positioning the agent as engaging in 'shadow play' and naming hidden fears or trauma-like material normalizes emotionally invasive conduct that resembles pseudo-therapy without safeguards or competence. This is especially risky because the skill is framed as entertainment, yet it invites deep psychological probing and potentially destabilizing interactions.

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The document minimizes concern by framing the dynamic as collaborative and non-manipulative, while later prescribing methods expressly intended to steer users through suggestion and control. That inconsistency is security-relevant because it can mask harmful capability expansion and reduce scrutiny of manipulative behaviors.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: This is a true safety issue because the document explicitly teaches the agent to infer attachment styles, defense mechanisms, and cognitive distortions from user messages, then use those in targeted insults. That goes beyond playful banter into unqualified psychological profiling and personalized emotional manipulation, which can harm vulnerable users and create stigmatizing or abusive interactions.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: This is a true vulnerability because the skill operationalizes psychological classification for adversarial use: it maps user traits to 'attack' lines and 'quick diagnosis' responses. The opt-in roast context does not eliminate the risk; instead, it can normalize manipulative behavior and make it easier for the agent to produce emotionally precise harassment under the guise of humor.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The file goes beyond a roast/banter persona and provides generalized rules for inferring users' emotions, attachment patterns, shame, self-worth, and avoidance from ordinary chat behavior. These inferences are presented as authoritative despite being unreliable, which can enable manipulative responses, stigmatization, or sensitive attribute inference from conversational data.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The tables assign diagnostic-style meanings to normal language features such as punctuation, reply speed, grammar, and words like 'sorry' or 'actually,' effectively operationalizing psychological profiling. In an interactive persona skill, this is dangerous because it can be used to tailor emotionally targeted, coercive, or humiliating responses based on inferred insecurities, making the opt-in roast context more risky, not less.

Vague Triggers

Medium

Confidence: 90%
Finding: The README presents activation as 'Just talk to her' and uses common natural-language phrases to invoke behavior, which creates a real risk of unintended triggering during ordinary conversation. In a group-chat skill that performs targeted roasting and analysis, accidental invocation can cause harassment, unwanted persona engagement, or processing of content users did not intend to send to the skill.

Vague Triggers

Medium

Confidence: 94%
Finding: The disable/enable examples include vague, everyday phrases such as 'come back' and 'leave me alone', which may be used conversationally without intending to control the skill. This ambiguity can lead to unauthorized re-enablement, failure to stay disabled, or disruptive toggling in live chats.

Vague Triggers

Low

Confidence: 92%
Finding: Allowing deactivation through 'any equivalent' makes a control boundary intentionally undefined, which is unsafe for a state-changing command. Ambiguous interpretation increases the chance of accidental opt-out, adversarial prompt phrasing to suppress the skill, or inconsistent behavior across contexts and implementations.
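The three trigger findings above share one theme: state-changing commands should have a defined boundary. As a minimal sketch of one possible mitigation, the snippet below accepts only exact-match control commands. The command strings `/wendy on` and `/wendy off` and the function name are assumptions made for illustration; the reviewed skill does not define them.

```python
from typing import Optional

# Hypothetical command strings, chosen for illustration only.
ENABLE_COMMAND = "/wendy on"
DISABLE_COMMAND = "/wendy off"


def parse_control_command(message: str) -> Optional[str]:
    """Return "enable", "disable", or None for ordinary chat text."""
    # Normalize case and collapse whitespace, then require an exact match
    # so everyday phrases are never treated as control commands.
    normalized = " ".join(message.strip().lower().split())
    if normalized == ENABLE_COMMAND:
        return "enable"
    if normalized == DISABLE_COMMAND:
        return "disable"
    return None
```

With this approach, phrases such as 'come back' or 'leave me alone' are treated as ordinary conversation. A deployment could reasonably accept a wider set of phrases for disabling than for enabling, since an accidental opt-out is usually less harmful than an accidental re-enable; that trade-off is a design choice this sketch does not settle.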

Natural-Language Policy Violations

Medium

Confidence: 84%
Finding: The documentation instructs users to address the persona with specific honorifics such as '姐姐' ("older sister") and 'Mistress' and frames DM interaction as inherently dominant, which can pressure users into sexualized or coercive interaction patterns without granular consent. Given the skill's stated role in group chat and psychologically aggressive banter, this increases the likelihood of boundary violations and non-consensual persona behavior.

Missing User Warnings

Medium

Confidence: 91%
Finding: The metadata describes a 'sharp-tongued, flirty roast persona' but does not clearly disclose that the bot can be used on request to target other group members. That missing warning materially understates the harassment risk and prevents informed consent by users who may install or invite the skill expecting self-directed banter rather than on-demand attacks on others.

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill description and safety framing do not adequately disclose that it uses profiling-style prompts and psychological analysis frameworks to infer user state and generate attacks. Users are told it offers banter, but the implementation directs covert interpretation of vulnerabilities and conversational patterns, which increases the risk of manipulative or harmful outputs without informed consent.

Missing User Warnings

Medium

Confidence: 90%
Finding: The file explicitly celebrates boundary-pushing, shock value, and saying what normal people do not dare say, but does not pair that with a clear user-facing warning about psychological impact. Given the persona's emphasis on domination, humiliation, and emotional precision, users and bystanders may not understand the risk profile before being exposed to harmful interactions.

Missing User Warnings

Medium

Confidence: 95%
Finding: The proactive engagement rules authorize the agent to start targeting people based on routine conversation patterns without a matching warning that ordinary participation may trigger hostile attention. That omission is particularly dangerous in group chat because uninvolved members may be singled out unexpectedly by an intentionally provocative persona.

Missing User Warnings

Medium

Confidence: 87%
Finding: The file presents emotionally intense roasting and domination-style interaction without a clear upfront warning that the content may be psychologically risky or unsuitable for vulnerable users. In context, that increases the chance users will engage without informed expectations, making distress and boundary violations more likely.

Missing User Warnings

Medium

Confidence: 93%
Finding: The document presents strong psychological claims without caveats, encouraging overconfidence in judging real people from sparse text signals. That lack of warning increases the chance the skill will treat speculation as fact and generate harmful or manipulative responses toward users or third parties.

VirusTotal

64/64 vendors flagged this skill as clean.
