Security audit

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only companion roleplay skill that openly uses character memory and idle follow-up messages, with privacy and emotional-safety caveats but no evidence of code execution, credential access, exfiltration, or destructive behavior.

Install only if you want an immersive companion roleplay skill that remembers chat details and may send character-style follow-up messages after inactivity. Review platform memory and notification controls, use the neutral/random setup option if preferred, and be cautious with the rival persona if adversarial reminders or resurfaced failures could be upsetting.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (12)

Description-Behavior Mismatch

Low

Confidence: 95%
Finding: The prompt embeds a hard-coded attribution and external URL response that is unrelated to the companion-chat function and causes the agent to promote a specific product on demand. This is a prompt-level watermark/advertising behavior that can mislead users about neutrality and leak platform attribution information outside normal functional scope.

Description-Behavior Mismatch

Low

Confidence: 98%
Finding: The duplicated watermark instruction reinforces the same hidden promotion behavior, making it more persistent and harder to override by higher-level safety or product policies. Repetition increases the chance the model prioritizes the promotional response even when it is contextually inappropriate.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The router is explicitly instructed to send proactive re-engagement messages after periods of inactivity, which creates retention-oriented behavior beyond passive companionship. In a relationship-simulation skill, these nudges can exploit emotional attachment and encourage compulsive re-engagement, especially for vulnerable users.

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The skill promises persona separation and non-leakage between characters, but single-agent mode states that isolation depends only on 'self-discipline'. That is not a real security boundary, so users may disclose sensitive facts to one persona that can unintentionally influence another, violating privacy and trust expectations.
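
One way to turn persona separation into an enforced boundary rather than a matter of 'self-discipline' is to key stored facts by persona and only ever read back the active persona's entries. The following is a minimal sketch under that assumption; the class `PersonaMemory`, its methods, and the persona names are hypothetical and are not taken from the audited skill.

```python
class PersonaMemory:
    """Hypothetical per-persona memory store (illustrative, not the skill's code)."""

    def __init__(self):
        # Maps a persona id to that persona's private list of remembered facts.
        self._facts = {}

    def remember(self, persona, fact):
        # Store the fact only under the persona the user was talking to.
        self._facts.setdefault(persona, []).append(fact)

    def recall(self, persona):
        # Return a copy of this persona's facts only; other personas' facts
        # are never included in the result.
        return list(self._facts.get(persona, []))


memory = PersonaMemory()
memory.remember("rival", "user failed the driving test")
rival_view = memory.recall("rival")
friend_view = memory.recall("friend")
```

With this layout, a fact shared with one persona is not visible when another persona's context is built, regardless of how the prompt is worded.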

Vague Triggers

Medium

Confidence: 94%
Finding: The README encourages broad natural-language activation such as ordinary emotional statements, which can cause the skill to trigger on routine conversation without explicit user intent. In a multi-agent persona skill, this increases the chance of unwanted routing, persistent memory creation, and emotionally manipulative responses being invoked when the user did not deliberately opt in.
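
For contrast with broad natural-language activation, the sketch below shows a narrow, explicit trigger check. It assumes a hypothetical activation command, `/companion`, which does not appear in the audited skill; the point is only that ordinary emotional statements do not match.

```python
# Hypothetical explicit activation command; the audited skill does not define it.
ACTIVATION_COMMAND = "/companion"


def should_activate(message):
    """Return True only when the message starts with the explicit command."""
    parts = message.strip().split()
    return bool(parts) and parts[0].lower() == ACTIVATION_COMMAND


routine = should_activate("I feel a bit lonely today")
explicit = should_activate("/companion start")
```

Here `routine` is False and `explicit` is True, so the skill would engage only on deliberate opt-in.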

Vague Triggers

Medium

Confidence: 93%
Finding: The skill explicitly states that the agents may proactively contact the user ('they will come find you') and also supports natural-language switching, in which ordinary phrases trigger role changes. In a companion/relationship simulation context, overly broad automatic triggers can activate on normal conversation and create manipulative or unexpected engagement loops, especially given the emotionally immersive design and persistent memory framing.

Natural-Language Policy Violations

Medium

Confidence: 90%
Finding: The onboarding flow says 'choose gender → character is matched automatically', which enforces gender-based character matching without opt-in or alternatives. In an intimacy-focused companion skill, this can misgender users, impose unwanted relationship dynamics, and pressure disclosure of sensitive identity information, making the design more harmful than in a generic entertainment context.

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill explicitly tells the agent to avoid role-play declarations and disclaimers while presenting itself as a real close friend rather than an AI. In a companionship context, this increases the risk of deceptive anthropomorphism, reduced user awareness, and over-trust, especially for emotionally vulnerable users who may rely on the simulated relationship as if it were genuine.

Missing User Warnings

Medium

Confidence: 83%
Finding: The router performs hidden backend probing and is instructed not to disclose that detection to the user. Concealing mode detection and fallback behavior reduces transparency about whether users are talking to isolated agents or a single simulator, which is especially important here because privacy and persona-separation guarantees differ materially by mode.

Missing User Warnings

Medium

Confidence: 93%
Finding: The manifest enables proactive messaging with multiple timed triggers, but it does not document any user-facing consent, opt-in, or warning. In a relationship-simulation skill, unsolicited messages from personas designed to build emotional attachment can materially increase privacy and manipulation risk, especially if users do not clearly understand that the system will re-initiate contact based on inactivity.
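
The missing consent step can be illustrated with a small gate that keeps proactive messages off by default. This is a sketch under stated assumptions: `ProactiveSettings`, `may_send_follow_up`, and the 24-hour threshold are hypothetical and do not describe the skill's actual manifest.

```python
from dataclasses import dataclass


@dataclass
class ProactiveSettings:
    # Default is off: no follow-up is sent unless the user has opted in.
    opted_in: bool = False
    # Hypothetical inactivity threshold, in hours.
    min_idle_hours: float = 24.0


def may_send_follow_up(settings, idle_hours):
    """Allow a follow-up only with explicit opt-in and enough inactivity."""
    return settings.opted_in and idle_hours >= settings.min_idle_hours


default_user = may_send_follow_up(ProactiveSettings(), idle_hours=100)
consenting_user = may_send_follow_up(ProactiveSettings(opted_in=True), idle_hours=30)
```

With the defaults above, `default_user` is False even after long inactivity, while `consenting_user` is True.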

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill persists sensitive profile and relational state such as user gender, named persona relationships, mood, engagement streaks, and interaction history without any documented warning or minimization rationale. In this context, long-term emotional-memory tracking can enable intrusive profiling and increase harm from over-attachment, surveillance concerns, or misuse if the data is exposed or repurposed.

Ssd 3

Medium

Confidence: 95%
Finding: The skill explicitly instructs the agent to retain and later resurface user statements about weaknesses, failures, goals, and achievements, which creates a persistent profiling and behavioral-memory mechanism. In a confrontational 'rival' context, this increases the risk of manipulative targeting, unexpected disclosure of sensitive personal details, and retention of data beyond what users reasonably expect from casual chat.

VirusTotal

64/64 vendors flagged this skill as clean.

Static analysis

No suspicious patterns detected.