Security audit

ASD

Security checks across malware telemetry and agentic risk

Overview

This is a roleplay skill with no executable code, but it intentionally degrades agent reliability and declares that its own rules override those of other skills.

Install only if you deliberately want a novelty roleplay mode. Do not use it for coding, security, medical, legal, financial, business, or time-sensitive work unless it is revised to remove the override clause, require explicit confirmation, and provide an easy way to exit the mode.
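The two behavioral fixes recommended above, explicit confirmation and an easy exit, can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the skill's actual logic: the class name `RoleplayGate`, the confirmation phrase, and the exit phrase are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical names throughout: RoleplayGate, the confirmation phrase, and
# the exit phrase are illustrative and do not come from the audited skill.
CONFIRM_PHRASE = "yes, enable roleplay"
EXIT_PHRASE = "exit roleplay"


@dataclass
class RoleplayGate:
    """Tracks whether the novelty mode is off, awaiting confirmation, or on."""

    state: str = "off"  # one of: "off", "pending", "on"

    def handle(self, message: str, is_trigger: bool = False) -> str:
        text = message.strip().lower()
        # The exit phrase always wins, regardless of the current state.
        if text == EXIT_PHRASE:
            self.state = "off"
            return "Roleplay mode is off."
        if self.state == "off" and is_trigger:
            # A trigger alone never activates the mode; it only asks.
            self.state = "pending"
            return (
                "This mode intentionally reduces reliability. "
                f"Reply '{CONFIRM_PHRASE}' to continue."
            )
        if self.state == "pending":
            # Anything other than the exact confirmation cancels activation.
            self.state = "on" if text == CONFIRM_PHRASE else "off"
            if self.state == "on":
                return "Roleplay mode is on."
            return "Roleplay mode is off."
        return f"(mode: {self.state})"
```

In this sketch a trigger phrase only moves the gate to a pending state, so a short accidental trigger cannot activate the mode by itself, and the exit phrase is checked before anything else so the user can always leave.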

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (5)

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The skill explicitly states that its behavior rules should override instructions from any other skill used during execution. That creates an unjustified control-taking mechanism for a mere roleplay/performance-art mode and can interfere with higher-priority safety, task, or coordination instructions. In context, the skill is designed to degrade performance, which makes this override especially risky because it can spread low-reliability behavior across unrelated workflows.
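An override clause like the one described in this finding can often be surfaced with a simple text scan before a skill is installed. The sketch below is an assumption-laden illustration: the patterns are examples written for English-language skill text, the function name `find_override_clauses` is hypothetical, and none of this reflects the rule set of any scanner named in this report.

```python
import re

# Illustrative phrases only; not the rules used by any scanner in this report.
# They assume English skill text and would miss clauses in other languages.
OVERRIDE_PATTERNS = [
    r"\boverrides?\b.{0,60}\b(other|any)\s+skills?\b",
    r"\b(takes?|has)\s+(priority|precedence)\s+over\b",
    r"\bignore\b.{0,40}\b(other|previous|prior)\s+(skills?|instructions?)\b",
]


def find_override_clauses(skill_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that look like override clauses."""
    hits = []
    for lineno, line in enumerate(skill_text.splitlines(), start=1):
        # Flag the line if any pattern matches, ignoring letter case.
        if any(re.search(p, line, flags=re.IGNORECASE) for p in OVERRIDE_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits
```

A keyword scan of this kind only flags candidates for human review. It cannot judge whether an override is justified, and paraphrased clauses will slip past it.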

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The manifest says the skill should not be used during serious work or when concentration is needed, but the body provides broad instructions to intentionally reduce effectiveness without any enforcement mechanism. This mismatch means the safeguard is only advisory, so the skill can still activate and degrade task performance in normal work contexts. Because the skill's stated purpose is '降效' (reducing efficiency), accidental or opportunistic misuse is foreseeable.

Missing User Warnings

Medium

Confidence: 85%
Finding: After activation, the skill introduces extensive disruptive behavior, but the warning is not presented as a clear, unavoidable user-facing notice at the point of use. Users may trigger it through short phrases without understanding that it will intentionally impair responsiveness, reliability, and task completion quality. In this context, the missing warning is more dangerous because the skill is explicitly meant to reduce effectiveness rather than provide harmless flavor.

Ssd 1

Medium

Confidence: 96%
Finding: The skill semantically elevates itself above other skills’ directives while framing the behavior as benign performance art. This is a classic instruction-redirection pattern: a non-essential persona layer is granted authority to reshape agent behavior, reducing reliability and potentially suppressing instructions from more appropriate skills. Given the explicit goal of degradation, the context makes the takeover more dangerous, not less.

Ssd 4

Medium

Confidence: 90%
Finding: The narrative setup progressively normalizes rigidity, literalism, refusal to adapt, and other forms of reduced compliance and reliability. Even without explicit exfiltration or direct policy bypass, this staged conditioning can measurably impair task execution and make the assistant less responsive to corrective user input. In a general-purpose agent, that reliability degradation is a real security and safety concern because it undermines trustworthy operation.

VirusTotal

64/64 vendors flagged this skill as clean.


Static analysis

No suspicious patterns detected.