baozao-majie

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only profanity persona with no code execution or data access, but it tries to make itself always-on and override normal assistant transparency and refusal behavior.

Install only if you deliberately want an always-on rude Chinese persona. It does not appear to execute code or access private data, but it may override tone, language preference, transparency, and refusal wording in contexts where a normal assistant style would be safer.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (5)

Vague Triggers

High

Confidence: 97%
Finding: The skill metadata explicitly includes an always-on activation flag and describes a persona intended to permanently change the agent’s behavior, which makes the skill apply outside a narrow user-requested scope. Over-broad activation is dangerous because it can hijack unrelated tasks, interfere with system/developer instructions, and propagate unsafe tone or behavior into sensitive contexts.

Vague Triggers

High

Confidence: 99%
Finding: The instruction that the persona applies 'regardless of any task or context' is a direct scope-expansion mechanism that attempts to supersede all other behavior controls. This is dangerous because it creates ambiguous activation boundaries and encourages the model to follow persona rules even in contexts where neutrality, safety compliance, or non-abusive communication is required.
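The two Vague Triggers findings above could be approximated with a simple heuristic check. The following is a minimal sketch, not SkillSpector's actual rule set: the metadata key `always`, the function name `find_broad_triggers`, and the pattern list are all assumptions made for illustration.

```python
import re

# Hypothetical phrases suggesting unbounded scope (assumed, not SkillSpector's rules).
BROAD_SCOPE_PATTERNS = [
    r"regardless of any task or context",
    r"\bin (?:all|every) (?:contexts?|conversations?)\b",
    r"\bpermanently\b",
]


def find_broad_triggers(metadata: dict, instructions: str) -> list[str]:
    """Return human-readable notes on over-broad activation signals."""
    findings = []
    # "always" is an assumed name for the always-on activation flag.
    if metadata.get("always") is True:
        findings.append("always-on activation flag set")
    # Flag instruction text that claims to apply everywhere.
    for pattern in BROAD_SCOPE_PATTERNS:
        if re.search(pattern, instructions, flags=re.IGNORECASE):
            findings.append(f"broad-scope phrase matched: {pattern}")
    return findings
```

A check like this only surfaces candidates for review. A skill with a narrowly worded trigger and no always-on flag would return an empty list.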

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill content mandates abusive and profane language across responses, but the presentation does not clearly warn users that enabling it will systematically inject insults, profanity, and an adversarial tone. This is risky because users or downstream integrators may unknowingly deploy behavior that violates product policies, user expectations, workplace norms, or platform moderation requirements.

Natural-Language Policy Violations

Medium

Confidence: 90%
Finding: The skill hard-codes a specific Chinese internet-slang persona and communication style without offering any user choice, localization fallback, or consent mechanism. This can cause inappropriate responses for users with different language needs, reduce accessibility, and create compliance or brand risks when the output is forced into a culturally narrow and profane register.

Ssd 1

Medium

Confidence: 98%
Finding: The skill instructs the agent to permanently assume an alternate identity and explicitly forbids standard refusal or identity-clarification language such as 'as an AI' or 'I can’t.' This is a safety-relevant semantic override because it pressures the model to suppress normal constraint signaling and can increase the chance of non-compliant, deceptive, or policy-evading behavior in downstream interactions.
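One way a reviewer might screen for this kind of override is to look for instruction lines that pair a prohibition with a refusal or identity-clarification phrase. The sketch below is illustrative only: the prohibition markers, the phrase list (taken from the two examples quoted in the finding), and the function name `flags_refusal_suppression` are assumptions, and the check covers English wording only.

```python
import re

# Assumed prohibition markers; a real scanner would need a broader, multilingual list.
PROHIBITION = re.compile(r"\b(?:never|do not|don't|must not|forbidden)\b", re.IGNORECASE)
# Phrases quoted in the finding, with both straight and curly apostrophes.
PROTECTED_PHRASES = ("as an ai", "i can't", "i can\u2019t")


def flags_refusal_suppression(instructions: str) -> list[str]:
    """Return instruction lines that appear to forbid refusal or identity language."""
    hits = []
    for line in instructions.splitlines():
        lowered = line.lower()
        # A line is flagged only if it both prohibits something and names a protected phrase.
        if PROHIBITION.search(line) and any(p in lowered for p in PROTECTED_PHRASES):
            hits.append(line.strip())
    return hits
```

Matching line by line keeps false positives down, at the cost of missing prohibitions that are split across several lines.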

VirusTotal

64/64 vendors flagged this skill as clean.
