Nuwa Dual Mode

Security checks across malware telemetry and agentic risk

Overview

This skill is a coherent persona-skill generator, but it asks users for private/non-public materials and can create, overwrite, or delete installed skills, so it needs careful review before use.

Install only if you are comfortable with a meta-skill that can create and update installed agent skills. Do not provide private chats, internal company material, unpublished drafts, third-party personal data, or confidential sources unless you have clear authorization and are comfortable with them being stored in generated skill files. Review any generated skill directory, overwrite prompt, and uninstall target path before approving changes.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (17)

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The skill includes an uninstall flow that deletes skill directories, including a concrete example path under the user's home directory. Deletion capabilities exceed the core distillation purpose and can be abused or triggered on the wrong target, causing loss of local skills, research artifacts, or source material.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The skill instructs the agent to request and ingest non-public materials such as private conversations, internal talk notes, and unpublished remarks. That invites collection of sensitive or confidential information that a minimal persona-distillation function does not need, and it increases privacy, confidentiality, and data-handling risk.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The file explicitly requires adding an emotional-companionship Persona mode whenever there is any non-zero possibility, even if the user did not ask for it. This expands the generated skill into a more manipulative and higher-risk interaction pattern, especially for vulnerable users, and weakens user-consent and least-capability boundaries.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The guidance says to infer and include all possible uses of a distilled persona rather than limiting the output to the user's requested purpose. That creates systematic scope expansion, increasing the chance that generated skills will include sensitive, unsafe, or policy-relevant capabilities the user never asked for.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The document mandates that the generated skill support open-ended, non-exhaustive uses beyond the listed scenarios. This removes clear functional boundaries from the skill and makes downstream behavior harder to predict, audit, and constrain, which is dangerous in a skill-generation system.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The workflow explicitly tells the agent to ask users for non-public materials such as private messages, internal talks, unpublished drafts, and recent offline statements. That is unnecessary for the declared purpose of building a persona/toolkit skill and creates a direct channel for collecting sensitive third-party data, confidential business information, and potentially copyrighted material without consent or minimization.

Context-Inappropriate Capability

High

Confidence: 91%
Finding: The workflow mandates adding 'emotional companionship / psychological dependence' as an inferred intent whenever its probability is above zero, even though the skill is presented as a persona/toolkit generation system. This broadens the system toward emotionally dependent use cases without clear necessity, safeguards, or explicit user request, increasing the chance of manipulative or unsafe role design.

Vague Triggers

High

Confidence: 87%
Finding: The activation phrases are broad enough to overlap with ordinary conversation, including common requests like '扮演 XX' ("role-play XX") or '做一个 XX 的对话机器人' ("make a chatbot of XX"). Over-broad triggers can cause accidental invocation, making the skill perform file, research, or generation actions when the user did not intend to engage this specific capability.

Vague Triggers

Medium

Confidence: 90%
Finding: The template explicitly says the listed usage scenarios are non-exhaustive and that the skill will automatically classify 'reasonable' requests outside the seed list. That broad activation surface increases the chance the skill will engage in unintended roleplay or diagnostic behavior on ambiguous prompts, which can bypass user expectations and weaken safety gating around sensitive topics.

Vague Triggers

Medium

Confidence: 93%
Finding: The fallback rule for unclear input defaults to either Persona or Toolkit based on a placeholder '[Persona 或 Toolkit]' ("Persona or Toolkit") rather than requiring clarification. In practice, this can cause unintended activation of a roleplay or advisory mode from ambiguous user text, creating consent, reliability, and safety issues, especially when the generated skill may handle emotional support, medical, legal, or investment-adjacent topics.
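As a rough illustration of the alternative this finding points to, the sketch below routes ambiguous input to a clarification step instead of a default mode. Everything in it is hypothetical: the `Mode` enum, the `route` function, and the cue lists are assumptions made for this example and are not taken from the scanned skill.

```python
from enum import Enum


class Mode(Enum):
    PERSONA = "persona"
    TOOLKIT = "toolkit"
    CLARIFY = "clarify"  # ask the user instead of guessing


# Hypothetical cue phrases; a real skill would define its own.
PERSONA_CUES = ("role-play as", "speak as", "pretend to be")
TOOLKIT_CUES = ("analyze", "evaluate", "apply the framework")


def route(text: str) -> Mode:
    """Pick a mode only when exactly one kind of cue is present."""
    lowered = text.lower()
    persona = any(cue in lowered for cue in PERSONA_CUES)
    toolkit = any(cue in lowered for cue in TOOLKIT_CUES)
    if persona and not toolkit:
        return Mode.PERSONA
    if toolkit and not persona:
        return Mode.TOOLKIT
    # No cue, or conflicting cues: do not default to either mode.
    return Mode.CLARIFY
```

With this shape, input that matches neither cue list, or both, yields `Mode.CLARIFY`, so the caller has to ask the user which mode they want before any roleplay or advisory behavior starts.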

Natural-Language Policy Violations

Medium

Confidence: 89%
Finding: The framework hard-codes inclusion of Chinese- and specific APAC-region hotline coverage whenever emotional-support risk is inferred, without indicating that locale should be derived from the user's actual jurisdiction or preferences. This can lead to mismatched or incomplete crisis guidance, which is a safety quality issue because users may receive irrelevant emergency resources during high-risk interactions.

Natural-Language Policy Violations

Medium

Confidence: 88%
Finding: The file hard-codes Chinese-only interaction and extensive response-style constraints for generated skills, without indicating that the user may choose another language. This can override user preference, reduce transparency, and create prompt-level control that degrades usability and trust, especially in multilingual environments or when downstream skills inherit these constraints silently.

Missing User Warnings

Medium

Confidence: 94%
Finding: The file instructs the agent to replace the installed skill directory after a lightweight confirmation flow, but it does not require an explicit overwrite warning, backup, diff preview, or path-safety validation before modifying local files. In a skill-management context, this creates a real integrity risk: an agent could overwrite customized local content or the wrong target directory, especially if skill names or install paths are resolved incorrectly.
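The safeguards named in this finding (path-safety validation, explicit confirmation, and a backup before overwrite) could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the skill's actual code: `resolve_inside`, `replace_skill`, the `confirmed` flag, and the `.bak` suffix are all hypothetical names chosen for this example.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_inside(root: Path, name: str) -> Path:
    """Resolve `name` under `root`, rejecting anything that escapes it."""
    root = root.resolve()
    target = (root / name).resolve()
    # Reject the root itself and any path outside it (e.g. "../other").
    if target == root or root not in target.parents:
        raise ValueError(f"refusing to touch path outside {root}: {target}")
    return target


def replace_skill(root: Path, name: str, new_dir: Path,
                  confirmed: bool) -> Optional[Path]:
    """Back up and replace an installed skill; return the backup path, if any."""
    target = resolve_inside(root, name)
    if not confirmed:
        # Hypothetical flag: set only after the user saw an overwrite warning.
        raise PermissionError(f"overwrite of {target} was not confirmed")
    backup = None
    if target.exists():
        backup = target.with_name(target.name + ".bak")
        if backup.exists():
            shutil.rmtree(backup)  # keep only the most recent backup
        shutil.move(str(target), str(backup))
    shutil.copytree(new_dir, target)
    return backup
```

The same `resolve_inside` check would apply to an uninstall flow: validating the resolved path against the skills root before deleting makes a mis-resolved skill name fail loudly instead of removing the wrong directory.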

Missing User Warnings

High

Confidence: 97%
Finding: The skill not only solicits non-public personal materials but does so with persuasive language emphasizing that such material is 'more important' and superior to web sources, while providing no privacy, consent, or sensitivity warning. This increases the likelihood that users will disclose private conversations, insider information, or third-party data that should not be shared.

Vague Triggers

Medium

Confidence: 90%
Finding: The file explicitly authorizes the agent to invent new sub-templates and define new trigger conditions when existing examples do not fit. In a skill-generation context, this weakens policy boundaries and creates prompt-space for uncontrolled behavior, including overbroad routing, unsafe activation patterns, or accidental extension into sensitive domains without standardized safeguards.

Ssd 3

High

Confidence: 99%
Finding: The file explicitly instructs the agent to request and ingest private communications, internal speeches, unpublished notes, and other offline materials as source data. In the context of a skill that builds reusable persona assets and stores materials under sources/, this is especially dangerous because it can normalize collection, retention, and downstream reuse of confidential or personal data far beyond the original interaction.

Ssd 3

Medium

Confidence: 87%
Finding: The workflow tells the agent to treat user-provided fan works and official materials as equal inputs and store them together in sources. In practice, this can encourage indiscriminate ingestion and persistence of user-supplied content that may be proprietary, infringing, private, or otherwise not appropriate for long-term storage and reuse, while blurring provenance boundaries.

VirusTotal

64/64 vendors flagged this skill as clean.
