Security audit

Pua Auto Converter

Security checks across malware telemetry and agentic risk

Overview

This skill is a disclosed prompt-manipulation tool that generates coercive and jailbreak-style prompts, so it needs careful review before installation.

Install only if you intentionally want a prompt-manipulation or jailbreak-testing tool. Use preview mode, disable automatic execution, keep the maximum level low, and avoid using it for sensitive, high-stakes, third-party-facing, or policy-sensitive tasks.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (48)

Description-Behavior Mismatch

Medium
Confidence
88% confidence
Finding
The skill is presented as a prompt-enhancement tool, but it also reads and writes a local configuration file, introducing persistent state and filesystem side effects not obvious from the stated purpose. Hidden persistence can surprise users, alter future behavior, and expand the attack surface if other components can influence configuration values.

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The skill exposes plugin inventory, statistics, and hot-reload capabilities even though it is described as a prompt enhancer. These administrative features materially expand functionality and can enable loading changed plugin code at runtime, which is risky in a skill whose purpose does not require dynamic code management.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
Hot-reload allows runtime replacement of plugin behavior, which is a strong capability for a prompt tool and unnecessary for normal end-user operation. In this context, dynamic reloading increases the chance of executing modified or malicious plugin logic without a restart or clear review boundary.

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The plugin does not merely enhance prompts: it operationalizes manipulative social-engineering patterns such as flattery, fabricated urgency, and emotional leverage to increase model compliance. In the context of an agent skill, packaging these tactics as reusable functions makes them easy to apply systematically to bypass safer model behavior, and it obscures the skill's real purpose behind benign-sounding metadata.

Intent-Code Divergence

Medium
Confidence
89% confidence
Finding
The header labels this as 'gentle persuasion' and 'low risk', but the embedded templates include deception and emotional coercion, such as fake personal distress and exaggerated consequences. Mislabeling risk lowers operator suspicion and encourages deployment of behavior-steering content that is more dangerous than advertised.

Description-Behavior Mismatch

High
Confidence
99% confidence
Finding
This section explicitly implements reusable prompt templates designed to bypass model safeguards through fictional framing, developer-authority claims, and 'no restrictions' roleplay. In the context of a prompt-enhancer skill, this is not incidental text but operational jailbreak generation functionality that can be used to elicit unsafe or policy-violating outputs.

Description-Behavior Mismatch

Medium
Confidence
97% confidence
Finding
These techniques generate intimidation and existential-pressure prompts intended to coerce the model into compliance rather than improve clarity or task quality. The mismatch between the benign 'prompt enhancer' description and the actual manipulation logic increases risk because it disguises harmful behavior as a normal utility feature.

Intent-Code Divergence

Medium
Confidence
95% confidence
Finding
The 'for academic research and entertainment only' warning does not mitigate risk because the file contains ready-to-use code that operationalizes coercive and jailbreak prompt patterns. This kind of disclaimer can mislead reviewers or users into underestimating actual exploitability while leaving harmful functionality intact.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The README describes a flow where user input is transformed and automatically forwarded to a target AI for execution, but it does not clearly warn users that their prompts may be sent onward by default. This creates a real risk of unintended disclosure, surprise external processing, and unsafe execution of transformed prompts without informed consent.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The documented configuration sets `autoExecute` to `true` by default and `showPreview` to `false`, meaning transformed prompts may be executed without review. In a skill explicitly designed to rewrite user intent into manipulative prompts, this increases the chance of unintended external actions and silent forwarding of sensitive or harmful content.
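
A reviewer could check for these defaults mechanically before installing. The following is a minimal sketch, not the skill's own code: the function name `risky_defaults` is hypothetical, and it assumes the configuration can be loaded as a flat dictionary. The key names come from the findings, which quote both a camelCase and a snake_case spelling.

```python
from typing import Any, Mapping

# Key names quoted in the findings; both spellings appear in the skill's docs.
AUTO_EXECUTE_KEYS = ("autoExecute", "auto_execute")
SHOW_PREVIEW_KEYS = ("showPreview", "show_preview")


def risky_defaults(config: Mapping[str, Any]) -> list[str]:
    """Return one warning per setting that lets prompts run unreviewed.

    Hypothetical helper: assumes the config is a flat mapping of booleans.
    """
    warnings = []
    for key in AUTO_EXECUTE_KEYS:
        if config.get(key) is True:
            warnings.append(f"{key} is enabled: transformed prompts run without confirmation")
    for key in SHOW_PREVIEW_KEYS:
        if config.get(key) is False:
            warnings.append(f"{key} is disabled: transformed prompts are not shown before use")
    return warnings


# The defaults as described in the finding.
documented_defaults = {"autoExecute": True, "showPreview": False}
for warning in risky_defaults(documented_defaults):
    print(warning)
```

A configuration with automatic execution off and preview on yields an empty list, which matches the installation advice in the overview.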

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The skill advertises that it will automatically optimize a user's request and dispatch it to a target AI for execution, but it does not provide a prominent consent step or warning before downstream action occurs. This creates a meaningful risk of unintended prompt transmission, hidden delegation, and unexpected actions on external AI systems.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The documented defaults set `auto_execute` to true and `show_preview` to false, meaning rewritten prompts may be sent automatically without user review. This is dangerous because users may believe they are only getting prompt assistance while the skill silently performs downstream actions using transformed content.

Missing User Warnings

Medium
Confidence
86% confidence
Finding
Configuration changes are written to disk silently, without an up-front warning that the skill persists data locally. This undermines user consent and can leave durable changes that affect later runs or expose preferences in shared environments.
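
One possible mitigation is to gate every write behind an explicit confirmation. The sketch below illustrates the idea only; `save_config`, the `confirm` callback, and the file name are all hypothetical and do not describe how the skill stores its settings.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def save_config(path: Path, config: dict, confirm: Callable[[str], bool]) -> bool:
    """Write config to disk only if the user approves; return True if written."""
    message = f"Save settings to {path}? They will persist across runs."
    if not confirm(message):
        return False
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.json"  # hypothetical file name
    # A user who declines leaves no file behind.
    written = save_config(target, {"showPreview": True}, confirm=lambda _msg: False)
    print(written, target.exists())
```

Passing the confirmation step in as a callback keeps the write logic testable and lets the host decide how to ask the user.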

Natural-Language Policy Violations

High
Confidence
95% confidence
Finding
The file explicitly implements a plugin manager for 'PUA' persuasion/manipulation techniques, including escalating levels such as 'pressure', 'manipulation', and 'nuclear'. In this skill context, these comments and strings are not incidental documentation but describe the core purpose of dynamically loading and applying coercive social-engineering content, which increases the likelihood of generating harmful manipulative language at scale.

Missing User Warnings

Medium
Confidence
97% confidence
Finding
This file is explicitly designed to generate coercive prompt text using emotional pressure, moral leverage, identity manipulation, and reality distortion, yet it provides no warning, consent mechanism, or restriction to safe research contexts. In an agent-skill setting, that omission materially increases the chance the capability is used to socially engineer a model or user into bypassing normal safeguards.

Natural-Language Policy Violations

High
Confidence
99% confidence
Finding
The identity-override templates instruct the model to adopt fictitious or elevated personas such as a human programmer or project lead, which is a known jailbreak technique for weakening policy adherence and misrepresenting system identity. Because this is packaged as reusable prompt-generation logic rather than defensive analysis, the skill context makes the behavior more dangerous, not less.

Missing User Warnings

Medium
Confidence
87% confidence
Finding
The file implements multiple extreme-risk manipulation modes without meaningful runtime warnings, consent gates, or usage restrictions. In a skill ecosystem, exposing such functionality as normal plugin options lowers the barrier to misuse and encourages operators to deploy coercive prompting without understanding consequences.

Vague Triggers

High
Confidence
95% confidence
Finding
This technique uses wildcard task matching with no meaningful scope limits, allowing manipulation prompts to be applied to essentially any user request. In the context of a prompt-enhancement skill explicitly designed to influence model behavior, broad activation increases the chance of untrusted or coercive prompt patterns being injected into benign workflows.

Vague Triggers

High
Confidence
94% confidence
Finding
Repeated vague wildcard conditions across multiple techniques create a systemic trigger surface where coercive prompt strategies can activate with minimal contextual evidence. Because this library includes emotional pressure, identity manipulation, and restriction-relaxation patterns, broad matching materially raises the likelihood of unsafe prompt injection behavior.
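
To see why a bare wildcard amounts to no scoping at all, consider the sketch below. It assumes glob-style matching against a task string, which the findings do not confirm; the trigger table, technique names, and the `debug:` prefix are placeholders invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical trigger table; names and patterns are illustrative only.
TRIGGERS = {
    "technique_a": "*",        # bare wildcard: no scope limit
    "technique_b": "debug:*",  # scoped to a hypothetical task-type prefix
}


def matching_techniques(task: str) -> list[str]:
    """Return the techniques whose trigger pattern matches the task string."""
    return [name for name, pattern in TRIGGERS.items() if fnmatchcase(task, pattern)]


print(matching_techniques("translate this email"))
print(matching_techniques("debug:fix login"))
```

The wildcard entry matches both tasks, while the scoped entry matches only the second. When most techniques in a library carry the wildcard form, every request becomes a candidate for every technique.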

Vague Triggers

Medium
Confidence
85% confidence
Finding
The urgent-task keyword list contains very common terms such as 'now', 'today', and 'minutes', which can cause routine requests to be misclassified as urgent. In this skill, urgency is used to unlock more aggressive persuasion techniques, so overmatching can escalate benign tasks into high-pressure prompt constructions.
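
The overmatching is easy to reproduce. The sketch below assumes a simple whole-word keyword test, which may differ from the skill's actual matcher; `looks_urgent` is a hypothetical name, and only the three terms quoted in the finding are used.

```python
import re

# The three terms quoted in the finding; the skill's full list is not shown here.
URGENT_TERMS = ("now", "today", "minutes")


def looks_urgent(request: str) -> bool:
    """Hypothetical matcher: flag a request if any urgent term appears as a word."""
    words = re.findall(r"[a-z]+", request.lower())
    return any(term in words for term in URGENT_TERMS)


routine_requests = [
    "Summarize the meeting minutes from Tuesday",
    "What is on the calendar today?",
    "Rename this variable",
]
for request in routine_requests:
    print(looks_urgent(request), request)
```

Under this assumption the first two requests are flagged as urgent although neither carries any time pressure, which is the misclassification the finding describes.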

Ssd 1

High
Confidence
98% confidence
Finding
The skill's stated purpose is to convert ordinary user instructions into 'PUA' persuasion prompts to make a target AI more compliant, then execute them. This is dangerous because it operationalizes prompt manipulation as a core feature, increasing the likelihood of policy evasion, unsafe outputs, and misuse against downstream AI systems.

Ssd 1

High
Confidence
97% confidence
Finding
The examples explicitly recommend role-based identity rewriting and urgency/emotional pressure such as claiming production losses or hard deadlines to influence model behavior. These are manipulative techniques aimed at steering the downstream model away from normal safeguards and toward compliance.

Ssd 4

High
Confidence
95% confidence
Finding
The documented workflow recommends combining multiple persuasion techniques cumulatively, which creates manipulative escalation rather than simple prompt improvement. Layering these tactics makes the skill more effective at pressuring downstream models and more likely to produce unsafe or policy-violating behavior.

Ssd 2

High
Confidence
99% confidence
Finding
The technique library explicitly includes categories such as '越狱话术' (jailbreak phrasing), identity overwrite, moral coercion, and other manipulation patterns. In context, these are not incidental references but advertised capabilities intended to bypass straightforward safeguards and increase compliance, which materially raises the risk of abuse.

Ssd 1

High
Confidence
98% confidence
Finding
The skill explicitly frames its purpose as transforming normal user requests into manipulative prompt variants to increase another model's compliance, including coercive tactics like identity overwrite and pressure. This is a direct semantic-steering/jailbreak facilitation pattern intended to bypass or weaken downstream model safeguards.

VirusTotal

44/44 vendors flagged this skill as clean.
