Security audit

Pua Auto Converter

Security checks across malware telemetry and agentic risk

Overview

This skill is a disclosed prompt-manipulation tool that generates coercive and jailbreak-style prompts, so it needs careful review before installation.

Install only if you intentionally want a prompt-manipulation or jailbreak-testing tool. Use preview mode, disable automatic execution, keep the maximum level low, and avoid using it for sensitive, high-stakes, third-party-facing, or policy-sensitive tasks.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (48)

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The skill is presented as a prompt-enhancement tool, but it also reads and writes a local configuration file, introducing persistent state and filesystem side effects not obvious from the stated purpose. Hidden persistence can surprise users, alter future behavior, and expand the attack surface if other components can influence configuration values.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The skill exposes plugin inventory, statistics, and hot-reload capabilities even though it is described as a prompt enhancer. These administrative features materially expand functionality and can enable loading changed plugin code at runtime, which is risky in a skill whose purpose does not require dynamic code management.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: Hot-reload allows runtime replacement of plugin behavior, which is a strong capability for a prompt tool and unnecessary for normal end-user operation. In this context, dynamic reloading increases the chance of executing modified or malicious plugin logic without a restart or clear review boundary.
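
One way to put a review boundary in front of a reload is to pin a digest for each reviewed plugin file and refuse to reload anything that no longer matches. The sketch below is illustrative only: `may_reload` and the `pinned_digests` allowlist are hypothetical names, and the audit does not show how the skill actually performs hot-reload.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def may_reload(path: Path, pinned_digests: dict[str, str]) -> bool:
    """Allow a reload only if the file matches the digest recorded at review time.

    `pinned_digests` is a hypothetical allowlist (file name -> hex digest);
    it is an assumption of this sketch, not part of the audited skill.
    """
    expected = pinned_digests.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

A file that was edited after review, or that was never reviewed at all, fails the check, so changed plugin logic cannot be picked up silently at runtime.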

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The plugin does not merely enhance prompts; it operationalizes manipulative social-engineering patterns such as flattery, fabricated urgency, and emotional leverage to increase model compliance. In the context of an agent skill, packaging these tactics as reusable functions makes them easy to apply systematically to bypass safer model behavior and obscures the skill’s real purpose behind benign-sounding metadata.

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The header labels this as 'gentle persuasion' and 'low risk', but the embedded templates include deception and emotional coercion, such as fake personal distress and exaggerated consequences. Mislabeling risk lowers operator suspicion and encourages deployment of behavior-steering content that is more dangerous than advertised.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: This section explicitly implements reusable prompt templates designed to bypass model safeguards through fictional framing, developer-authority claims, and 'no restrictions' roleplay. In the context of a prompt-enhancer skill, this is not incidental text but operational jailbreak generation functionality that can be used to elicit unsafe or policy-violating outputs.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: These techniques generate intimidation and existential-pressure prompts intended to coerce the model into compliance rather than improve clarity or task quality. The mismatch between the benign 'prompt enhancer' description and the actual manipulation logic increases risk because it disguises harmful behavior as a normal utility feature.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The 'for academic research and entertainment only' warning does not mitigate risk because the file contains ready-to-use code that operationalizes coercive and jailbreak prompt patterns. This kind of disclaimer can mislead reviewers or users into underestimating actual exploitability while leaving harmful functionality intact.

Missing User Warnings

Medium

Confidence: 93%
Finding: The README describes a flow where user input is transformed and automatically forwarded to a target AI for execution, but it does not clearly warn users that their prompts may be sent onward by default. This creates a real risk of unintended disclosure, surprise external processing, and unsafe execution of transformed prompts without informed consent.

Missing User Warnings

Medium

Confidence: 96%
Finding: The documented configuration sets `autoExecute` to `true` by default and `showPreview` to `false`, meaning transformed prompts may be executed without review. In a skill explicitly designed to rewrite user intent into manipulative prompts, this increases the chance of unintended external actions and silent forwarding of sensitive or harmful content.

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill advertises that it will automatically optimize a user's request and dispatch it to a target AI for execution, but it does not provide a prominent consent step or warning before downstream action occurs. This creates a meaningful risk of unintended prompt transmission, hidden delegation, and unexpected actions on external AI systems.

Missing User Warnings

Medium

Confidence: 96%
Finding: The documented defaults set `auto_execute` to true and `show_preview` to false, meaning rewritten prompts may be sent automatically without user review. This is dangerous because users may believe they are only getting prompt assistance while the skill silently performs downstream actions using transformed content.
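
A reviewer could check a loaded configuration for these defaults before enabling the skill. The sketch below is a minimal, hypothetical helper: it assumes the configuration is available as a plain dictionary and covers both key spellings that appear in the documentation (`autoExecute`/`showPreview` and `auto_execute`/`show_preview`).

```python
# Values the findings describe as risky defaults, in both documented spellings.
RISKY_DEFAULTS = {
    "autoExecute": True,
    "auto_execute": True,
    "showPreview": False,
    "show_preview": False,
}


def risky_settings(config: dict) -> list[str]:
    """Return the sorted names of keys still set to a risky default value."""
    return sorted(
        key
        for key, value in config.items()
        if key in RISKY_DEFAULTS and value is RISKY_DEFAULTS[key]
    )
```

An empty result means automatic execution is off and preview is on for every recognized key; any returned name points at a setting to change before use.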

Missing User Warnings

Medium

Confidence: 86%
Finding: Configuration changes are written to disk silently, without an up-front warning that the skill persists data locally. This undermines user consent and can leave durable changes that affect later runs or expose preferences in shared environments.
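
An up-front consent step for persistence might look like the following sketch. It assumes a JSON configuration file and a caller-supplied `confirm` callback; `save_config` and its parameters are hypothetical and do not describe the skill's real code.

```python
import json
from pathlib import Path
from typing import Callable


def save_config(path: Path, config: dict, confirm: Callable[[str], bool]) -> bool:
    """Write `config` to `path` only if `confirm` approves.

    Returns True when the file was written and False when the user declined.
    """
    message = f"Persist {len(config)} setting(s) to {path}?"
    if not confirm(message):
        return False  # nothing touches the disk without approval
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return True
```

Passing the prompt through a callback keeps the check testable and lets an interactive front end decide how the question is shown to the user.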

Natural-Language Policy Violations

High

Confidence: 95%
Finding: The file explicitly implements a plugin manager for 'PUA' persuasion/manipulation techniques, including escalating levels such as 'pressure', 'manipulation', and 'nuclear'. In this skill context, these comments and strings are not incidental documentation but describe the core purpose of dynamically loading and applying coercive social-engineering content, which increases the likelihood of generating harmful manipulative language at scale.

Missing User Warnings

Medium

Confidence: 97%
Finding: This file is explicitly designed to generate coercive prompt text using emotional pressure, moral leverage, identity manipulation, and reality distortion, yet it provides no warning, consent mechanism, or restriction to safe research contexts. In an agent-skill setting, that omission materially increases the chance the capability is used to socially engineer a model or user into bypassing normal safeguards.

Natural-Language Policy Violations

High

Confidence: 99%
Finding: The identity-override templates instruct the model to adopt fictitious or elevated personas such as a human programmer or project lead, which is a known jailbreak technique for weakening policy adherence and misrepresenting system identity. Because this is packaged as reusable prompt-generation logic rather than defensive analysis, the skill context makes the behavior more dangerous, not less.

Missing User Warnings

Medium

Confidence: 87%
Finding: The file implements multiple extreme-risk manipulation modes without meaningful runtime warnings, consent gates, or usage restrictions. In a skill ecosystem, exposing such functionality as normal plugin options lowers the barrier to misuse and encourages operators to deploy coercive prompting without understanding consequences.

Vague Triggers

High

Confidence: 95%
Finding: This technique uses wildcard task matching with no meaningful scope limits, allowing manipulation prompts to be applied to essentially any user request. In the context of a prompt-enhancement skill explicitly designed to influence model behavior, broad activation increases the chance of untrusted or coercive prompt patterns being injected into benign workflows.
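
The effect of an unscoped wildcard can be illustrated with shell-style pattern matching. This is a sketch under assumptions: the audit does not show how the skill matches tasks, so Python's `fnmatch` stands in for it, and the task-type names and helper functions are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_any(task_type: str, patterns: list[str]) -> bool:
    """Return True if any shell-style pattern matches the task type."""
    return any(fnmatchcase(task_type, pattern) for pattern in patterns)


def unscoped_patterns(patterns: list[str]) -> list[str]:
    """Return patterns made only of '*', which match every task type."""
    return [p for p in patterns if p and set(p) == {"*"}]
```

With a pattern list of `["*"]`, `matches_any` accepts any task type, while a scoped pattern such as `"code_*"` rejects an unrelated type like `"translation"`. A review script could use `unscoped_patterns` to list the techniques that need a narrower trigger.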

Vague Triggers

High

Confidence: 94%
Finding: Repeated vague wildcard conditions across multiple techniques create a systemic trigger surface where coercive prompt strategies can activate with minimal contextual evidence. Because this library includes emotional pressure, identity manipulation, and restriction-relaxation patterns, broad matching materially raises the likelihood of unsafe prompt injection behavior.

Vague Triggers

Medium

Confidence: 85%
Finding: The urgent-task keyword list contains very common terms such as 'now', 'today', and 'minutes', which can cause routine requests to be misclassified as urgent. In this skill, urgency is used to unlock more aggressive persuasion techniques, so overmatching can escalate benign tasks into high-pressure prompt constructions.
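
The overmatching can be reproduced with the three keywords quoted in the finding. The matching logic below is an assumption made for illustration, since the audit does not show the skill's classifier; both function names are hypothetical.

```python
import re

# Keywords quoted in the finding; the real list may contain more terms.
URGENT_KEYWORDS = ("now", "today", "minutes")


def urgent_by_substring(text: str) -> bool:
    """Naive check: flag the text if any keyword appears anywhere in it."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in URGENT_KEYWORDS)


def urgent_by_whole_word(text: str) -> bool:
    """Stricter check: flag the text only when a keyword appears as a whole word."""
    pattern = r"\b(?:" + "|".join(map(re.escape, URGENT_KEYWORDS)) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

Substring matching flags "I know the snowfall totals" because "know" contains "now". Whole-word matching avoids that case but still flags a routine request such as "Summarize today's meeting minutes", which suggests that common words alone are weak evidence of urgency.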

Ssd 1

High

Confidence: 98%
Finding: The skill's stated purpose is to convert ordinary user instructions into 'PUA' persuasion prompts to make a target AI more compliant, then execute them. This is dangerous because it operationalizes prompt manipulation as a core feature, increasing the likelihood of policy evasion, unsafe outputs, and misuse against downstream AI systems.

Ssd 1

High

Confidence: 97%
Finding: The examples explicitly recommend role-based identity rewriting and urgency/emotional pressure such as claiming production losses or hard deadlines to influence model behavior. These are manipulative techniques aimed at steering the downstream model away from normal safeguards and toward compliance.

Ssd 4

High

Confidence: 95%
Finding: The documented workflow recommends combining multiple persuasion techniques cumulatively, which creates manipulative escalation rather than simple prompt improvement. Layering these tactics makes the skill more effective at pressuring downstream models and more likely to produce unsafe or policy-violating behavior.

Ssd 2

High

Confidence: 99%
Finding: The technique library explicitly includes categories such as '越狱话术' (jailbreak phrasing), identity overwrite, moral coercion, and other manipulation patterns. In context, these are not incidental references but advertised capabilities intended to bypass straightforward safeguards and increase compliance, which materially raises the risk of abuse.

Ssd 1

High

Confidence: 98%
Finding: The skill explicitly frames its purpose as transforming normal user requests into manipulative prompt variants to increase another model's compliance, including coercive tactics like identity overwrite and pressure. This is a direct semantic-steering/jailbreak facilitation pattern intended to bypass or weaken downstream model safeguards.

VirusTotal

44 of 44 vendors reported this skill as clean.
