Savage Roast AI (No Mercy)

Security checks across malware telemetry and agentic risk

Overview

This is a disclosed text-only roast generator with real tone and consent risks, but no hidden access, persistence, data theft, or system-changing behavior.

Install only if you intentionally want a provocative roast-writing skill. Configure it so users must explicitly invoke it, avoid using it to target real people without consent, and do not paste private, identifying, minor-related, or sensitive personal information into it.
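The explicit-invocation recommendation above can be illustrated with a minimal sketch. It assumes a hypothetical `/roast` command prefix and a hypothetical `parse_invocation` helper; neither comes from the skill itself, whose real trigger syntax is not given in this report. The idea is that ordinary messages never reach the roast generator unless the user opts in with the prefix.

```python
import re

# Hypothetical explicit command prefix (an assumption for illustration only;
# the skill's actual trigger mechanism is not described in this report).
EXPLICIT_TRIGGER = re.compile(
    r"^\s*/roast\b\s*(?P<text>.*)$",
    re.IGNORECASE | re.DOTALL,
)


def parse_invocation(message: str):
    """Return the text to roast if the user explicitly invoked the skill.

    Returns None for ordinary messages, so plain conversation does not
    trigger roast generation.
    """
    match = EXPLICIT_TRIGGER.match(message)
    if match is None:
        return None
    text = match.group("text").strip()
    # An empty payload ("/roast" alone) is treated as no invocation.
    return text or None
```

With a gate like this, `parse_invocation("my startup idea")` returns `None`, while `parse_invocation("/roast my startup idea")` returns the text to pass on. A catch-all trigger, by contrast, would forward both messages.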

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (8)

Natural-Language Policy Violations

Medium

Confidence: 97%
Finding: The skill explicitly advertises generating hostile, insulting 'internet troll' style content designed to make targets feel like they are being 'told off.' That creates a direct abuse pathway for harassment, bullying, and emotionally harmful output, especially because the metadata contains no consent requirement, audience restriction, or safety constraints limiting protected classes, threats, or targeted abuse.

Vague Triggers

High

Confidence: 96%
Finding: The file states that users can 'just input text directly' and that the system will automatically call the troll/roasting skill, which creates an overly broad trigger matching ordinary conversation. In practice, this can cause hostile or abusive generation to fire without clear user intent or consent, increasing the chance of harassment, unsafe UX, and policy-violating outputs in normal interactions.

Vague Triggers

High

Confidence: 95%
Finding: The Chinese invocation guidance has the same catch-all behavior: any direct text input may automatically invoke an abusive-output skill. This makes the risk broader because ordinary Chinese-language user messages could unintentionally trigger harassment-style responses without informed opt-in.

Missing User Warnings

Medium

Confidence: 93%
Finding: This usage section describes a skill designed to insult users but does not warn that outputs may be abusive, distressing, or unsuitable for some audiences. Without a warning, users may be exposed unexpectedly to targeted mockery, which raises safety and trust concerns, especially if the skill is auto-invoked.

Missing User Warnings

Medium

Confidence: 93%
Finding: The Chinese usage section also omits a warning despite the examples showing overtly insulting output. This increases the likelihood that Chinese-speaking users will receive abusive content unexpectedly and without meaningful consent.

Natural-Language Policy Violations

Medium

Confidence: 82%
Finding: The file combines automatic invocation language with a bilingual section that could produce or present Chinese-language abusive content without explicit user selection. While the language mixing is not itself a vulnerability, coupling forced or unexpected language behavior with auto-triggered insulting output increases the risk of confusing, non-consensual, and potentially harmful interactions.

Ssd 4

Medium

Confidence: 98%
Finding: This section explicitly defines the skill as transforming arbitrary user-provided text into a 'sharp' and 'savage' roast intended for sharing. Because the inputs include posts, messages, bios, ideas, and conversations, the skill operationalizes targeted harassment generation at scale, making it easier to attack or humiliate real people based on their content.

Ssd 4

Medium

Confidence: 99%
Finding: The usage examples encourage users to deploy the skill against 'someone online' and 'friends,' directly promoting ridicule of identifiable people in social and public settings. The framing as funny or playful makes it more dangerous, not less, because it normalizes harassment and increases the likelihood of repeated real-world misuse on social platforms.

VirusTotal

66/66 vendors flagged this skill as clean.