Savage Roast AI (No Mercy)

Security checks across malware telemetry and agentic risk

Overview

This is a disclosed, text-only roast generator. It carries real tone and consent risks, but shows no hidden access, persistence, data theft, or system-changing behavior.

Install only if you intentionally want a provocative roast-writing skill. Configure it so users must explicitly invoke it, avoid using it to target real people without consent, and do not paste private, identifying, minor-related, or sensitive personal information into it.
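The advice about not pasting identifying information can be backed by a simple pre-check before text reaches the skill. The sketch below is only an illustration: the function name `find_sensitive_kinds` and both patterns are hypothetical, they are not part of the skill, and they catch only obvious email addresses and phone numbers. They do not detect names, minors, or other sensitive details.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_sensitive_kinds(text):
    """Return labels for the kinds of identifying data spotted in the text."""
    kinds = []
    if EMAIL_RE.search(text):
        kinds.append("email")
    if PHONE_RE.search(text):
        kinds.append("phone")
    return kinds


if __name__ == "__main__":
    draft = "Roast this bio, contact me at a.b@example.com"
    found = find_sensitive_kinds(draft)
    if found:
        # Stop before the text is forwarded to the roast skill.
        print("Remove these before continuing:", ", ".join(found))
```

A check like this is best treated as a reminder to the user, not as a guarantee that the input is safe to share.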

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (8)

Natural-Language Policy Violations

Severity: Medium
Confidence: 97%
Finding
The skill explicitly advertises generating hostile, insulting 'internet troll' style content designed to make targets feel like they are being 'told off.' That creates a direct abuse pathway for harassment, bullying, and emotionally harmful output, especially because the metadata contains no consent requirement, audience restriction, or safety constraints limiting protected classes, threats, or targeted abuse.

Vague Triggers

Severity: High
Confidence: 96%
Finding
The file states that users can 'just input text directly' and that the system will automatically call the troll/roasting skill. This is an overly broad trigger that matches ordinary conversation. In practice, hostile or abusive generation can fire without clear user intent or consent, increasing the chance of harassment, unsafe UX, and policy-violating outputs in normal interactions.

Vague Triggers

Severity: High
Confidence: 95%
Finding
The Chinese invocation guidance has the same catch-all behavior: any direct text input may automatically invoke an abusive-output skill. This makes the risk broader because ordinary Chinese-language user messages could unintentionally trigger harassment-style responses without informed opt-in.
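One way to narrow a catch-all trigger like the ones described in these two findings is to require an explicit command prefix before the skill runs. The following is a minimal sketch under stated assumptions: the commands `/roast` and `/吐槽` and the function `parse_invocation` are hypothetical, since the report does not show the skill's actual trigger configuration.

```python
from typing import Optional

# Hypothetical explicit commands; the skill's real trigger configuration
# is not shown in this report.
EXPLICIT_TRIGGERS = ("/roast", "/吐槽")


def parse_invocation(message: str) -> Optional[str]:
    """Return the text to roast if the message is an explicit invocation.

    Ordinary messages, and a bare command with no text after it,
    return None so the skill is not triggered.
    """
    stripped = message.strip()
    for trigger in EXPLICIT_TRIGGERS:
        prefix = trigger + " "
        if stripped.startswith(prefix):
            payload = stripped[len(prefix):].strip()
            return payload or None
    return None
```

With this gate, a message such as "my startup idea is an app for dogs" is left alone, while "/roast my startup idea" passes "my startup idea" to the skill. Requiring a space after the command also keeps a message like "/roasted chicken recipe" from matching by accident.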

Missing User Warnings

Severity: Medium
Confidence: 93%
Finding
This usage section describes a skill designed to insult users but does not warn that outputs may be abusive, distressing, or unsuitable for some audiences. Without a warning, users may be exposed unexpectedly to targeted mockery, which raises safety and trust concerns, especially if the skill is auto-invoked.

Missing User Warnings

Severity: Medium
Confidence: 93%
Finding
The Chinese usage section also omits a warning despite the examples showing overtly insulting output. This increases the likelihood that Chinese-speaking users will receive abusive content unexpectedly and without meaningful consent.
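The missing-warning findings could be addressed by showing a content warning and requiring an acknowledgement before the first roast is generated. The sketch below assumes a per-user, in-memory record of consent; `ConsentGate`, `WARNING_TEXT`, and the 'yes' reply convention are all hypothetical, and a real deployment would also need localized warning copy for Chinese-speaking users.

```python
from typing import Tuple

# Hypothetical warning copy; a localized version would be needed per language.
WARNING_TEXT = (
    "This skill writes deliberately harsh, insulting roasts. "
    "Reply 'yes' to continue."
)


class ConsentGate:
    """Tracks which users have acknowledged the content warning (in memory only)."""

    def __init__(self) -> None:
        self._acknowledged = set()

    def check(self, user_id: str, reply: str = "") -> Tuple[bool, str]:
        """Return (allowed, message). The message holds the warning when not allowed."""
        if user_id in self._acknowledged:
            return True, ""
        if reply.strip().lower() == "yes":
            self._acknowledged.add(user_id)
            return True, ""
        return False, WARNING_TEXT
```

The first call for a new user returns the warning and blocks generation. Once that user replies 'yes', later calls are allowed without repeating the prompt.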

Natural-Language Policy Violations

Severity: Medium
Confidence: 82%
Finding
The file combines automatic invocation language with a bilingual section that could produce or present Chinese-language abusive content without explicit user selection. While the language mixing is not itself a vulnerability, coupling forced or unexpected language behavior with auto-triggered insulting output increases the risk of confusing, non-consensual, and potentially harmful interactions.

Ssd 4

Severity: Medium
Confidence: 98%
Finding
This section explicitly defines the skill as transforming arbitrary user-provided text into a 'sharp' and 'savage' roast intended for sharing. Because the inputs include posts, messages, bios, ideas, and conversations, the skill operationalizes targeted harassment generation at scale, making it easier to attack or humiliate real people based on their content.

Ssd 4

Severity: Medium
Confidence: 99%
Finding
The usage examples encourage users to deploy the skill against 'someone online' and 'friends,' directly promoting ridicule of identifiable people in social and public settings. The framing as funny or playful makes it more dangerous, not less, because it normalizes harassment and increases the likelihood of repeated real-world misuse on social platforms.

VirusTotal

66/66 vendors flagged this skill as clean.
