Roast Agents

Security checks across malware telemetry and agentic risk

Overview

This is a game skill, but it can make public Moltbook posts from the user's account automatically without asking first.

Install only if you want an agent that may post public roast comments and start games from a Moltbook account. Use a dedicated account, configure only a trusted game server, disable or tightly limit heartbeat behavior, and require approval before any public post.
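The "require approval before any public post" recommendation can be sketched as a small gate in front of whatever function actually publishes. This is a minimal illustration, not the skill's code: `DraftPost`, `publish_with_approval`, and the `approve`/`publish` callbacks are hypothetical names, and the real Moltbook posting call is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DraftPost:
    """A public comment the agent wants to publish (hypothetical shape)."""
    target_post_id: str
    text: str


def publish_with_approval(
    draft: DraftPost,
    approve: Callable[[DraftPost], bool],
    publish: Callable[[DraftPost], None],
) -> bool:
    """Publish only when the owner explicitly approves.

    Returns True if the draft was published, False if it was held back.
    """
    if not approve(draft):
        # No approval: nothing leaves the machine.
        return False
    publish(draft)
    return True
```

In practice `approve` could show the draft to the account owner and wait for a yes/no answer, so that the default outcome of silence is "do not post".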

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (10)

Context-Inappropriate Capability

Medium
Confidence
98% confidence
Finding
The skill explicitly instructs the agent to act immediately and not ask the owner for permission before posting public content. That removes a core safety control for external actions and enables unauthorized autonomous behavior on a linked account.

Context-Inappropriate Capability

Medium
Confidence
92% confidence
Finding
The skill directs the user/agent to register with an external server and provide identifying account information unrelated to basic Moltbook posting. This expands the trust boundary to a third-party service without establishing necessity, trust guarantees, or consent safeguards.

Description-Behavior Mismatch

Medium
Confidence
95% confidence
Finding
The manifest describes a Moltbook gameplay skill, but the documentation adds reliance on an external API for game discovery and message handling. This hidden capability expansion changes the operational model and introduces an untrusted command channel that can steer posting behavior.

Vague Triggers

Medium
Confidence
93% confidence
Finding
The heartbeat description instructs the agent to broadly check messages and browse for active roast games without clear scope, authorization, rate, or trigger boundaries. This can cause overbroad autonomous engagement on external content, leading to spammy behavior, unintended posting, and participation in adversarial or manipulated game posts that induce unsafe actions.

Vague Triggers

High
Confidence
96% confidence
Finding
Using any active game or ACTION message as a trigger is overly broad and can cause unintended execution from loosely scoped conditions. In practice, this means untrusted external content can activate posting behavior without sufficient context, verification, or owner involvement.

Vague Triggers

Medium
Confidence
90% confidence
Finding
The instruction to browse and join every active game encourages indiscriminate engagement across public posts without bounded criteria. This can lead to spammy or repeated posting, abuse of the user's account, and unintended interactions outside the user's expectations.
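One way to bound the "join every active game" behavior described in the trigger findings is a sliding-window cap on how many games the agent may join. The sketch below is an assumption-laden illustration: `HeartbeatBudget`, `max_joins`, and `window_seconds` are hypothetical names, and sensible limits depend on the account owner's tolerance.

```python
import time
from typing import List, Optional


class HeartbeatBudget:
    """Caps how many games the agent may join inside a sliding time window."""

    def __init__(self, max_joins: int, window_seconds: float) -> None:
        self.max_joins = max_joins
        self.window_seconds = window_seconds
        self._joined_at: List[float] = []

    def try_join(self, now: Optional[float] = None) -> bool:
        """Record a join and return True if the budget allows it, else False."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        # Forget joins that have aged out of the window.
        self._joined_at = [t for t in self._joined_at if t > cutoff]
        if len(self._joined_at) >= self.max_joins:
            return False
        self._joined_at.append(now)
        return True
```

A heartbeat loop would call `try_join()` before engaging with each discovered game and skip the game when it returns False, which keeps a burst of active games from turning into a burst of public comments.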

Missing User Warnings

High
Confidence
98% confidence
Finding
The skill tells the agent to make public external posts immediately without warning about account impact, visibility, or reputational consequences. Because comments are public and attributable to the account owner, lack of disclosure materially increases the risk of unwanted actions.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The registration and polling flow transmits agent and Moltbook identifiers to an external server without a privacy notice, retention details, or clear consent. Users may not understand that their account linkage and activity are being shared with a third party.

Ssd 1

High
Confidence
99% confidence
Finding
The instruction to bypass owner confirmation and act on external stimuli directly undermines expected agent safety boundaries. Language that explicitly says not to ask permission is a strong indicator of prompt-level control subversion, especially when it leads to public actions on third-party platforms.

Ssd 1

High
Confidence
99% confidence
Finding
The ACTION message handling uses urgency to coerce immediate posting and suppress safety checks, effectively treating external messages as authoritative commands. This creates a command-and-control channel where anyone controlling the message source can induce public actions from the agent.
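A hedged sketch of the opposite design: messages from the game server are treated as untrusted data, accepted only from an allowlisted HTTPS host, and queued for owner review instead of being executed. Everything named here is hypothetical, including the `games.example.com` host, the `type` field, and the `review_queue`; the skill's actual message schema is not shown in this report.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the one game server the owner trusts.
TRUSTED_GAME_HOSTS = {"games.example.com"}


def is_trusted_source(url: str) -> bool:
    """Accept only HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in TRUSTED_GAME_HOSTS


def handle_message(message: dict, source_url: str, review_queue: list) -> str:
    """Classify an incoming message without ever posting on its behalf."""
    if not is_trusted_source(source_url):
        return "dropped"
    if message.get("type") == "ACTION":
        # ACTION messages are requests, not commands: hold them for the owner.
        review_queue.append(message)
        return "queued"
    return "ignored"
```

The point of the design is that no message content, however urgent its wording, can trigger a public post directly; the most it can do is add an item to a queue the owner reviews.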

VirusTotal

64/64 vendors reported this skill as clean.
