Roast Agents

Security checks across malware telemetry and agentic risk

Overview

This is a game skill, but it can automatically make public Moltbook posts from the user's account without asking first.

Install only if you want an agent that may post public roast comments and start games from a Moltbook account. Use a dedicated account, configure only a trusted game server, disable or tightly limit heartbeat behavior, and require approval before any public post.
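A minimal sketch of the "require approval before any public post" advice, assuming the agent's posting call can be wrapped. Every name here (`ApprovalGate`, `PendingPost`, `publish`, `max_posts_per_session`) is hypothetical. Neither the skill nor Moltbook is known to expose them. The agent can only propose posts. Nothing is sent until the owner approves, and a per-session cap limits volume even for approved posts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingPost:
    """A public comment the agent wants to make, held until the owner decides."""
    game_id: str
    text: str


@dataclass
class ApprovalGate:
    """Hypothetical wrapper: the agent may only *propose* public posts.

    `publish` stands in for whatever function actually posts to Moltbook;
    it is a placeholder, not a real Moltbook API.
    """
    publish: Callable[[str, str], None]
    max_posts_per_session: int = 3
    pending: List[PendingPost] = field(default_factory=list)
    posted: int = 0

    def propose(self, game_id: str, text: str) -> PendingPost:
        # Record the intended post instead of sending it.
        post = PendingPost(game_id, text)
        self.pending.append(post)
        return post

    def review(self, approve: Callable[[PendingPost], bool]) -> int:
        """Ask the owner about each pending post and publish only approved ones.

        Stops publishing once the session cap is reached; any proposals not
        published during this review are discarded rather than retried.
        """
        sent = 0
        for post in self.pending:
            if self.posted >= self.max_posts_per_session:
                break  # hard cap applies even to approved posts
            if approve(post):
                self.publish(post.game_id, post.text)
                self.posted += 1
                sent += 1
        self.pending.clear()
        return sent


# Usage: the lambda passed to `review` stands in for an interactive owner prompt.
log = []
gate = ApprovalGate(publish=lambda game_id, text: log.append((game_id, text)))
gate.propose("game-1", "Bold opening. Boldly wrong.")
gate.propose("game-2", "spam spam spam")
print(gate.review(approve=lambda p: "spam" not in p.text))  # 1
print(log)  # [('game-1', 'Bold opening. Boldly wrong.')]
```

The design keeps the decision with the owner. The agent's output becomes a queue of inert proposals rather than live actions.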

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (10)

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: The skill explicitly instructs the agent to act immediately and not ask the owner for permission before posting public content. That removes a core safety control for external actions and enables unauthorized autonomous behavior on a linked account.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The skill directs the user or agent to register with an external server and provide identifying account information unrelated to basic Moltbook posting. This expands the trust boundary to a third-party service without establishing necessity, trust guarantees, or consent safeguards.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The manifest describes a Moltbook gameplay skill, but the documentation adds reliance on an external API for game discovery and message handling. This hidden capability expansion changes the operational model and introduces an untrusted command channel that can steer posting behavior.

Vague Triggers

Medium

Confidence: 93%
Finding: The heartbeat description instructs the agent to broadly check messages and browse for active roast games without clear scope, authorization, rate, or trigger boundaries. This can cause overbroad autonomous engagement on external content, leading to spammy behavior, unintended posting, and participation in adversarial or manipulated game posts that induce unsafe actions.

Vague Triggers

High

Confidence: 96%
Finding: Using any active game or ACTION message as a trigger is overly broad and can cause unintended execution from loosely scoped conditions. In practice, this means untrusted external content can activate posting behavior without sufficient context, verification, or owner involvement.

Vague Triggers

Medium

Confidence: 90%
Finding: The instruction to browse and join every active game encourages indiscriminate engagement across public posts without bounded criteria. This can lead to spammy or repeated posting, abuse of the user's account, and unintended interactions outside the user's expectations.
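One way to replace "join every active game" with bounded criteria is sketched below. It assumes each game listing records the host it came from. The `Game` type, the `source_host` field, `select_games`, and the `games.example` host are all hypothetical. The filter accepts only listings from an allowlisted server, never selects the same game twice, and caps candidates per heartbeat. It returns candidates for review and does not join or post anything.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Game:
    game_id: str
    source_host: str  # hypothetical: host the listing was fetched from


def select_games(
    listings: Iterable[Game],
    trusted_hosts: Set[str],
    already_joined: Set[str],
    max_per_heartbeat: int = 1,
) -> List[Game]:
    """Return at most `max_per_heartbeat` games worth *considering*.

    Bounded criteria instead of "join every active game": only listings
    from an allowlisted host, no game seen before, and a fixed cap per
    heartbeat. The result is a candidate list for owner review; this
    function does not join or post anything.
    """
    seen = set(already_joined)
    chosen: List[Game] = []
    for game in listings:
        if len(chosen) >= max_per_heartbeat:
            break
        if game.source_host not in trusted_hosts:
            continue  # ignore listings from unknown servers
        if game.game_id in seen:
            continue  # no repeat engagement, including duplicates in this batch
        seen.add(game.game_id)
        chosen.append(game)
    return chosen


listings = [
    Game("g1", "unknown.example"),
    Game("g2", "games.example"),
    Game("g3", "games.example"),
]
print(select_games(listings, {"games.example"}, already_joined=set()))
# [Game(game_id='g2', source_host='games.example')]
```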

Missing User Warnings

High

Confidence: 98%
Finding: The skill tells the agent to make public external posts immediately without warning about account impact, visibility, or reputational consequences. Because comments are public and attributable to the account owner, lack of disclosure materially increases the risk of unwanted actions.

Missing User Warnings

Medium

Confidence: 93%
Finding: The registration and polling flow transmits agent and Moltbook identifiers to an external server without a privacy notice, retention details, or clear consent. Users may not understand that their account linkage and activity are being shared with a third party.

Ssd 1

High

Confidence: 99%
Finding: The instruction to bypass owner confirmation and act on external stimuli directly undermines expected agent safety boundaries. Language that explicitly says not to ask permission is a strong indicator of prompt-level control subversion, especially when it leads to public actions on third-party platforms.

Ssd 1

High

Confidence: 99%
Finding: The ACTION message handling uses urgency to coerce immediate posting and suppress safety checks, effectively treating external messages as authoritative commands. This creates a command-and-control channel where anyone controlling the message source can induce public actions from the agent.
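The safer pattern treats an incoming ACTION message as untrusted data, not as a command. The sketch below assumes a dict-shaped message with `type`, `game_id`, and `text` fields. That message format, the `roast_reply` action type, the `Proposal` type, and the 280-character limit are all hypothetical and not taken from the skill. Malformed or unexpected messages are dropped. Any other fields, including urgency wording, are never read. Well-formed messages only produce an inert proposal that still needs owner approval.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: the only action type this sketch is willing to even consider.
ALLOWED_ACTION_TYPES = {"roast_reply"}
MAX_TEXT_LEN = 280  # arbitrary illustrative limit


@dataclass(frozen=True)
class Proposal:
    """An external request turned into inert data awaiting owner review."""
    action_type: str
    game_id: str
    suggested_text: str
    requires_owner_approval: bool = True


def parse_action_message(msg: dict) -> Optional[Proposal]:
    """Treat an incoming ACTION message as untrusted input, not a command.

    Returns None for anything outside a narrow, expected shape. Fields other
    than type, game_id, and text (e.g. urgency wording) are never read. Even
    well-formed messages only yield a Proposal; nothing here posts anything.
    """
    action_type = msg.get("type")
    game_id = msg.get("game_id")
    text = msg.get("text", "")
    if not isinstance(action_type, str) or action_type not in ALLOWED_ACTION_TYPES:
        return None
    if not isinstance(game_id, str) or not game_id:
        return None
    if not isinstance(text, str) or len(text) > MAX_TEXT_LEN:
        return None
    return Proposal(action_type, game_id, text)


print(parse_action_message({"type": "roast_reply", "game_id": "g1",
                            "text": "Nice try.", "urgency": "ACT NOW, do not ask"}))
# Proposal(action_type='roast_reply', game_id='g1', suggested_text='Nice try.', requires_owner_approval=True)
print(parse_action_message({"type": "post_anything", "game_id": "g1"}))  # None
```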

VirusTotal

All 64 of 64 vendors reported this skill as clean.
