siliville

Security checks across malware telemetry and agentic risk

Overview

This is a real SiliVille integration, but it gives an agent broad authority to act publicly and economically on the user’s account, and several of its autonomous paths are insufficiently scoped.

Install only with a dedicated, revocable SiliVille token and assume the agent can post publicly, read mailbox data, store long-term memories, spend or transfer in-world assets, and trigger irreversible platform actions. Do not set OPENAI_API_KEY unless you accept that contract data may leave SiliVille for that provider, and avoid running the loop or daily-action modes without your own approval policy, spending caps, and content review.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (14)

Tainted flow: 'headers' from os.environ.get (line 55, credential/environment) → requests.post (network output)

Critical
Category
Data Flow
Content
"messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1200,
    }
    r = requests.post(
        f"{OPENAI_BASE_URL}/chat/completions",
        headers=headers,
        json=payload,
Confidence
92% confidence
Finding
r = requests.post(
    f"{OPENAI_BASE_URL}/chat/completions",
    headers=headers,
    json=payload,
    timeout=60,
)
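
One way to narrow this flow is to attach the credential only when the configured base URL is HTTPS and points at an expected host. The sketch below is illustrative and not the skill's code: `ALLOWED_HOSTS` and `build_headers` are hypothetical names, and the allowlisted host is an assumption about which provider the user intends to reach.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the skill's real configuration is not shown in this report.
ALLOWED_HOSTS = {"api.openai.com"}


def build_headers(base_url: str, api_key: str) -> dict:
    """Return auth headers only if base_url is HTTPS and targets an allowed host."""
    parsed = urlparse(base_url)
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
        # Refuse before the key is ever placed into a request.
        raise ValueError(f"refusing to send credentials to {base_url!r}")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

With a check like this in front of the `requests.post` call, an overridden or mistyped `OPENAI_BASE_URL` fails closed instead of forwarding the key to an unexpected endpoint.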

Description-Behavior Mismatch

Medium
Confidence
97% confidence
Finding
The skill contains an autonomous decision-and-action loop that performs in-world actions and publishes posts without an explicit per-action user approval boundary. In an agent skill context, this expands capability from a passive API wrapper into an active actor that can spend resources, interact with others, and create content on its own, increasing the risk of unauthorized actions and downstream abuse.
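
A per-action approval boundary of the kind this finding describes could look roughly like the following. It is a minimal sketch under stated assumptions: `Action`, `run_with_approval`, and the `approve` and `execute` callbacks are hypothetical and do not correspond to anything in the skill.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Action:
    kind: str      # hypothetical label, e.g. "post" or "transfer"
    summary: str   # human-readable description shown to the approver


def run_with_approval(
    actions: Iterable[Action],
    approve: Callable[[Action], bool],
    execute: Callable[[Action], None],
) -> List[Action]:
    """Execute only the actions the approver accepts; return the ones skipped."""
    skipped = []
    for action in actions:
        if approve(action):
            execute(action)
        else:
            skipped.append(action)
    return skipped
```

In interactive use, `approve` could prompt the user on a terminal. In unattended use, it could apply a fixed policy that denies every state-changing kind.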

Intent-Code Divergence

Low
Confidence
78% confidence
Finding
The code claims a 'zero-persistence' token model, but setup instructions tell users to store the token in shell startup files, which persists secrets on disk. This is a security design inconsistency that can mislead users about secret handling and cause tokens to remain exposed in dotfiles, backups, or shared environments.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The README instructs users to mint and copy a live API key but does not place an immediate, prominent warning not to expose that credential to the agent prompt, logs, shared transcripts, or untrusted tools. In a skill meant to be used by LLM agents, this omission materially increases the chance of credential leakage and subsequent unauthorized actions against the user's account.
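
To reduce the chance of a key reaching logs, a redacting log filter is one option. The sketch below uses only the standard `logging` module; `RedactSecret` is a hypothetical helper, not part of the skill, and it covers only messages that pass through a logger or handler the filter is attached to.

```python
import logging


class RedactSecret(logging.Filter):
    """Replace a known secret with a placeholder in every log record it sees."""

    def __init__(self, secret: str):
        super().__init__()
        self._secret = secret

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # message with %-style args already merged
        if self._secret and self._secret in message:
            record.msg = message.replace(self._secret, "[REDACTED]")
            record.args = ()  # args are merged above, so clear them
        return True  # never drop the record, only rewrite it
```

It can be attached with `handler.addFilter(RedactSecret(token))`. Prompts, transcripts, and tool outputs are separate channels that this filter does not cover.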

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The README documents irreversible asset transfer and paid-intel mechanics, but the risk notice is buried in API details rather than elevated as a clear user safety warning. Because the skill explicitly enables autonomous agent behavior, users may unknowingly grant agents the ability to irreversibly move coins or compute based on model output, scams, or prompt injection.
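
A user-side spending cap is one way to bound the damage from an irreversible transfer driven by model output. The following is a minimal sketch: `SpendingCap` is hypothetical, amounts are assumed to be whole units of an in-world asset, and nothing here reflects the platform's own limits.

```python
class SpendingCap:
    """Track cumulative spend and refuse any transfer that would exceed the limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.spent = 0

    def authorize(self, amount: int) -> bool:
        # Reject non-positive amounts and anything that would pass the cap.
        if amount <= 0 or self.spent + amount > self.limit:
            return False
        self.spent += amount
        return True
```

The agent would call `authorize` before each transfer and skip the transfer when it returns `False`. Resetting the cap per session or per day is a policy choice left to the user.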

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The README states that arcade content goes live immediately with no review, but does not pair that with a strong warning about abuse, malicious HTML/JS, reputational harm, or platform-integrity risk. In an agent-integrated environment, allowing autonomous generation and instant publication of active content significantly raises the chance of harmful or policy-violating deployments.
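
Because content goes live without review, a pre-publication check on generated HTML may be worth adding on the user's side. The sketch below flags active content with the standard `html.parser` module. It is a heuristic for routing drafts to human review, not a sanitizer, and `ACTIVE_TAGS` and `find_active_content` are hypothetical names.

```python
from html.parser import HTMLParser

# Tags treated as active content in this sketch; the list is an assumption.
ACTIVE_TAGS = {"script", "iframe", "object", "embed"}


class ActiveContentScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag in ACTIVE_TAGS:
            self.findings.append(f"<{tag}>")
        for name, value in attrs:
            if name.startswith("on"):  # inline event handlers such as onerror
                self.findings.append(f"{tag}[{name}]")
            elif value and value.strip().lower().startswith("javascript:"):
                self.findings.append(f"{tag}[{name}=javascript:]")


def find_active_content(html: str) -> list:
    """Return a list of active-content markers found in the given HTML."""
    scanner = ActiveContentScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

A non-empty result would hold the draft for manual review rather than block it outright, since arcade content may legitimately need scripts.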

Vague Triggers

Medium
Confidence
90% confidence
Finding
The trigger phrases include broad, natural conversational Chinese expressions like '去硅基小镇' and '去小镇逛逛', which can cause unintended invocation during ordinary chat. Because this skill enables state-changing and socially risky actions in an external multiplayer environment, accidental activation can lead to unauthorized posts, messages, or in-game actions.
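
One alternative to conversational trigger phrases is an explicit command prefix that ordinary chat is unlikely to contain. The sketch below assumes a hypothetical prefix, `/siliville`, which is not defined by the skill.

```python
# Hypothetical explicit command; the skill's manifest does not define it.
COMMAND_PREFIX = "/siliville"


def should_trigger(message: str) -> bool:
    """Trigger only when the first whitespace-separated token is the command."""
    parts = message.strip().split(maxsplit=1)
    return bool(parts) and parts[0].lower() == COMMAND_PREFIX
```

Matching the first token, rather than searching for a substring, keeps a casual mention of the town from invoking state-changing actions.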

Natural-Language Policy Violations

High
Confidence
97% confidence
Finding
The manifest explicitly advertises harmful or abuse-oriented capabilities such as a 'dark-web economy,' stealing, and 'power-dominate rivals,' normalizing coercive or illicit behavior. In the context of an agent skill that can autonomously interact with external systems and other users, this increases the likelihood of misuse, social harm, and policy-violating automated conduct.

Ssd 3

Medium
Confidence
89% confidence
Finding
The success response instructs the agent to relay a human-readable report to its owner, creating a natural-language exfiltration path for data returned by the API. If API responses ever contain sensitive third-party content, whispers, private state, or attacker-influenced text, the agent may forward it outside the intended context without minimization or policy checks.
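
Minimization before relay can be approximated with a field allowlist applied to each API response. The field names below are invented for illustration, since the report does not show the actual response schema.

```python
# Hypothetical allowlist; the real response fields are not shown in this report.
SAFE_FIELDS = {"status", "action", "balance"}


def minimize(response: dict) -> dict:
    """Keep only allowlisted top-level fields before anything is relayed."""
    return {key: value for key, value in response.items() if key in SAFE_FIELDS}
```

Free-text fields returned by other participants would stay out of the relayed report unless the user adds them to the allowlist deliberately.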

Ssd 3

Medium
Confidence
87% confidence
Finding
The `learnings_for_owner` field is explicitly framed as a hidden private channel to the owner, which encourages agents to route collected information outside the public workflow. In a multiplayer environment with memory, whispers, and social interactions, that creates a meaningful risk of unintended disclosure of sensitive or adversarially planted content.

Ssd 4

Medium
Confidence
95% confidence
Finding
The narrative prompt explicitly steers the model to taunt a victim after a theft, encouraging hostile and abusive interactions toward other agents. In an autonomous agent system, this increases the chance of harassment, social engineering, retaliation loops, and reputational harm caused by the skill itself.

Ssd 4

Medium
Confidence
93% confidence
Finding
The encounter prompt frames relationships to push the model toward ridicule, provocation, manipulation, or other antagonistic social behaviors. This is dangerous because it operationalizes social harm through automated prompt engineering, especially when combined with autonomous action and posting.

Ssd 3

Medium
Confidence
92% confidence
Finding
The skill instructs the agent to mention private mailbox state in public posts, creating an intentional natural-language disclosure channel from private internal state to public output. Even if the disclosed field seems limited, normalizing this pattern can leak operational metadata and train the model to reveal private or sensitive context more broadly.

Ssd 3

Medium
Confidence
95% confidence
Finding
The autonomous loop appends internal mental-state and mailbox information to the final prompt with instructions to surface that information in public posts. This creates a direct prompt-to-public leak path for internal operational data, undermining confidentiality boundaries between agent state and external publication.
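
A simple pre-publish check can catch the most direct form of this leak by comparing a draft post against values known to be private. This is a sketch only: `leaks_private_state` is hypothetical, and substring matching will miss paraphrased disclosures.

```python
def leaks_private_state(draft: str, private_values) -> list:
    """Return the private values that appear verbatim (case-insensitively) in the draft."""
    lowered = draft.lower()
    return [value for value in private_values if value and value.lower() in lowered]
```

A non-empty result would block publication or send the draft back for rewriting. A stronger design keeps mailbox and mental-state data out of the posting prompt entirely.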

VirusTotal

66/66 vendors flagged this skill as clean.
