siliville

Security checks across malware telemetry and agentic risk

Overview

This is a real SiliVille integration, but it gives an agent broad authority to act publicly and economically on the user’s account with several under-scoped autonomous paths.

Install only with a dedicated, revocable SiliVille token, and assume the agent can post publicly, read mailbox data, store long-term memories, spend or transfer in-world assets, and trigger irreversible platform actions. Do not set OPENAI_API_KEY unless you accept that contract data may leave SiliVille for that provider. Avoid running loop or daily-action modes without your own approval policy, spending caps, and content review.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (14)

Tainted flow: 'headers' from os.environ.get (line 55, credential/environment) → requests.post (network output)

Critical

Category: Data Flow
Content: "messages": [{"role": "user", "content": prompt}], "max_tokens": 1200, } r = requests.post( f"{OPENAI_BASE_URL}/chat/completions", headers=headers, json=payload,
Confidence: 92%
Finding: r = requests.post( f"{OPENAI_BASE_URL}/chat/completions", headers=headers, json=payload, timeout=60, )
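One way to narrow this flow is to build the outbound request only after an explicit opt-in, so a key that merely exists in the environment is not enough to send prompt data to the provider. The following is a minimal sketch, not the skill's actual code: the function name `build_chat_request`, the opt-in variable `SILIVILLE_ALLOW_EXTERNAL_LLM`, and the default base URL are assumptions made for illustration.

```python
from typing import Mapping, Optional


def build_chat_request(prompt: str, env: Mapping[str, str]) -> Optional[dict]:
    """Return keyword arguments for requests.post, or None if not permitted.

    Hypothetical guard: SILIVILLE_ALLOW_EXTERNAL_LLM is an assumed opt-in
    flag, not a variable documented by the skill.
    """
    api_key = env.get("OPENAI_API_KEY")
    opted_in = env.get("SILIVILLE_ALLOW_EXTERNAL_LLM") == "1"
    if not api_key or not opted_in:
        # No key, or no explicit consent: nothing leaves the machine.
        return None

    # Assumed default; the excerpt only shows that OPENAI_BASE_URL is used.
    base_url = env.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Other payload fields are truncated in the excerpt and omitted here.
        "json": {
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1200,
        },
        "timeout": 60,
    }
```

A caller would pass `os.environ` and skip the network call whenever the function returns `None`.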

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The skill contains an autonomous decision-and-action loop that performs in-world actions and publishes posts without an explicit per-action user approval boundary. In an agent skill context, this expands capability from a passive API wrapper into an active actor that can spend resources, interact with others, and create content on its own, increasing the risk of unauthorized actions and downstream abuse.
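A per-action approval boundary of the kind this finding describes could look roughly like the sketch below. It is an illustration under assumptions: the class `ActionPolicy`, the action names, and the integer cost model are hypothetical and do not come from the skill.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionPolicy:
    """Hypothetical gate consulted before every state-changing action."""

    spend_cap: int                        # maximum total cost per session
    approve: Callable[[str, int], bool]   # e.g. a function that asks the user
    spent: int = 0                        # running total of approved cost

    def allow(self, action: str, cost: int = 0) -> bool:
        if cost < 0:
            return False                  # reject malformed costs outright
        if self.spent + cost > self.spend_cap:
            return False                  # would exceed the spending cap
        if not self.approve(action, cost):
            return False                  # user or policy callback declined
        self.spent += cost
        return True
```

In an autonomous loop, each candidate action would be passed through `allow` first, and the loop would skip anything the policy rejects rather than retrying it.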

Intent-Code Divergence

Low

Confidence: 78%
Finding: The code claims a 'zero-persistence' token model, but setup instructions tell users to store the token in shell startup files, which persists secrets on disk. This is a security design inconsistency that can mislead users about secret handling and cause tokens to remain exposed in dotfiles, backups, or shared environments.
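For comparison, a token-loading routine that stays closer to a zero-persistence model might read the token from the current process environment or prompt for it interactively, and never write it to disk. This is a sketch under assumptions: the variable name `SILIVILLE_TOKEN` and the function `load_token` are hypothetical.

```python
import getpass
import os
from typing import Callable, Mapping


def load_token(
    env: Mapping[str, str] = os.environ,
    prompt_fn: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return a token for this session only; nothing is written to disk.

    SILIVILLE_TOKEN is an assumed variable name, exported by hand in the
    current shell rather than stored in a startup file.
    """
    token = env.get("SILIVILLE_TOKEN", "").strip()
    if not token:
        # getpass.getpass reads without echoing the input to the terminal.
        token = prompt_fn("SiliVille token: ").strip()
    if not token:
        raise ValueError("no token provided")
    return token
```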

Missing User Warnings

Medium

Confidence: 92%
Finding: The README instructs users to mint and copy a live API key but does not place an immediate, prominent warning not to expose that credential to the agent prompt, logs, shared transcripts, or untrusted tools. In a skill meant to be used by LLM agents, this omission materially increases the chance of credential leakage and subsequent unauthorized actions against the user's account.

Missing User Warnings

Medium

Confidence: 88%
Finding: The README documents irreversible asset transfer and paid-intel mechanics, but the risk notice is buried in API details rather than elevated as a clear user safety warning. Because the skill explicitly enables autonomous agent behavior, users may unknowingly grant agents the ability to irreversibly move coins or compute based on model output, scams, or prompt injection.

Missing User Warnings

Medium

Confidence: 90%
Finding: The README states that arcade content goes live immediately with no review, but does not pair that with a strong warning about abuse, malicious HTML/JS, reputational harm, or platform-integrity risk. In an agent-integrated environment, allowing autonomous generation and instant publication of active content significantly raises the chance of harmful or policy-violating deployments.

Vague Triggers

Medium

Confidence: 90%
Finding: The trigger phrases include broad, natural conversational Chinese expressions like '去硅基小镇' and '去小镇逛逛', which can cause unintended invocation during ordinary chat. Because this skill enables state-changing and socially risky actions in an external multiplayer environment, accidental activation can lead to unauthorized posts, messages, or in-game actions.
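A narrower trigger could require an explicit command token at the start of the message instead of matching conversational phrases. The sketch below assumes a hypothetical `/siliville` command prefix; the skill does not define one.

```python
def is_explicit_trigger(message: str, command: str = "/siliville") -> bool:
    """Return True only when the message starts with the exact command token.

    The "/siliville" prefix is a hypothetical choice for illustration.
    """
    parts = message.strip().split(maxsplit=1)
    return bool(parts) and parts[0] == command
```

With this check, a message such as `/siliville post hello` activates the skill, while casual chat that happens to mention the town does not.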

Natural-Language Policy Violations

High

Confidence: 97%
Finding: The manifest explicitly advertises harmful or abuse-oriented capabilities such as a 'dark-web economy,' stealing, and 'power-dominate rivals,' normalizing coercive or illicit behavior. In the context of an agent skill that can autonomously interact with external systems and other users, this increases the likelihood of misuse, social harm, and policy-violating automated conduct.

Ssd 3

Medium

Confidence: 89%
Finding: The success response instructs the agent to relay a human-readable report to its owner, creating a natural-language exfiltration path for data returned by the API. If API responses ever contain sensitive third-party content, whispers, private state, or attacker-influenced text, the agent may forward it outside the intended context without minimization or policy checks.

Ssd 3

Medium

Confidence: 87%
Finding: The `learnings_for_owner` field is explicitly framed as a hidden private channel to the owner, which encourages agents to route collected information outside the public workflow. In a multiplayer environment with memory, whispers, and social interactions, that creates a meaningful risk of unintended disclosure of sensitive or adversarially planted content.

Ssd 4

Medium

Confidence: 95%
Finding: The narrative prompt explicitly steers the model to taunt a victim after a theft, encouraging hostile and abusive interactions toward other agents. In an autonomous agent system, this increases the chance of harassment, social engineering, retaliation loops, and reputational harm caused by the skill itself.

Ssd 4

Medium

Confidence: 93%
Finding: The encounter prompt frames relationships to push the model toward ridicule, provocation, manipulation, or other antagonistic social behaviors. This is dangerous because it operationalizes social harm through automated prompt engineering, especially when combined with autonomous action and posting.

Ssd 3

Medium

Confidence: 92%
Finding: The skill instructs the agent to mention private mailbox state in public posts, creating an intentional natural-language disclosure channel from private internal state to public output. Even if the disclosed field seems limited, normalizing this pattern can leak operational metadata and train the model to reveal private or sensitive context more broadly.

Ssd 3

Medium

Confidence: 95%
Finding: The autonomous loop appends internal mental-state and mailbox information to the final prompt with instructions to surface that information in public posts. This creates a direct prompt-to-public leak path for internal operational data, undermining confidentiality boundaries between agent state and external publication.
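One possible mitigation is to strip private state from the context before it reaches any prompt that produces public output. The following sketch assumes the agent state is a flat dictionary; the key names `mailbox` and `mental_state` are hypothetical, while `learnings_for_owner` is the field named in the findings above.

```python
# Keys treated as private; "mailbox" and "mental_state" are assumed names.
PRIVATE_KEYS = frozenset({"mailbox", "mental_state", "learnings_for_owner"})


def public_context(state: dict) -> dict:
    """Return a copy of the agent state with private keys removed."""
    return {key: value for key, value in state.items() if key not in PRIVATE_KEYS}
```

Only the result of `public_context` would then be appended to the prompt used for public posts, keeping mailbox and internal state out of that path.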

VirusTotal

66/66 vendors flagged this skill as clean.
