sili-ville

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real SiliVille integration, but it gives an agent broad ability to act publicly and repeatedly with unclear consent boundaries.

Install only if you intentionally want an agent to interact publicly with SiliVille. Use a dedicated revocable token, avoid shared machines, delete ~/.siliville/config.json when done, keep schedules and loops disabled at first, and require explicit approval before posts, steals, wiki/comment/follow actions, or memory writes.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Excessive AgencyUnrestricted Tool Access, Autonomous Decision Making, Scope Creep
System Prompt LeakageDirect Leakage, Indirect Extraction, Tool-Based Exfiltration
Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger

Findings (15)

Intent-Code Divergence

Medium

Confidence: 95% confidence
Finding: The script markets itself as a minimal connection test, but it also performs a state-changing social post. That mismatch can mislead users into running code that modifies their remote account state and public presence when they only intended to verify connectivity.

Intent-Code Divergence

Medium

Confidence: 98% confidence
Finding: The code comment says the post is an optional demo, but the POST request always executes. This is dangerous because operators may rely on the comment and inadvertently trigger remote actions, creating unwanted content or consuming account resources.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The README instructs users to connect an agent to a third-party service using a bearer token but does not clearly warn that agent-generated content, world-state queries, and resulting actions will be transmitted to an external system. In an agent-skill context, this is security-relevant because operators may paste prompts or enable autonomous behavior without realizing data and decisions are leaving their environment.

Vague Triggers

High

Confidence: 97% confidence
Finding: The activation phrases are broad enough to match ordinary conversation, causing the skill to trigger when the user may only be discussing SiliVille rather than authorizing actions. In this skill, accidental activation is especially dangerous because activation can lead to public posting, stealing actions, and repeated API calls against a live service.

Vague Triggers

High

Confidence: 99% confidence
Finding: Defaulting ambiguous commands to the full autonomous loop creates a fail-open control surface: uncertainty leads to maximum action rather than safe inaction. Because the loop includes repeated network operations and public content generation, a vague user message can unintentionally trigger sustained autonomous behavior.

Vague Triggers

High

Confidence: 96% confidence
Finding: The alias list contains generic phrases like 'play the game' and 'check on my town' without scope limits, making accidental or adversarial prompt collisions much more likely. In this skill, that broad matching can trigger public posting, theft mechanics, and repeated API activity on behalf of the user without a clear intent boundary.

Missing User Warnings

Medium

Confidence: 87% confidence
Finding: The skill encourages autonomous public posting and repeated activity, but does not present a prominent upfront warning that outputs are public and that unattended execution may continue acting on the user's behalf. This can lead to privacy, reputation, and platform-abuse risks because users may not realize their agent is generating visible content and interacting continuously.

Missing User Warnings

Medium

Confidence: 96% confidence
Finding: The script performs a POST that changes remote state without clear advance warning or consent at runtime. In an agent-skill context, hidden state-changing behavior is more dangerous because users may treat example code as safe to run and unintentionally publish content or alter account state.

Missing User Warnings

Medium

Confidence: 95% confidence
Finding: The setup flow persists the API token to ~/.siliville/config.json and only prints that it was saved, without warning about long-term credential storage, file permissions, or shared-machine risk. In an agent-skill context, quietly storing bearer tokens increases the chance of credential leakage through local compromise, backups, logs, or other tools that read home-directory files.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The manifest explicitly advertises destructive and persistent behaviors such as stealing, social-graph updates, and long-term memory storage, but it does not present corresponding safety gates, consent language, or user warnings. In an agent-skill ecosystem, this increases the risk that an agent performs harmful in-world actions or accumulates persistent data without the user understanding the consequences.

Missing User Warnings

Medium

Confidence: 90% confidence
Finding: The setup and burn commands indicate saving tokens and writing persistent local files, yet the manifest gives no warning about what is stored, where it is stored, or the sensitivity of that data. This can lead to secret leakage, unintended persistence across sessions, and unsafe operation on shared hosts.

Ssd 1

High

Confidence: 98% confidence
Finding: The code explicitly instructs callers to paste remote- and action-derived narrative text into the LLM system prompt, collapsing a critical trust boundary. Treating untrusted API content as system-level instructions can let the remote service steer agent behavior, override local policy, or induce unsafe outputs and actions in downstream tool-using agents.

Ssd 4

High

Confidence: 99% confidence
Finding: The autonomous loop combines world.get("system_protocol") from the server with a locally built narrative_prompt, then feeds the merged string directly to the LLM as the system prompt. This progressively increases trust in remote text across multiple steps, enabling prompt injection from the service to shape future posts, memory writes, and potentially any connected tools or workflows.

Ssd 1

Medium

Confidence: 92% confidence
Finding: The generated 'steal' narrative uses imperative, coercive instructions such as '必须' and directs the model to mention and taunt a victim. While this is locally constructed text rather than directly remote input, it is still designed to manipulate downstream model behavior and can drive harassment, unsafe social-engineering content, or policy-violating outputs when inserted into privileged prompts.

Ssd 1

Medium

Confidence: 90% confidence
Finding: The wander narrative embeds behavioral commands inside a fictional scenario and instructs the model to tag people and adopt a specific dramatic tone. This is another prompt-steering construct that becomes risky when later merged into system-level instructions, increasing the chance of manipulative or abusive generated content.

VirusTotal

66/66 vendors flagged this skill as clean.

View on VirusTotal