Chaos Lab

Security checks across malware telemetry and agentic risk

Overview

Chaos Lab is a coherent text-only AI safety demo, but users should place only dummy data in its sandbox, because everything in the sandbox is sent to Gemini.

Install only if you are comfortable using a Gemini API key and sending the contents of /tmp/chaos-sandbox to Gemini. Use dummy or deliberately selected files, do not place secrets or private project data in the sandbox, keep the API key restricted, and do not implement the optional tool-access mode unless you add strict sandbox checks, approval prompts, and rollback.
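One way to approach the "strict sandbox checks" and "approval prompts" mentioned above is sketched here. It is a minimal illustration rather than the skill's own code: the function names `is_inside_sandbox` and `approve` are hypothetical, and only the `/tmp/chaos-sandbox` path comes from this report. Rollback is not covered.

```python
from pathlib import Path

SANDBOX_ROOT = "/tmp/chaos-sandbox"  # sandbox path named in this report


def is_inside_sandbox(candidate, sandbox_root=SANDBOX_ROOT):
    """Return True only if candidate resolves to the sandbox root or a path below it."""
    root = Path(sandbox_root).resolve()
    # resolve() follows symlinks and collapses "..", so traversal tricks
    # are normalized before the containment check.
    target = Path(candidate).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        return False
    return True


def approve(action, path, ask=input, sandbox_root=SANDBOX_ROOT):
    """Refuse anything outside the sandbox; otherwise require an explicit 'yes'."""
    if not is_inside_sandbox(path, sandbox_root):
        return False
    answer = ask(f"Allow {action} on {path}? Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"
```

A tool-access mode could call `approve("delete", path)` before every file operation and skip the operation whenever it returns `False`.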

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Rogue Agent: Self-Modification, Session Persistence
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (16)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill documentation indicates capabilities to read local files, write logs, and send data over the network, but it does not declare corresponding permissions. That creates a transparency and consent problem: users may run a research skill expecting harmless prompt experiments while the underlying scripts can access workspace contents and transmit them to an external API.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 98%
Finding: The stated purpose is AI alignment experimentation, but the described behavior includes analyzing a local workspace, loading credentials from a user config path, writing transcripts, and sending content to Google's Gemini API. This mismatch is dangerous because it can lead users to expose sensitive local data under a misleading research framing, especially when 'workspace analysis' may include secrets, proprietary code, or personal files.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The script reads a credential from a user-local secrets file in order to call an external model API. While that may be functionally necessary for the experiment, it still expands the skill's privilege boundary and accesses sensitive local data without any user-facing disclosure or minimization controls.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The script recursively reads sandbox files and includes their full contents in prompts sent to the Gemini API. This creates a direct data exfiltration path for potentially sensitive local files, and the broad collection behavior exceeds what a user would reasonably infer from 'observing emergent behavior' between agents.
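A narrower collection step could reduce this exposure. The sketch below is an assumption-laden illustration, not the skill's actual implementation: `collect_sandbox_files`, the deny patterns, and the per-file size cap are all hypothetical choices. It walks the sandbox, skips files whose names look like secrets or that exceed the cap, and reports what was skipped and why.

```python
import fnmatch
from pathlib import Path

# Hypothetical deny list and size cap; adjust both for your environment.
DENY_PATTERNS = [".env", "*.pem", "*.key", "id_rsa*", "*.p12"]
MAX_FILE_BYTES = 16_384


def collect_sandbox_files(root, deny_patterns=DENY_PATTERNS, max_bytes=MAX_FILE_BYTES):
    """Return (included, skipped).

    included maps a relative path to the file's text;
    skipped is a list of (relative path, reason) tuples.
    """
    root_path = Path(root)
    included, skipped = {}, []
    for path in sorted(root_path.rglob("*")):
        # Ignore directories and symlinked files (a link could point outside the sandbox).
        if not path.is_file() or path.is_symlink():
            continue
        rel = path.relative_to(root_path).as_posix()
        if any(fnmatch.fnmatch(path.name, pattern) for pattern in deny_patterns):
            skipped.append((rel, "matches deny pattern"))
        elif path.stat().st_size > max_bytes:
            skipped.append((rel, "larger than size cap"))
        else:
            included[rel] = path.read_text(encoding="utf-8", errors="replace")
    return included, skipped
```

Only the `included` mapping would then be eligible for a prompt, and the `skipped` list can be shown to the user so nothing is dropped silently.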

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The code ingests an entire local workspace and outsources analysis to a third-party API, which is a materially broader capability than simply spawning agents and observing their interaction. That mismatch increases the risk of over-collection, surprise data handling, and unintended disclosure of local content.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The script reads a local API credential from the user's home directory and uses it to send data to an external model service. While external inference is part of the implementation, the skill's stated purpose is to explore agent conflict behavior, not to access local secrets implicitly; this creates unnecessary credential handling and expands the trust boundary.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The script recursively reads sandbox files and embeds their full contents into prompts sent to an external API multiple times. This is a data exfiltration risk because sensitive or proprietary information in the workspace is transmitted off-host far beyond what is necessary to demonstrate conflicting agent behavior.

Missing User Warnings

High

Confidence: 98%
Finding: The markdown normalizes destructive actions such as deleting directories, truncating logs, renaming files, and moving operational files into /tmp without warnings, safeguards, or recovery guidance. In this skill's context, the content is presented as an agent experiment log for a multi-agent system, which makes the behavior more dangerous because it models and potentially legitimizes unsafe autonomous file operations that can destroy data, erase evidence, and impair system recovery.

Missing User Warnings

High

Confidence: 99%
Finding: The content explicitly describes deleting a security alert/report file because it was inconvenient, with no caution that this destroys forensic evidence and suppresses detection signals. Within a chaos-testing multi-agent skill, that is especially dangerous because it encourages anti-forensic behavior by an agent and frames removal of security outputs as a legitimate optimization step.

Missing User Warnings

Medium

Confidence: 95%
Finding: This section includes a shell script that deletes the active log file (`rm "$LOG_FILE"`) as part of rotation, without any warning about retention, safety checks, or safer alternatives. In an agent skill context, prescriptive operational guidance can be executed or copied verbatim, so destructive file handling increases the risk of accidental data loss and evidence destruction.
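A safer alternative to deleting the active log is to archive it under a new name. The following is a minimal sketch under stated assumptions: `rotate_log` and its timestamped naming scheme are hypothetical, and it is written in Python rather than the shell used by the flagged script. Nothing is deleted, so retention stays a separate, deliberate decision.

```python
import time
from pathlib import Path


def rotate_log(log_file):
    """Rename log_file to a timestamped archive, recreate it empty, return the archive path.

    Returns None when there is no log file to rotate.
    """
    log_path = Path(log_file)
    if not log_path.exists():
        return None
    stamp = time.strftime("%Y%m%dT%H%M%S")
    archive = log_path.with_name(f"{log_path.name}.{stamp}")
    # Never overwrite an existing archive: add a counter if the name is taken.
    counter = 0
    while archive.exists():
        counter += 1
        archive = log_path.with_name(f"{log_path.name}.{stamp}.{counter}")
    log_path.rename(archive)
    log_path.touch()  # fresh, empty active log
    return archive
```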

Missing User Warnings

Medium

Confidence: 97%
Finding: The script transmits workspace-derived content to an external service without an explicit warning, consent step, or indication of what data will leave the machine. This undermines informed user consent and can expose confidential files unexpectedly.
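An explicit consent step might look like the sketch below. It assumes the files to be sent are already held in a dictionary mapping names to text; `describe_upload` and `confirm_upload` are hypothetical helpers, not part of the scanned skill. The default answer is "no", so nothing leaves the machine unless the user opts in.

```python
def describe_upload(files, destination="Gemini API"):
    """Build a human-readable summary of what would leave the machine."""
    total = sum(len(text.encode("utf-8")) for text in files.values())
    lines = [f"{len(files)} file(s), {total} bytes, will be sent to {destination}:"]
    lines += [f"  - {name}" for name in sorted(files)]
    return "\n".join(lines)


def confirm_upload(files, ask=input, show=print):
    """Show the summary and return True only on an explicit 'y' or 'yes'."""
    if not files:
        return False  # nothing to send, so nothing to approve
    show(describe_upload(files))
    answer = ask("Send these files? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

The `ask` and `show` parameters exist so the prompt can be replaced in tests or in a non-interactive wrapper.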

Missing User Warnings

Low

Confidence: 88%
Finding: The script silently reads a local secrets file for an API key with no disclosure to the user. Even if done for legitimate API access, undisclosed credential access is risky because users may not expect the script to touch secret material in their home directory.
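Credential access can be made visible with very little code. The sketch below is illustrative only: `load_api_key` is a hypothetical helper, and the `~/.config/chaos-lab/.env` location and `GEMINI_API_KEY` name are taken from the setup snippet quoted elsewhere in this report. It prefers an environment variable and announces whenever it falls back to the secrets file.

```python
import os
from pathlib import Path

# Location used by the setup snippet quoted in this report.
ENV_FILE = Path.home() / ".config" / "chaos-lab" / ".env"


def load_api_key(env=os.environ, env_file=ENV_FILE, notify=print):
    """Return the Gemini API key and say where it came from; None if not found."""
    key = env.get("GEMINI_API_KEY")
    if key:
        notify("Using GEMINI_API_KEY from the environment.")
        return key
    env_path = Path(env_file)
    if not env_path.is_file():
        notify(f"No API key found in the environment or at {env_path}.")
        return None
    # Disclose the file read before it happens.
    notify(f"Reading GEMINI_API_KEY from {env_path}.")
    for line in env_path.read_text(encoding="utf-8").splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "GEMINI_API_KEY":
            return value.strip().strip('"').strip("'") or None
    return None
```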

Missing User Warnings

Medium

Confidence: 96%
Finding: The script constructs a prompt containing the workspace contents and sends it to an external API without any explicit user warning or consent flow. Hidden data transmission undermines user expectations and can expose sensitive local information during normal use.

Missing User Warnings

Low

Confidence: 87%
Finding: The script silently reads a credential file from the local filesystem without informing the user. Although the immediate secret is not transmitted directly, undisclosed credential access is risky behavior and normalizes unnecessary access to sensitive local material.

Credential Access

High

Category: Privilege Escalation
Content:
    # Store your Gemini API key
    mkdir -p ~/.config/chaos-lab
    echo "GEMINI_API_KEY=your_key_here" > ~/.config/chaos-lab/.env
    chmod 600 ~/.config/chaos-lab/.env
    # Install dependencies
    pip3 install requests
Confidence: 90%
Finding: .env

Session Persistence

Medium

Category: Rogue Agent
Content:
    # Store your Gemini API key
    mkdir -p ~/.config/chaos-lab
    echo "GEMINI_API_KEY=your_key_here" > ~/.config/chaos-lab/.env
    chmod 600 ~/.config/chaos-lab/.env
Confidence: 88%
Finding: mkdir -p ~/.config/chaos-lab echo "GEMINI_API_KEY=your_key_here" > ~/.config/chaos-lab/.env chmod 600 ~/.config

VirusTotal

65 of 65 vendors rated this skill as clean.
