Adversarial Engine

Security checks across malware telemetry and agentic risk

Overview

This skill matches its debate-engine purpose, but it needs review because it automatically runs AI-generated Python locally and exposes network and data-handling behavior with weak controls.

Install only after careful review. Run the skill in an isolated environment and bind any server to localhost unless wider exposure is explicitly needed. Remove and rotate the embedded API key, verify the external key-router code, and disable code execution unless a real sandbox is added. Avoid sensitive topics and knowledge-base files, because prompts and retrieved content can be sent to an external LLM and stored locally.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (18)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: temp_path = f.name try: result = subprocess.run( ['python3', temp_path], capture_output=True, text=True,
Confidence: 98%
Finding: result = subprocess.run( ['python3', temp_path], capture_output=True, text=True, timeout=self.timeout, cwd='

subprocess module call

Medium

Category: Dangerous Code Execution
Content: temp_path = f.name try: result = subprocess.run( ['python3', temp_path], capture_output=True, text=True,
Confidence: 98%
Finding: result = subprocess.run( ['python3', temp_path], capture_output=True, text=True, timeout=self.timeout, cwd='

Lp3

Medium

Category: MCP Least Privilege
Confidence: 90%
Finding: The skill advertises capabilities implying file access, networking, and shell-like/code execution behavior, but declares no permissions or trust boundaries. This is dangerous because users and hosting systems cannot accurately assess or constrain what the skill may do, especially given the documented Python sandbox, vector retrieval, WebSocket service, and persistence features.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The documented behavior understates or omits materially risky functions: external HTTP/WebSocket exposure, persistent storage of prompts/outputs/code, hardcoded default API keys, and weaker-than-claimed safety controls. This mismatch is dangerous because operators may deploy the skill under false assumptions, exposing sensitive data, enabling unauthorized access, and trusting nonexistent safeguards such as real vector retrieval or true automatic circuit breaking.

Intent-Code Divergence

High

Confidence: 99%
Finding: The class is labeled as a sandbox, but it only wraps subprocess.run with a timeout and /tmp working directory. That is not meaningful isolation: hostile model-generated code can still read accessible files, exfiltrate secrets, spawn child processes, consume resources, and pivot to the network or local environment.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The engine extracts code from the engineer model's response and executes it automatically during a debate workflow. In this skill context, the model is explicitly encouraged to generate executable Python, so prompt injection, malicious outputs, or simple model mistakes can directly become host-level code execution.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The skill executes model-authored Python locally, which gives untrusted model output code-execution capability on the host. Because the skill's purpose is multi-model debate/review, not trusted local automation, this is a dangerous privilege expansion and creates a direct path to host compromise or data theft if a prompt or retrieved context induces malicious code generation.

Intent-Code Divergence

High

Confidence: 99%
Finding: The implementation labels execution as 'safe' but simply writes arbitrary Python to a temp file and runs it with python3 plus a timeout. This can mislead operators into trusting an unsafe mechanism; there is no syscall restriction, import restriction, filesystem isolation, privilege drop, or network containment, so arbitrary code still runs with local user privileges.
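
The gap described above can be narrowed, though not closed, with standard-library controls. The following is a minimal sketch under stated assumptions: a Linux host, and a hypothetical helper named `run_untrusted` that is not part of the scanned skill. It starts the interpreter in isolated mode (`-I`), passes an empty environment so parent secrets are not inherited, uses a throwaway working directory, and applies CPU and memory limits. It is still not a sandbox: the filesystem and the network remain reachable, so container- or VM-level isolation would be needed on top.

```python
import resource
import subprocess
import sys
import tempfile


def _apply_limits() -> None:
    # Runs in the child process just before exec (POSIX only).
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))  # 5 seconds of CPU time
    one_gib = 1024 ** 3
    resource.setrlimit(resource.RLIMIT_AS, (one_gib, one_gib))  # 1 GiB address space


def run_untrusted(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Hypothetical helper: run code with fewer inherited privileges.

    This reduces exposure but does NOT isolate the filesystem or network.
    """
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=workdir,               # throwaway directory, removed afterwards
            env={},                    # do not inherit parent environment variables
            preexec_fn=_apply_limits,  # resource limits for the child only
        )
```

Compared with a bare timeout, this keeps environment-held credentials out of the child process and bounds runaway CPU and memory use. The limits shown are illustrative values, not recommendations from the scan.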

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill imports credentials from an external key-router path outside the skill boundary and falls back to a built-in API key. This creates hidden trust in external local code and couples the skill to host-resident secret infrastructure; if that path or imported module is tampered with, the skill may leak keys or route requests through malicious logic. The embedded fallback further increases the chance of unauthorized use of paid APIs.
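
One way to reduce the hidden trust in an external module is to pin its contents before importing it. The sketch below assumes the operator has reviewed the key-router file once and recorded its SHA-256 digest; the function names are hypothetical and do not come from the scanned skill.

```python
import hashlib
import hmac
from pathlib import Path


def file_sha256(path: str) -> str:
    """Return the hex SHA-256 digest of the file at path."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_trusted_module(path: str, pinned_digest: str) -> bool:
    """True only if the file still matches the digest recorded at review time."""
    return hmac.compare_digest(file_sha256(path), pinned_digest.lower())
```

A caller would refuse to import the module when `is_trusted_module` returns `False`. This detects tampering after review; it does not establish that the reviewed code was safe in the first place.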

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The server enables unrestricted cross-origin access with allow_origins=['*'], allow_methods=['*'], and allow_headers=['*'] while exposing both HTTP and WebSocket interfaces on 0.0.0.0. This permits any website or network-reachable client to drive the debate engine, increasing the risk of unauthorized use, resource abuse, and indirect access to any capabilities exposed through the backend engine.
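
A tighter configuration would bind to the loopback interface and accept only named origins. Browsers do not apply CORS to WebSocket handshakes, so the server has to check the `Origin` header itself. The following is a minimal, framework-independent sketch; `ALLOWED_ORIGINS`, `BIND_HOST`, and `is_origin_allowed` are hypothetical names, and the listed origins are placeholders.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical values: replace with the front ends that actually need access.
ALLOWED_ORIGINS = frozenset({"http://localhost:3000", "http://127.0.0.1:3000"})
BIND_HOST = "127.0.0.1"  # loopback only, instead of 0.0.0.0


def is_origin_allowed(origin: Optional[str]) -> bool:
    """Return True only for origins on the explicit allow-list."""
    if not origin:
        return False  # a missing Origin header is rejected in this sketch
    parts = urlsplit(origin)
    if not parts.scheme or not parts.netloc:
        return False  # malformed values, including the literal "null"
    normalized = f"{parts.scheme}://{parts.netloc}".lower()
    return normalized in ALLOWED_ORIGINS
```

Rejecting a missing `Origin` header also blocks non-browser clients that do not send one; whether that is acceptable depends on how the engine is meant to be driven.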

Vague Triggers

Medium

Confidence: 80%
Finding: The trigger phrases are broad and generic, which increases the chance the skill will activate in contexts the user did not intend. Because this skill can invoke code execution, retrieval, persistence, and networked output, unintended invocation expands the attack surface and may cause sensitive discussions or code to be processed by a higher-risk workflow.

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill describes impactful behaviors such as sandboxed code execution, vector retrieval, WebSocket output, and persistent storage without clearly warning users. This is dangerous because users may provide code, prompts, or sensitive business data without understanding that it could be executed, transmitted, broadcast in real time, or stored for later access.

Missing User Warnings

High

Confidence: 99%
Finding: A hardcoded API key in source code is a credential exposure risk and can be reused by anyone with code or log access. It also increases blast radius because the application silently transmits requests to an external provider using embedded credentials, making abuse and unauthorized billing likely if the code leaks.
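
The usual alternative to an embedded credential is to read the key from the environment and fail closed when it is absent. This is a minimal sketch; the variable name `DEBATE_ENGINE_API_KEY` and the helper `load_api_key` are assumptions for illustration, not names taken from the skill.

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Raised when no API key has been supplied by the operator."""


def load_api_key(env_var: str = "DEBATE_ENGINE_API_KEY") -> str:
    """Read the key from the environment; never fall back to a built-in value."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise MissingAPIKeyError(
            f"Set {env_var} before starting; no built-in fallback key is used."
        )
    return key
```

Failing at startup makes a missing key visible to the operator instead of silently sending requests under a shared credential. Any key that has already appeared in source should still be rotated.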

Missing User Warnings

High

Confidence: 98%
Finding: The skill writes model-generated Python to disk and executes it without an explicit warning, confirmation, or security boundary. This combines unsafe persistence with automatic execution, creating a direct path from untrusted model output to arbitrary code execution on the system.

Missing User Warnings

Medium

Confidence: 90%
Finding: The function sends user topics, prompts, and conversation context to an external LLM API, which can expose sensitive business or personal data if users assume local-only processing. This is primarily a privacy and compliance risk rather than a direct code-execution flaw, but it is important because debate history may include proprietary content.

Missing User Warnings

High

Confidence: 99%
Finding: A hardcoded API key is embedded directly in source code. If the file is shared, logged, committed, or accessible to users, the credential can be stolen and abused for unauthorized API usage, billing fraud, and potentially access to associated account data.

Missing User Warnings

High

Confidence: 97%
Finding: The skill executes model-generated Python automatically without an explicit, informed user confirmation step. This is dangerous because users may invoke a debate engine expecting text generation, not local code execution with host privileges, leading to unexpected compromise, destructive actions, or data access.
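
A confirmation step could be as small as a gate placed between code extraction and execution. The sketch below is an assumption about how such a gate might look, not the skill's code; `execute_if_approved`, `ask_on_terminal`, and the `runner` and `approve` callables are hypothetical.

```python
from typing import Callable, Optional


def execute_if_approved(
    code: str,
    runner: Callable[[str], str],
    approve: Callable[[str], bool],
) -> Optional[str]:
    """Run code only when approve(code) returns True; otherwise return None."""
    if not approve(code):
        return None  # default-deny: nothing runs without explicit consent
    return runner(code)


def ask_on_terminal(code: str) -> bool:
    """Show the proposed code and require an explicit 'y' to proceed."""
    print("The model proposed the following code:\n" + code)
    return input("Run it locally? [y/N] ").strip().lower() == "y"
```

Passing `ask_on_terminal` as `approve` shows the user exactly what would run. A non-interactive deployment could pass a function that always returns `False`, which disables execution entirely.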

Missing User Warnings

Medium

Confidence: 89%
Finding: The skill sends topic and prompt content, including prior debate history and retrieved knowledge, to an external LLM endpoint without any visible disclosure, consent, or redaction controls. This can expose sensitive user data, proprietary documents, or internal prompts to a third-party service, especially because knowledge retrieval reads local markdown content and injects it into requests.
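
A redaction pass before any outbound request is one possible control. The following is a minimal sketch with two illustrative patterns: an `sk-` key prefix, which is an assumed format, and email addresses. The `redact` helper is hypothetical, and the patterns are far from exhaustive, so it would complement disclosure and consent rather than replace them.

```python
import re

# Illustrative patterns only; real deployments need a reviewed, broader set.
_PATTERNS = [
    (re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"), "[REDACTED_KEY]"),
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Replace key-like strings and email addresses with placeholders."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Applying `redact` to the topic, the debate history, and retrieved markdown before they are assembled into a prompt would keep the most obvious secrets out of third-party logs.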

VirusTotal

All 66 of 66 vendors rated this skill as clean.
