Adversarial Engine

Security checks across malware telemetry and agentic risk

Overview

This skill has a coherent debate/review purpose, but it can run model-generated Python locally, uses an embedded API key, and exposes unauthenticated network endpoints that can start debate jobs.

Install only after reviewing and constraining it. Remove or rotate the embedded API key, bind servers to localhost with authentication, disable generated-code execution unless it runs in a real sandbox, and avoid sensitive prompts or knowledge-base files unless local storage and external model transmission are acceptable.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Excessive AgencyUnrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral ASTexec() Call, eval() Call, Dynamic Import
MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection

Findings (17)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: temp_path = f.name try: result = subprocess.run( ['python3', temp_path], capture_output=True, text=True,
Confidence: 98% confidence
Finding: result = subprocess.run( ['python3', temp_path], capture_output=True, text=True, timeout=self.timeout, cwd='

subprocess module call

Medium

Category: Dangerous Code Execution
Content: temp_path = f.name try: result = subprocess.run( ['python3', temp_path], capture_output=True, text=True,
Confidence: 98% confidence
Finding: result = subprocess.run( ['python3', temp_path], capture_output=True, text=True, timeout=self.timeout, cwd='

Context-Inappropriate Capability

High

Confidence: 99% confidence
Finding: This skill's stated purpose is debate/review, but it adds a broad capability to execute generated Python locally. In context, the engineer role is explicitly instructed to emit code and that code is immediately run, creating a direct path from model output to host execution that can be abused by prompt injection, malicious topics, or compromised model responses.

Intent-Code Divergence

High

Confidence: 97% confidence
Finding: Labeling this component as a '代码沙箱' overstates its safety because the implementation is just a plain local Python subprocess with a timeout and /tmp working directory. That mismatch is dangerous because operators may trust it as containment when it does not meaningfully restrict imports, filesystem access, process spawning, or outbound connections.

Context-Inappropriate Capability

Medium

Confidence: 96% confidence
Finding: A hard-coded default API key embeds a live credential directly in source code, which is a security weakness and an unnecessary capability for the skill's purpose. If the file is shared, logged, or checked into version control, the key can be stolen and abused for unauthorized API usage and billing.

Intent-Code Divergence

High

Confidence: 99% confidence
Finding: The implementation claims to 'safely execute' code, but it simply writes arbitrary Python to a temp file and runs it locally. Labeling this as a sandbox is misleading and increases risk because operators may trust it and enable dangerous execution in contexts where untrusted model output is present.

Context-Inappropriate Capability

High

Confidence: 95% confidence
Finding: The skill's declared purpose is adversarial debate and review, yet it includes local execution of generated code, which is a materially more dangerous capability than necessary. In this context, prompts and model outputs are adversarial by design, making execution especially risky because attackers can steer the engineer role to emit harmful code.

Context-Inappropriate Capability

Medium

Confidence: 88% confidence
Finding: The engine persists full prompts, model outputs, reasoning, and code execution results to a local SQLite database. This can store sensitive user data, proprietary prompts, secrets produced by the model, or harmful payloads, increasing confidentiality and retention risk beyond the core purpose of a debate engine.

Intent-Code Divergence

Medium

Confidence: 84% confidence
Finding: The file header advertises a 'code sandbox' even though the implementation lacks meaningful isolation controls. This is dangerous because misleading security claims can cause unsafe deployment decisions and reduce operator scrutiny of a high-risk feature.

Vague Triggers

Medium

Confidence: 87% confidence
Finding: An overly broad trigger like '方案评审' can cause the skill to activate during ordinary conversations that were not intended to invoke a high-impact engine. In this context, accidental activation is more dangerous because the skill claims code sandboxing, retrieval, persistence, and real-time pushing behaviors, increasing the chance of unintended data processing or external actions.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The skill describes high-impact behaviors—code execution, knowledge-base persistence, and WebSocket streaming—without clearly informing users what data may be executed, stored, or transmitted. In a multi-model debate context, prompts, generated code, results, and possibly sensitive user inputs could be persisted or broadcast, creating confidentiality and integrity risks.

Missing User Warnings

High

Confidence: 95% confidence
Finding: The embedded API key is not only present in code but is automatically used for outbound requests without clear disclosure or operator control. This increases the chance of silent third-party data transfer and unauthorized spend under the author's credential, and indicates insecure secret handling practices.

Missing User Warnings

High

Confidence: 99% confidence
Finding: The system executes LLM-generated Python code without any clear safety warning or approval gate. In this skill context, that is especially dangerous because the engine solicits code from one model role and then runs it automatically, making unsafe execution a built-in workflow rather than an accidental edge case.

Missing User Warnings

Medium

Confidence: 87% confidence
Finding: The user's topic and accumulated prompts are sent to an external LLM API, but the file provides no notice, consent, or data-handling disclosure. Because this engine is meant for discussion and review, topics may include proprietary code, security issues, or internal plans, so silent exfiltration to a third party raises confidentiality and compliance concerns.

Missing User Warnings

High

Confidence: 99% confidence
Finding: A hard-coded API key is embedded directly in source code, which risks credential leakage through code sharing, logs, backups, or repository exposure. If compromised, the key can be abused for unauthorized API usage, billing fraud, or access to associated services.

Missing User Warnings

High

Confidence: 97% confidence
Finding: The skill executes generated Python without any explicit user warning, consent flow, or trust boundary acknowledgment. Because the content originates from an LLM in an adversarial multi-agent setup, this omission materially increases the likelihood of unsafe execution and operator surprise.

Missing User Warnings

Medium

Confidence: 90% confidence
Finding: User topics, debate history, and possibly retrieved knowledge are sent to an external LLM endpoint without any visible disclosure, minimization, or consent. This can expose sensitive business data, internal knowledge-base content, and model interactions to third parties.

VirusTotal

65/65 vendors flagged this skill as clean.

View on VirusTotal