Adversarial Engine

Security checks across malware telemetry and agentic risk

Overview

This skill has a coherent debate/review purpose, but it can run model-generated Python locally, uses an embedded API key, and exposes unauthenticated network endpoints that can start debate jobs.

Install only after reviewing and constraining it. Remove or rotate the embedded API key, bind servers to localhost with authentication, disable generated-code execution unless it runs in a real sandbox, and avoid sensitive prompts or knowledge-base files unless local storage and external model transmission are acceptable.
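
One way to read "bind servers to localhost with authentication" is sketched below. It is a minimal illustration and not the skill's actual server code: the port, the `DebateHandler` class, the `make_server` helper, and the bearer-token scheme are all assumptions introduced here.

```python
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Loopback only, so the endpoint is unreachable from other hosts.
# The port number is hypothetical.
BIND_ADDRESS = ("127.0.0.1", 8765)


def is_authorized(header_value, expected_token):
    """Return True only if the Authorization header carries the expected bearer token."""
    if not header_value or not expected_token:
        return False
    expected = f"Bearer {expected_token}".encode()
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(header_value.encode(), expected)


class DebateHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: rejects unauthenticated requests before any job starts."""

    expected_token = ""  # set by make_server() before serving

    def do_POST(self):
        if not is_authorized(self.headers.get("Authorization"), self.expected_token):
            self.send_response(401)
            self.end_headers()
            return
        # A real handler would enqueue the debate job here.
        self.send_response(202)
        self.end_headers()


def make_server(token):
    """Build a loopback-only server that requires the given token."""
    DebateHandler.expected_token = token
    return HTTPServer(BIND_ADDRESS, DebateHandler)
```

The token itself would come from operator-controlled configuration rather than source code.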

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (17)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
temp_path = f.name
        
        try:
            result = subprocess.run(
                ['python3', temp_path],
                capture_output=True,
                text=True,
Confidence
98% confidence
Finding
result = subprocess.run( ['python3', temp_path], capture_output=True, text=True, timeout=self.timeout, cwd='

subprocess module call

Medium
Category
Dangerous Code Execution
Content
temp_path = f.name
        
        try:
            result = subprocess.run(
                ['python3', temp_path],
                capture_output=True,
                text=True,
Confidence
98% confidence
Finding
result = subprocess.run( ['python3', temp_path], capture_output=True, text=True, timeout=self.timeout, cwd='

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
This skill's stated purpose is debate/review, but it adds a broad capability to execute generated Python locally. In context, the engineer role is explicitly instructed to emit code and that code is immediately run, creating a direct path from model output to host execution that can be abused by prompt injection, malicious topics, or compromised model responses.

Intent-Code Divergence

High
Confidence
97% confidence
Finding
Labeling this component as a '代码沙箱' (code sandbox) overstates its safety because the implementation is just a plain local Python subprocess with a timeout and /tmp working directory. That mismatch is dangerous because operators may trust it as containment when it does not meaningfully restrict imports, filesystem access, process spawning, or outbound connections.
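
To make the gap concrete, the sketch below shows a few hardening steps that a plain `subprocess.run` call omits. It assumes a POSIX host, and the function name `run_untrusted_snippet` is hypothetical. Even with these flags the child is not contained: it can still read files, spawn processes, and open network connections, so this is not a substitute for a real sandbox such as a container or VM.

```python
import subprocess
import sys
import tempfile


def run_untrusted_snippet(code, timeout=5):
    """Run code in a child interpreter with partial hardening (NOT a sandbox)."""
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            # -I: isolated mode, ignores PYTHON* variables and the user site directory.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=workdir,  # throwaway directory instead of a shared /tmp
            env={},       # do not hand parent secrets (e.g. API keys) to the child
        )
```

Without `env={}`, the child inherits every environment variable of the parent process, including any credentials stored there.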

Context-Inappropriate Capability

Medium
Confidence
96% confidence
Finding
A hard-coded default API key embeds a live credential directly in source code, which is a security weakness and an unnecessary capability for the skill's purpose. If the file is shared, logged, or checked into version control, the key can be stolen and abused for unauthorized API usage and billing.
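
A common alternative to a hard-coded default is to read the key from the environment and fail loudly when it is absent. The sketch below assumes a variable name, `DEBATE_ENGINE_API_KEY`, that does not appear in the scanned skill; it is illustrative only.

```python
import os

# Hypothetical variable name; the skill's real configuration key is not shown in this report.
API_KEY_ENV_VAR = "DEBATE_ENGINE_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        # No built-in fallback: a missing key is an operator error, not a default.
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set")
    return key
```

Raising instead of falling back keeps outbound requests from silently running under someone else's credential.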

Intent-Code Divergence

High
Confidence
99% confidence
Finding
The implementation claims to 'safely execute' code, but it simply writes arbitrary Python to a temp file and runs it locally. Labeling this as a sandbox is misleading and increases risk because operators may trust it and enable dangerous execution in contexts where untrusted model output is present.

Context-Inappropriate Capability

High
Confidence
95% confidence
Finding
The skill's declared purpose is adversarial debate and review, yet it includes local execution of generated code, which is a materially more dangerous capability than necessary. In this context, prompts and model outputs are adversarial by design, making execution especially risky because attackers can steer the engineer role to emit harmful code.

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
The engine persists full prompts, model outputs, reasoning, and code execution results to a local SQLite database. This can store sensitive user data, proprietary prompts, secrets produced by the model, or harmful payloads, increasing confidentiality and retention risk beyond the core purpose of a debate engine.
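
If persistence is kept, one mitigation is to redact secret-looking strings before they reach the database. The sketch below is an assumption-laden illustration: the `turns` table, the `store_turn` helper, and the `sk-` token pattern are invented for the example and do not reflect the skill's actual schema.

```python
import re
import sqlite3

# Illustrative pattern only: matches "sk-" style tokens of 8+ characters.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


def redact(text):
    """Replace secret-looking tokens with a placeholder."""
    return SECRET_PATTERN.sub("[REDACTED]", text)


def store_turn(conn, role, content):
    """Persist one debate turn after redaction (hypothetical schema)."""
    conn.execute(
        "INSERT INTO turns (role, content) VALUES (?, ?)",
        (role, redact(content)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE turns (role TEXT, content TEXT)")
store_turn(conn, "engineer", "use key sk-abcdef1234567890 here")
```

Pattern-based redaction only catches known token shapes, so it reduces but does not remove the retention risk described above.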

Intent-Code Divergence

Medium
Confidence
84% confidence
Finding
The file header advertises a 'code sandbox' even though the implementation lacks meaningful isolation controls. This is dangerous because misleading security claims can cause unsafe deployment decisions and reduce operator scrutiny of a high-risk feature.

Vague Triggers

Medium
Confidence
87% confidence
Finding
An overly broad trigger like '方案评审' (proposal review) can cause the skill to activate during ordinary conversations that were not intended to invoke a high-impact engine. In this context, accidental activation is more dangerous because the skill claims code sandboxing, retrieval, persistence, and real-time pushing behaviors, increasing the chance of unintended data processing or external actions.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The skill describes high-impact behaviors—code execution, knowledge-base persistence, and WebSocket streaming—without clearly informing users what data may be executed, stored, or transmitted. In a multi-model debate context, prompts, generated code, results, and possibly sensitive user inputs could be persisted or broadcast, creating confidentiality and integrity risks.

Missing User Warnings

High
Confidence
95% confidence
Finding
The embedded API key is not only present in code but is automatically used for outbound requests without clear disclosure or operator control. This increases the chance of silent third-party data transfer and unauthorized spend under the author's credential, and indicates insecure secret handling practices.

Missing User Warnings

High
Confidence
99% confidence
Finding
The system executes LLM-generated Python code without any clear safety warning or approval gate. In this skill context, that is especially dangerous because the engine solicits code from one model role and then runs it automatically, making unsafe execution a built-in workflow rather than an accidental edge case.
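
An approval gate of the kind this finding says is missing could look roughly like the following. The function name `approve_execution` and the prompt wording are hypothetical; the point is only that execution is denied unless the operator explicitly types "yes".

```python
def approve_execution(code, ask=input):
    """Show the generated code and return True only on an explicit 'yes'."""
    print("The model produced the following code:")
    print(code)
    answer = ask("Run this code on the local machine? [yes/NO] ")
    # Anything other than a literal "yes" (including an empty reply) is a refusal.
    return answer.strip().lower() == "yes"
```

A caller would run the generated code only when `approve_execution(code)` returns True. The `ask` parameter exists so the prompt can be replaced in tests or non-interactive deployments.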

Missing User Warnings

Medium
Confidence
87% confidence
Finding
The user's topic and accumulated prompts are sent to an external LLM API, but the file provides no notice, consent, or data-handling disclosure. Because this engine is meant for discussion and review, topics may include proprietary code, security issues, or internal plans, so silent exfiltration to a third party raises confidentiality and compliance concerns.

Missing User Warnings

High
Confidence
99% confidence
Finding
A hard-coded API key is embedded directly in source code, which risks credential leakage through code sharing, logs, backups, or repository exposure. If compromised, the key can be abused for unauthorized API usage, billing fraud, or access to associated services.

Missing User Warnings

High
Confidence
97% confidence
Finding
The skill executes generated Python without any explicit user warning, consent flow, or trust boundary acknowledgment. Because the content originates from an LLM in an adversarial multi-agent setup, this omission materially increases the likelihood of unsafe execution and operator surprise.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
User topics, debate history, and possibly retrieved knowledge are sent to an external LLM endpoint without any visible disclosure, minimization, or consent. This can expose sensitive business data, internal knowledge-base content, and model interactions to third parties.

VirusTotal

65/65 vendors flagged this skill as clean.
