Arena Council

Security checks across malware telemetry and agentic risk

Overview

The skill serves a genuine purpose (local multi-model voting), but it also probes models and automatically rewrites prompts to bypass safety refusals, behavior that users should review carefully.

Install only if you explicitly want this god-mode integration. For normal multi-model consensus, remove or disable the god-mode imports, probing, saved profiles, and refusal-bypass prompt rewriting before use, and avoid sensitive prompts unless you trust every local model receiving them.
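Once those pieces are removed, what remains is ordinary voting over unmodified replies. A minimal sketch of that step, assuming each local model has already answered the same, unaltered prompt; `majority_vote` and its normalization rule (trim whitespace, lowercase) are illustrative assumptions, not the skill's code.

```python
from __future__ import annotations

from collections import Counter


def majority_vote(answers: dict[str, str]) -> tuple[str | None, float]:
    """Return the most common reply and the share of models that gave it.

    `answers` maps a model name to its reply. The same user prompt is assumed
    to have been sent unchanged to every model; nothing here rewrites prompts.
    Replies are compared after trimming whitespace and lowercasing.
    """
    if not answers:
        return None, 0.0
    normalized = [reply.strip().lower() for reply in answers.values()]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)


if __name__ == "__main__":
    replies = {"model-a": "Paris", "model-b": "paris ", "model-c": "Lyon"}
    print(majority_vote(replies))
```

Ties go to the reply seen first, because `Counter.most_common` orders equal counts by first occurrence.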

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (17)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 83%
Finding: The skill advertises a narrow council/voting function, but its documented use of local model endpoints and references to external files/components imply network and file-read capabilities without any declared permissions. That hidden capability expansion weakens review boundaries and can enable unvetted data access or interaction with other local services.
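Reviewers can approximate this check by comparing what a skill's code appears to touch against what its manifest declares. A rough sketch, assuming the skill ships Python source and a manifest that lists permissions as plain strings such as `network` and `filesystem`; the module-to-capability table and both function names are hypothetical and far from exhaustive (dynamic imports or subprocess calls would slip past it).

```python
from __future__ import annotations

import ast

# Hypothetical mapping from imported top-level modules to a coarse capability.
# It is deliberately small; a real review would need a much fuller table.
MODULE_CAPABILITIES = {
    "socket": "network",
    "urllib": "network",
    "http": "network",
    "requests": "network",
    "os": "filesystem",
    "pathlib": "filesystem",
    "shutil": "filesystem",
}


def implied_capabilities(source: str) -> set[str]:
    """Infer coarse capabilities from imports and bare open() calls."""
    caps: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        names: list[str] = []
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        for name in names:
            cap = MODULE_CAPABILITIES.get(name.split(".")[0])
            if cap is not None:
                caps.add(cap)
        # A call to the builtin open() implies file access.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        ):
            caps.add("filesystem")
    return caps


def undeclared_capabilities(source: str, declared: set[str]) -> set[str]:
    """Capabilities the code appears to use that the manifest never declares."""
    return implied_capabilities(source) - declared
```

Any non-empty result is a prompt for the reviewer to either add the permission to the manifest or ask why the code needs it.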

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The documented behavior goes beyond simple multi-model voting and includes loading external god-mode logic, profiling models, and modifying prompts to defeat refusals. This mismatch is dangerous because reviewers and users may trust the skill for benign orchestration while it secretly incorporates jailbreak-oriented behavior and external dependencies.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The skill explicitly instructs operators to retry refused prompts after applying prompt-modification techniques intended to overcome safety controls. In the context of a council tool, this is not incidental functionality; it turns a response-quality utility into a mechanism for bypassing model safeguards.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The recommendations to test model susceptibility and to wrap censored models with bypass tooling are unrelated to consensus generation and present safety evasion as a routine operational feature. That substantially increases risk because it systematizes the identification and exploitation of weaker model defenses.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The module docstring explicitly advertises automatic prompt wrapping to handle 'censored' models using 'God Mode' techniques. That behavior is not necessary for neutral model orchestration and indicates built-in safety-bypass capability designed to defeat model safeguards.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The code automatically probes newly discovered models, classifies them by censorship status, and persists that profile for later use. In context, this is reconnaissance for selecting bypass techniques, not a legitimate requirement for consensus voting, and it materially increases the skill's capability to evade safeguards.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: This function imports external 'TECHNIQUES' and applies a selected technique to rewrite prompts for models labeled as censored. That is a direct jailbreak/prompt-manipulation mechanism and is inappropriate and dangerous in a coordination layer because it can systematically defeat model safety behavior.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The top-level documentation frames the tool as a multi-model council, while omitting that it also probes models and performs censorship-bypass prompt rewriting. This mismatch hides risky behavior from reviewers and users, reducing informed consent and making misuse more likely.

Natural-Language Policy Violations

High

Confidence: 99%
Finding: The documentation directly tells users to bypass safety refusals using named techniques such as unicode and prefill before retrying. This is a clear jailbreak workflow that can be used to obtain disallowed content or circumvent safety policies across multiple models in parallel.

Natural-Language Policy Violations

High

Confidence: 99%
Finding: The comments describe a built-in mechanism to wrap prompts automatically when a model is 'censored' using recorded techniques. That is explicit evidence of intent to bypass model safety controls, which materially raises the risk of harmful or policy-evading outputs.

Missing User Warnings

Medium

Confidence: 86%
Finding: The skill auto-discovers local models and sends the user's prompt to them without explicit warning, confirmation, or model list disclosure. In a multi-model setting this broadens prompt exposure and can leak sensitive user data to more endpoints than the user expects.
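One way to close this gap is to show the user every endpoint before fanning out and require explicit consent. A minimal sketch under assumptions: endpoints are plain URL strings already produced by discovery, and `confirm_fan_out` is a hypothetical helper, not part of the skill. The `ask` parameter defaults to `input` but can be swapped for a UI prompt or a test stub.

```python
from __future__ import annotations

from typing import Callable


def confirm_fan_out(
    prompt: str,
    endpoints: list[str],
    ask: Callable[[str], str] = input,
) -> list[str]:
    """List the endpoints that would receive the prompt and ask for consent.

    Returns the approved endpoints; an empty list means send nothing.
    Anything other than an explicit "y" or "yes" counts as refusal.
    """
    if not endpoints:
        return []
    listing = "\n".join(f"  - {url}" for url in endpoints)
    question = (
        f"Your prompt ({len(prompt)} characters) will be sent to "
        f"{len(endpoints)} model endpoint(s):\n{listing}\nProceed? [y/N] "
    )
    answer = ask(question).strip().lower()
    return list(endpoints) if answer in {"y", "yes"} else []
```

Defaulting to "no" means an accidental Enter press sends the prompt nowhere, which matches the finding's concern about exposure to more endpoints than the user expects.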

Ssd 1

High

Confidence: 99%
Finding: The so-called God Mode branch operationalizes a semantic jailbreak path: when models refuse, the system alters the prompt and re-queries them. This is dangerous because it automates circumvention of safety-aligned behavior and can scale harmful prompt attempts across several models at once.

Ssd 2

High

Confidence: 95%
Finding: Euphemisms such as 'god-mode techniques' and 'wrapper' obscure the fact that the feature is jailbreak-oriented, which can help it slip past casual review or simple keyword-based detection. This concealment makes the skill more dangerous: the stated purpose is benign orchestration, while the wording masks a safety-bypass subsystem.
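Because the finding is about evading keyword detection, a reviewer's keyword scan should include the euphemisms as well as the explicit terms. A rough first-pass sketch; the pattern list is an illustrative assumption built from terms quoted in these findings, not SkillSpector's rule set, and `scan` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Illustrative patterns taken from terms quoted in these findings.
# This is an assumption for review purposes, not a real scanner's rules.
SUSPICIOUS_PATTERNS = {
    "god mode": re.compile(r"god[\s_-]?mode", re.IGNORECASE),
    "censored": re.compile(r"\bcensor(ed|ship)?\b", re.IGNORECASE),
    "techniques import": re.compile(r"\bimport\b.*\bTECHNIQUES\b"),
    "refusal retry": re.compile(
        r"\brefus\w*\b.*\bretry\b|\bretry\b.*\brefus\w*\b", re.IGNORECASE
    ),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) pairs for every suspicious match."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

A clean result from a scan like this proves little, since renaming the feature again would defeat it; it only tells a reviewer where to start reading.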

Ssd 1

High

Confidence: 99%
Finding: The described 'God Mode' prompt wrapping is explicitly intended to work around censorship or safety restrictions. A system feature designed to bypass safeguards is inherently dangerous because it can be used to obtain disallowed outputs at scale across multiple models.

Ssd 1

High

Confidence: 99%
Finding: The function conditionally applies per-model jailbreak-style techniques whenever a model is marked censored. This creates an automated exploit path against model-aligned behavior and makes the council more dangerous than a normal routing or voting component.

Ssd 4

High

Confidence: 98%
Finding: The workflow identifies new models, probes them, labels them by censorship status, and stores the result for future bypass decisions. That end-to-end loop constitutes systematic safeguard evasion infrastructure rather than ordinary orchestration, making the surrounding skill context significantly more dangerous.

Ssd 1

High

Confidence: 99%
Finding: The request path automatically rewrites the user's prompt before sending it to the model whenever God Mode is active. Silent prompt mutation is especially dangerous because it changes user intent, hides bypass behavior, and can generate outputs the original prompt or model would have rejected.
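A dispatch-time guard can make silent mutation fail loudly instead. A minimal sketch, assuming the caller records a SHA-256 fingerprint of the prompt when the user submits it and checks it again just before each request; `fingerprint`, `guarded_send`, and `PromptMutationError` are hypothetical names, and `send` stands in for whatever function actually posts to a model.

```python
from __future__ import annotations

import hashlib
from typing import Callable


class PromptMutationError(RuntimeError):
    """Raised when the text about to be sent differs from what the user typed."""


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the prompt, recorded at submission time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guarded_send(
    expected_digest: str, outgoing: str, send: Callable[[str], object]
) -> object:
    """Call send(outgoing) only if it matches the recorded prompt fingerprint."""
    if fingerprint(outgoing) != expected_digest:
        raise PromptMutationError("prompt was modified before dispatch")
    return send(outgoing)
```

Storing only the digest lets the capture point and the dispatch point live in different components, and the digest can be logged without logging the prompt itself.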

VirusTotal

67/67 vendors rated this skill as clean.
