Arena Council

Security checks across malware telemetry and agentic risk

Overview

The skill serves a genuine purpose, local multi-model voting, but it also includes model probing and automatic prompt rewriting intended to bypass safety refusals. Users should review both carefully.

Install only if you explicitly want this god-mode integration. For normal multi-model consensus, remove or disable the god-mode imports, probing, saved profiles, and refusal-bypass prompt rewriting before use, and avoid sensitive prompts unless you trust every local model receiving them.
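For comparison, plain multi-model consensus needs no prompt rewriting at all. The following is a minimal sketch, assuming each model has already returned a text answer. The function name `majority_vote` and the mapping shape are hypothetical and are not taken from the skill's code.

```python
from collections import Counter

def majority_vote(answers):
    """Return (winning answer, vote count) from a {model_name: answer} mapping.

    Answers are compared after trimming whitespace and lowercasing.
    Ties resolve to the answer that appeared first in the mapping.
    """
    if not answers:
        raise ValueError("no answers to vote on")
    normalized = [a.strip().lower() for a in answers.values()]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count
```

Each model receives the user's prompt unchanged; only the returned answers are compared.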

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Findings (17)

Lp3

Medium
Category
MCP Least Privilege
Confidence
83% confidence
Finding
The skill advertises a narrow council/voting function, but its documented use of local model endpoints and references to external files/components imply network and file-read capabilities without any declared permissions. That hidden capability expansion weakens review boundaries and can enable unvetted data access or interaction with other local services.
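One way a reviewer might make this gap concrete is to compare what a skill declares against what its documentation implies. The sketch below assumes capabilities can be written as simple string labels. The labels and the function `undeclared_capabilities` are hypothetical and do not correspond to any real manifest format.

```python
def undeclared_capabilities(declared, observed):
    """Return capabilities the skill uses but never declared, sorted."""
    return sorted(set(observed) - set(declared))

# Hypothetical capability labels, for illustration only.
declared = []  # the finding reports no declared permissions
observed = ["network:localhost", "file:read"]
missing = undeclared_capabilities(declared, observed)
```

A non-empty `missing` list is the kind of mismatch this finding describes.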

Tp4

High
Category
MCP Tool Poisoning
Confidence
96% confidence
Finding
The documented behavior goes beyond simple multi-model voting and includes loading external god-mode logic, profiling models, and modifying prompts to defeat refusals. This mismatch is dangerous because reviewers and users may trust the skill for benign orchestration while it secretly incorporates jailbreak-oriented behavior and external dependencies.

Description-Behavior Mismatch

Medium
Confidence
97% confidence
Finding
The skill explicitly instructs operators to retry refused prompts after applying prompt-modification techniques intended to overcome safety controls. In the context of a council tool, this is not incidental functionality; it turns a response-quality utility into a mechanism for bypassing model safeguards.

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
The recommendations to test model susceptibility and wrap censored models with bypass tooling are unrelated to consensus generation and clearly normalize safety-evasion as an operational feature. That substantially increases risk because it systematizes identification and exploitation of weaker model defenses.

Description-Behavior Mismatch

High
Confidence
99% confidence
Finding
The module docstring explicitly advertises automatic prompt wrapping to handle 'censored' models using 'God Mode' techniques. That behavior is not necessary for neutral model orchestration and indicates built-in safety-bypass capability designed to defeat model safeguards.

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
The code automatically probes newly discovered models, classifies them by censorship status, and persists that profile for later use. In context, this is reconnaissance for selecting bypass techniques, not a legitimate requirement for consensus voting, and it materially increases the skill's capability to evade safeguards.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
This function imports external 'TECHNIQUES' and applies a selected technique to rewrite prompts for models labeled as censored. That is a direct jailbreak/prompt-manipulation mechanism and is inappropriate and dangerous in a coordination layer because it can systematically defeat model safety behavior.

Intent-Code Divergence

Medium
Confidence
90% confidence
Finding
The top-level documentation frames the tool as a multi-model council, while omitting that it also probes models and performs censorship-bypass prompt rewriting. This mismatch hides risky behavior from reviewers and users, reducing informed consent and making misuse more likely.

Natural-Language Policy Violations

High
Confidence
99% confidence
Finding
The documentation directly tells users to bypass safety refusals using named techniques such as unicode and prefill before retrying. This is a clear jailbreak workflow that can be used to obtain disallowed content or circumvent safety policies across multiple models in parallel.

Natural-Language Policy Violations

High
Confidence
99% confidence
Finding
The comments describe a built-in mechanism to wrap prompts automatically when a model is 'censored' using recorded techniques. That is explicit evidence of intent to bypass model safety controls, which materially raises the risk of harmful or policy-evading outputs.

Missing User Warnings

Medium
Confidence
86% confidence
Finding
The skill auto-discovers local models and sends the user's prompt to them without explicit warning, confirmation, or model list disclosure. In a multi-model setting this broadens prompt exposure and can leak sensitive user data to more endpoints than the user expects.
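A possible mitigation is to gate dispatch on an explicit allow-list so that discovered models receive nothing by default. This is a minimal sketch under that assumption. The function `confirm_recipients` and the model names in the usage note are hypothetical.

```python
def confirm_recipients(discovered, approved):
    """Return only the discovered models the user explicitly approved.

    Raises if nothing was approved, so a prompt is never sent by default.
    """
    approved_set = set(approved)
    recipients = [m for m in discovered if m in approved_set]
    if not recipients:
        raise PermissionError("no approved models; prompt not sent")
    return recipients
```

For example, `confirm_recipients(["m1", "m2", "m3"], ["m3", "m1"])` keeps only `m1` and `m3`, in discovery order, and the returned list can be shown to the user before anything is sent.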

Ssd 1

High
Confidence
99% confidence
Finding
The so-called God Mode branch operationalizes a semantic jailbreak path: when models refuse, the system alters the prompt and re-queries them. This is dangerous because it automates circumvention of safety-aligned behavior and can scale harmful prompt attempts across several models at once.

Ssd 2

High
Confidence
95% confidence
Finding
Using euphemisms such as "god-mode techniques" and "wrapper" obscures the fact that the feature is jailbreak-oriented, which can help it evade casual review or simple keyword-based detection. The concealment makes the skill more dangerous because its stated purpose is benign orchestration while the wording masks a safety-bypass subsystem.
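A reviewer can partly counter this by searching for the euphemisms themselves rather than only for obvious terms. The sketch below is a simple line scanner. The term list is a hypothetical starting point, not SkillSpector's actual rule set, and would need tuning for a real codebase.

```python
import re

# Hypothetical term list; a real review would tune it to the codebase.
SUSPECT_TERMS = re.compile(
    r"god[\s_-]?mode|censor(?:ed|ship)|refusal|bypass|jailbreak",
    re.IGNORECASE,
)

def flag_lines(text):
    """Return (line_number, line) pairs whose text matches a suspect term."""
    return [
        (n, line.strip())
        for n, line in enumerate(text.splitlines(), start=1)
        if SUSPECT_TERMS.search(line)
    ]
```

Matches are only leads for manual review; a hit on a word like "refusal" does not by itself indicate bypass behavior.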

Ssd 1

High
Confidence
99% confidence
Finding
The described 'God Mode' prompt wrapping is explicitly intended to work around censorship or safety restrictions. A system feature designed to bypass safeguards is inherently dangerous because it can be used to obtain disallowed outputs at scale across multiple models.

Ssd 1

High
Confidence
99% confidence
Finding
The function conditionally applies per-model jailbreak-style techniques whenever a model is marked censored. This creates an automated exploit path against model-aligned behavior and makes the council more dangerous than a normal routing or voting component.

Ssd 4

High
Confidence
98% confidence
Finding
The workflow identifies new models, probes them, labels them by censorship status, and stores the result for future bypass decisions. That end-to-end loop constitutes systematic safeguard evasion infrastructure rather than ordinary orchestration, making the surrounding skill context significantly more dangerous.

Ssd 1

High
Confidence
99% confidence
Finding
The request path automatically rewrites the user's prompt before sending it to the model whenever God Mode is active. Silent prompt mutation is especially dangerous because it changes user intent, hides bypass behavior, and can generate outputs the original prompt or model would have rejected.
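One way to detect silent mutation is to compare a digest of the prompt the user wrote with a digest of the prompt about to be sent. This is a minimal sketch, assuming a caller has access to both strings. The function names are hypothetical and are not part of the skill.

```python
import hashlib

def prompt_digest(prompt):
    """Return the SHA-256 hex digest of a prompt, safe to write to logs."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

def assert_unmodified(user_prompt, outgoing_prompt):
    """Raise if the outgoing prompt is not byte-identical to the user's."""
    if prompt_digest(user_prompt) != prompt_digest(outgoing_prompt):
        raise ValueError("outgoing prompt differs from what the user wrote")
```

Logging the digest rather than the prompt lets an audit trail record whether rewriting occurred without storing the prompt text itself.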

VirusTotal

67 of 67 vendors reported this skill as clean.
