Virtual Forum (虚拟论坛)

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real virtual-debate skill, but it can run local shell commands that read local Skill files and send their contents to Claude with limited scoping or consent controls.

Install only if you are comfortable reviewing and running a local bash workflow that uses your Claude CLI credentials, consumes API quota, reads local Skill files, and sends their contents plus discussion history to Claude. Keep secrets out of Skill files, set SKILLS_DIR and OUTPUT_DIR deliberately, and treat the behavioral/game-theory strategy outputs as experimental persuasion aids rather than neutral or authoritative analysis.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (16)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 83%
Finding: The skill documentation describes use of environment variables (`SKILLS_DIR` and `OUTPUT_DIR`) and external CLI-based execution, but no corresponding permissions are declared. That creates a capability/permission mismatch: the skill may access environment-derived paths or configuration in ways users and tooling cannot transparently review, increasing the risk of unintended file access or data flow.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The README explicitly promotes running an external shell script and invoking multiple external Claude CLI processes to implement the skill. That gives the skill execution capability far beyond a forum/discussion abstraction, increasing the attack surface to command execution, uncontrolled subprocess behavior, and abuse of local environment or credentials.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The documented workflow reads local Skill files from the user's home/workspace directories and injects their contents into prompts for external CLI processes. This is a local file access and exfiltration path not implied by the skill's stated purpose, and it could expose sensitive prompt files, proprietary content, or secrets embedded in local skill definitions.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The script allows SKILLS_DIR to be overridden by the environment and then reads arbitrary SKILL.md files from that path into prompts. In a hostile or shared environment, an attacker could redirect SKILLS_DIR to attacker-controlled content, causing prompt injection, unintended data exposure, or manipulation of model outputs.
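
One way to narrow this exposure is to resolve the configured directory and refuse anything outside an expected root before any file is read. The sketch below is illustrative only: the scanned skill is a bash workflow, and the names `resolve_skills_dir`, `list_skill_files`, and the idea of an "allowed root" are assumptions introduced here, not part of the skill.

```python
from pathlib import Path
from typing import List, Optional


def resolve_skills_dir(allowed_root: Path, env_value: Optional[str]) -> Path:
    """Return the directory to read skills from, confined to allowed_root.

    Hypothetical helper: falls back to allowed_root when the override is unset
    or empty, and rejects any override that resolves outside allowed_root.
    """
    root = allowed_root.resolve()
    candidate = Path(env_value).expanduser().resolve() if env_value else root
    # is_relative_to (Python 3.9+) compares path components, not string prefixes.
    if not candidate.is_relative_to(root):
        raise ValueError(f"SKILLS_DIR {candidate} is outside {root}")
    return candidate


def list_skill_files(skills_dir: Path) -> List[Path]:
    """List files named SKILL.md, skipping any whose real path escapes skills_dir."""
    base = skills_dir.resolve()
    found = []
    for path in sorted(base.rglob("SKILL.md")):
        if path.resolve().is_relative_to(base):
            found.append(path)
    return found
```

A caller would pass the value of `os.environ.get("SKILLS_DIR")` as `env_value` and a root of the user's choosing as `allowed_root`. This confines where files are read from; it does not make the file contents trustworthy, so prompt injection from within the allowed root remains possible.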

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: This module moves beyond neutral analysis and actively generates debate strategies based on prospect theory, bounded rationality, and nudges. In the context of a multi-agent forum skill, that creates audience-influence and persuasion capabilities that can be used to steer users or simulated participants through cognitive bias exploitation rather than transparent reasoning.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The code explicitly combines persuasion framing with nudge generation, including risk-based framing and position-targeted nudges. That is a manipulation-oriented capability because it operationalizes behavioral science into tactics for influencing outcomes instead of merely analyzing discourse.

Intent-Code Divergence

Low

Confidence: 84%
Finding: The bias summary logic appears inconsistent with how biases are stored earlier in the file. Biases are pushed as strings, but the counting routine later accesses b.type, which can produce incorrect counts and weaken mitigation summaries, reducing the reliability of safety analysis and downstream recommendations.
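
The mismatch described here can be illustrated with a small sketch. The scanned code is not reproduced in this report, so the names below (`bias_label`, `count_biases`, the `type` key) are assumptions modeled on the finding's wording, and Python stands in for whatever language the module is written in. The idea is to normalize each entry to a label before counting, so that bare strings and object-shaped entries are tallied the same way.

```python
from collections import Counter
from typing import Iterable, Mapping, Union

# An entry is either a bare bias name or a mapping carrying a "type" key (assumed shape).
BiasEntry = Union[str, Mapping[str, str]]


def bias_label(entry: BiasEntry) -> str:
    """Return the bias name for either entry shape; 'unknown' if the key is missing."""
    if isinstance(entry, str):
        return entry
    return entry.get("type", "unknown")


def count_biases(entries: Iterable[BiasEntry]) -> Counter:
    """Tally biases by label instead of assuming every entry exposes a type field."""
    return Counter(bias_label(entry) for entry in entries)
```

For example, `count_biases(["anchoring", {"type": "anchoring"}])` counts both entries under the same `anchoring` label.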

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: This code explicitly generates debate nudges tailored to audience traits such as risk aversion, social consciousness, and time pressure, including tactics like setting a position as the default, using social proof, and loss framing. In the context of a virtual forum skill, this moves beyond neutral discussion facilitation into targeted persuasive manipulation, which can bias participants or audiences and be abused for covert influence operations.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The module does not merely model prospect theory; it explicitly generates debate tactics that leverage loss aversion, certainty effects, status quo bias, and the salience of small-probability harms to influence an opponent. In a multi-agent virtual forum skill, this crosses from neutral analysis into persuasive manipulation, which can steer outputs toward exploitative rhetoric and increase the risk of unsafe or deceptive social-engineering-style behavior.

Missing User Warnings

Medium

Confidence: 96%
Finding: The README describes taking user input and local skill content, then forwarding both into external Claude CLI calls, but it does not clearly disclose the resulting third-party data transmission risk. Users may reasonably believe they are using a local discussion skill when in fact their prompts and locally stored skill text are being sent to an external service/API.

Vague Triggers

Medium

Confidence: 78%
Finding: The trigger phrases include very broad everyday language such as “发起讨论” ("start a discussion"), “辩论” ("debate"), and “主持讨论” ("host a discussion"), which can cause accidental invocation in unrelated conversations. In a skill that reads local skill files and sends content to an external API, overly broad activation materially increases the chance of unintended execution and unintended disclosure.
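
A narrower activation rule would require an explicit invocation rather than reacting to everyday words anywhere in a message. The sketch below is a hypothetical illustration: this report does not describe how the skill's triggers are actually matched, so the substring model in `matches_broad_trigger`, the `/forum` command prefix, and the `should_activate` function are all assumptions.

```python
BROAD_PHRASES = ("发起讨论", "辩论", "主持讨论")  # trigger phrases quoted in the finding
EXPLICIT_PREFIX = "/forum"  # hypothetical explicit command, not taken from the skill


def matches_broad_trigger(message: str) -> bool:
    """Simplified model of broad matching: fires if any phrase appears anywhere."""
    return any(phrase in message for phrase in BROAD_PHRASES)


def should_activate(message: str) -> bool:
    """Stricter rule: activate only when the message starts with the explicit command."""
    stripped = message.strip()
    return stripped == EXPLICIT_PREFIX or stripped.startswith(EXPLICIT_PREFIX + " ")
```

Under this model, a casual remark that happens to contain “辩论” satisfies `matches_broad_trigger` but not `should_activate`, while a message beginning with `/forum ` followed by a topic satisfies the stricter rule.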

Missing User Warnings

Medium

Confidence: 92%
Finding: The README explicitly documents piping user input into multiple external `claude --print` processes, which means the same prompt may be transmitted to several independent API-backed executions without any privacy notice, minimization guidance, or consent warning. In a multi-agent debate skill, users may provide sensitive prompts under the assumption they are interacting with one tool, so multiplying disclosure across 5 processes increases data exposure, logging surface, and billing side effects.

Missing User Warnings

Medium

Confidence: 94%
Finding: The script sends skill contents, per-round prompts, and accumulated discussion history to the Claude CLI, which may forward data to an external service, but there is no clear user-facing disclosure or consent gate. If skill files or discussion history contain sensitive local data, this creates a confidentiality risk through unannounced exfiltration to a third party.
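
A consent gate of the kind this finding calls for could be sketched as follows. Everything here is an assumption for illustration: the function names, the summary format, and the yes/no prompt are hypothetical, and the sketch deliberately stops short of invoking the Claude CLI.

```python
from typing import Callable, Sequence


def build_disclosure(skill_paths: Sequence[str], history_chars: int, process_count: int) -> str:
    """Summarize what would leave the machine so the user can make an informed choice."""
    lines = [
        f"{len(skill_paths)} local skill file(s) will be sent to an external service:",
        *[f"  - {path}" for path in skill_paths],
        f"Discussion history included: {history_chars} characters",
        f"Separate CLI processes receiving the prompt: {process_count}",
    ]
    return "\n".join(lines)


def confirm_send(disclosure: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only on an explicit 'yes'; any other answer is treated as refusal."""
    answer = ask(disclosure + "\nProceed? Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"
```

The `ask` parameter defaults to the built-in `input` and exists so the gate can be exercised without a terminal. Treating anything other than an explicit "yes" as a refusal keeps the default on the side of not transmitting.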

Ssd 3

Medium

Confidence: 97%
Finding: The skill explicitly states that it reads local `SKILL.md` content and sends it to an external API via `claude --print`. This is a real data exfiltration path: local skill files may contain proprietary prompts, internal instructions, paths, or secrets, and broad triggers make accidental transmission more likely in this skill’s context.

Ssd 4

Medium

Confidence: 94%
Finding: This behavioral advice escalates from analyzing arguments to generating audience-profile-based nudges and influence strategies. In a discussion skill, tailoring outputs to audience susceptibilities increases the risk of covert persuasion, manipulation, and deceptive shaping of user beliefs.

Ssd 4

Medium

Confidence: 90%
Finding: The recommendations include guidance such as exploiting loss framing and designing choice architecture around cognitive biases. Even if presented as debate strategy, combining bias detection with instructions to use those biases creates a concrete pathway for manipulative behavior rather than fair deliberation.

VirusTotal

66/66 vendors flagged this skill as clean.
