Security audit

Multi-Platform Prohibited Word Detection (多平台违禁词检测)

Security checks across malware telemetry and agentic risk

Overview

The skill appears to do the advertised word checking, but it needs Review because it sends user content and an API key to a configurable remote service while also encouraging persistent credential storage.

Install only if you are comfortable sending checked copy, extracted file text, and fetched webpage text to RedFox or a configured HTTPS gateway. Use an ephemeral REDFOX_API_KEY when possible, avoid storing it in shell profiles, do not submit secrets or regulated documents, and treat URL checking as outbound network access that may retrieve private pages if you provide such links.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (16)

Tainted flow: 'api_url' from os.environ.get (line 85, credential/environment) → requests.post (network output)

Critical

Category: Data Flow
Content:
    try:
        # Send the HTTPS POST request
        response = requests.post(api_url, headers=headers, json=params, timeout=30)
        if response.status_code >= 400:
            raise Exception(f"HTTP请求失败: {response.status_code}, {response.text[:500]}")
Confidence: 94%
Finding: response = requests.post(api_url, headers=headers, json=params, timeout=30)
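
One way to narrow this flow is to validate the environment-supplied endpoint before it reaches `requests.post`. The following is a minimal sketch, not the skill's actual code: the variable name `REDFOX_API_URL`, the host `api.example.com`, and the helper `resolve_api_url` are all hypothetical, since the audit does not name the real environment variable or the RedFox hostname.

```python
import os
from urllib.parse import urlsplit

# Hypothetical allowlist; the real service hostname is not given in the audit.
ALLOWED_API_HOSTS = {"api.example.com"}


def resolve_api_url(default_url, env_var="REDFOX_API_URL"):
    """Return the endpoint to use, rejecting overrides that are not HTTPS
    or whose host is not on the allowlist.

    `env_var` is an assumed name for the override variable.
    """
    candidate = os.environ.get(env_var, default_url)
    parts = urlsplit(candidate)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {candidate!r}")
    if parts.hostname not in ALLOWED_API_HOSTS:
        raise ValueError(f"endpoint host not on allowlist: {parts.hostname!r}")
    return candidate
```

With a check like this, an attacker who can set the environment variable can no longer redirect the API key and checked content to an arbitrary server; the trade-off is that legitimate gateways must be added to the allowlist explicitly.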

Lp3

Medium

Category: MCP Least Privilege
Confidence: 92%
Finding: The skill uses environment access, file reading, and network communication but does not declare these capabilities as permissions. This reduces transparency and informed consent, especially because user content and local configuration may be accessed and transmitted off-device. In a compliance-checking skill, hidden capability scope makes accidental overreach more dangerous because users may assume purely local analysis.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The skill’s description emphasizes prohibited-word checking but understates that submitted text, files, extracted webpage content, and OCR output are sent to a third-party backend or arbitrary override URL. It also involves local sensitive configuration handling for API credentials, which is materially different from a simple local checker and can expose confidential marketing drafts, unpublished content, or credentials to external systems.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The documentation directs the agent to persistently modify shell profiles or user environment variables to store API credentials. This exceeds the minimum necessary behavior for the skill and can leave secrets broadly available to other tools, future sessions, or unintended processes on the same account, increasing the blast radius of credential theft or misuse.
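
As an alternative to writing the key into a shell profile, the key can be handed to a single child process only. This is a rough sketch under stated assumptions: `run_with_ephemeral_key` is a hypothetical helper, and it assumes the checker is launched as a subprocess that reads `REDFOX_API_KEY` from its environment.

```python
import os
import subprocess


def run_with_ephemeral_key(cmd, api_key, var_name="REDFOX_API_KEY"):
    """Run `cmd` with the API key present only in the child's environment.

    Nothing is written to disk and the parent environment is not modified,
    so the key disappears when the child process exits.
    """
    child_env = dict(os.environ)   # copy, leaving the parent environment untouched
    child_env[var_name] = api_key
    return subprocess.run(
        cmd, env=child_env, check=True, capture_output=True, text=True
    )
```

In interactive use the key could be read with `getpass.getpass()` and passed straight to this helper, so it never lands in a shell history or profile file.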

Description-Behavior Mismatch

Medium

Confidence: 84%
Finding: The code states that detection is based on an official prohibited-word library, but the implementation actually sends user content to a third-party remote service and does not keep a local word list. This is a supply-chain and transparency issue: users may reasonably expect local or officially sourced checking, but their data is instead disclosed externally to a service whose data handling and corpus provenance are not verifiable from the code.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill scans shell startup files to recover REDFOX_API_KEY beyond the current process environment. Reading unrelated user configuration files expands access beyond what a simple text-checking tool needs and can expose secrets stored in those files, especially on shared or agent-managed systems where the skill should not inspect broader user configuration.
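
A narrower lookup would consult only the current process environment and fail loudly when the key is absent, instead of falling back to shell startup files. The sketch below is illustrative only; `get_api_key` and `MissingApiKeyError` are hypothetical names, not part of the audited skill.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised when the API key is not set in the process environment."""


def get_api_key(environ=None, var_name="REDFOX_API_KEY"):
    """Return the API key from the given mapping (default: os.environ).

    Deliberately does not read ~/.bashrc, ~/.zshrc, or any other file.
    """
    env = os.environ if environ is None else environ
    key = env.get(var_name, "").strip()
    if not key:
        raise MissingApiKeyError(
            f"{var_name} is not set; export it for this session only"
        )
    return key
```

Accepting the mapping as a parameter keeps the function easy to test and makes the single source of the credential explicit.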

Context-Inappropriate Capability

Medium

Confidence: 80%
Finding: The code can fetch arbitrary user-supplied URLs and extract large amounts of page content, which expands the skill from word checking into general-purpose remote content retrieval. In an agent environment, this can be abused to access internal or sensitive network resources via SSRF-like behavior, especially because there is no allowlist, private-IP filtering, or protocol/domain restriction.
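
The missing protocol restriction and private-IP filtering could look roughly like the following. This is a minimal sketch using only the standard library, and `is_public_http_url` is a hypothetical helper. It does not cover redirects or DNS rebinding, which a real fetcher would also need to handle.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _resolve_addresses(host):
    """Return the IP addresses for `host` as strings.

    IP literals are parsed directly; other names go through DNS.
    """
    try:
        return [str(ipaddress.ip_address(host))]
    except ValueError:
        pass
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Drop any IPv6 scope suffix such as "%eth0" before parsing.
    return [info[4][0].split("%")[0] for info in infos]


def is_public_http_url(url):
    """Return True only for http(s) URLs whose host resolves exclusively
    to globally routable addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = _resolve_addresses(parts.hostname)
    except socket.gaierror:
        return False
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_global for addr in addresses
    )
```

Loopback, RFC 1918 ranges, and link-local addresses (including the `169.254.169.254` cloud metadata endpoint) all fail the `is_global` test, so such URLs would be rejected before any request is made.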

Vague Triggers

Medium

Confidence: 88%
Finding: The skill advertises activation through broad natural-language requests like 'Describe your needs in natural language,' which creates weak trigger boundaries. In agent environments, this can cause accidental invocation on unrelated content or make it easier for adversarial text to steer the skill into processing data the user did not clearly intend to submit.

Vague Triggers

Medium

Confidence: 91%
Finding: Saying users can switch platforms 'with a single phrase' or by merely mentioning the platform name makes the control surface ambiguous. That ambiguity can let incidental text, quoted content, or attacker-supplied material alter the moderation rule set unexpectedly, producing incorrect checks or bypassing intended review behavior.

Vague Triggers

Medium

Confidence: 84%
Finding: The README states that uploaded files and images are processed automatically 'with no extra steps needed,' but does not define clear activation limits, file-type restrictions, or exclusions. In practice, this increases the risk of unintentional processing of sensitive content and broadens the attack surface for malicious or unexpected inputs.

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill supports URL fetching, file uploads, and image text extraction, yet the README does not disclose privacy, retention, third-party processing, or data-handling risks. Users may submit confidential copy, internal documents, or sensitive URLs without understanding where the data goes or how it is stored, creating a real confidentiality and compliance risk.

Vague Triggers

Medium

Confidence: 86%
Finding: The README instructs users to invoke the skill using unrestricted natural language, which makes activation boundaries ambiguous and increases the chance the skill is triggered by ordinary conversation or unrelated text. Because this skill can process uploads, images, and URLs, overly broad triggering could cause unintended handling of sensitive or untrusted content beyond what the user explicitly meant to submit to this skill.

Vague Triggers

Medium

Confidence: 89%
Finding: The sample phrases are generic everyday requests like checking copy or a page, which are likely to overlap with normal user interactions and can unintentionally route unrelated content into this skill. In this context, that risk is amplified because the skill supports fetching webpage content and analyzing uploaded materials, potentially causing unintended access, processing, or disclosure of user-provided or external data.

Missing User Warnings

Medium

Confidence: 94%
Finding: The guidance tells users or the agent to persist API keys in shell profiles or user-level environment variables without a strong warning about credential persistence and exposure. This can cause secrets to remain on disk, be inherited by unrelated processes, appear in support logs or screenshots, and be reused outside the intended skill context.

Natural-Language Policy Violations

Medium

Confidence: 87%
Finding: The script defaults to sending all checked content to a third-party domain in Hong Kong, without giving users a meaningful choice about region, provider, or data export implications. In this skill context, the content being checked may include unpublished marketing copy, regulated language, or confidential documents, so automatic cross-border transmission increases privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 86%
Finding: The function performs external HTTP requests to arbitrary URLs without any explicit disclosure, confirmation, or warning to the user that supplied content will trigger outbound network access. This creates privacy and trust risks and, in a hosted agent setting, can be chained with SSRF-style misuse to probe reachable endpoints while the user may not realize a network fetch is occurring.
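
An explicit confirmation step before any outbound fetch might be sketched as below. The names `fetch_with_consent`, `fetch`, and `confirm` are hypothetical; the actual fetch function and the way the host agent prompts the user are not described in the audit, so both are passed in as callables.

```python
from typing import Callable


def fetch_with_consent(
    url: str,
    fetch: Callable[[str], str],
    confirm: Callable[[str], bool],
) -> str:
    """Call `fetch(url)` only after `confirm` approves the outbound request.

    `confirm` receives a human-readable prompt and returns True or False.
    """
    prompt = f"This will make an outbound network request to {url!r}. Continue?"
    if not confirm(prompt):
        raise PermissionError(f"user declined outbound fetch of {url!r}")
    return fetch(url)
```

Injecting `confirm` keeps the disclosure logic independent of any particular UI: a CLI could wrap `input()`, while an agent host could route the prompt through its own approval flow.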

VirusTotal

VirusTotal findings are pending for this skill version.


Static analysis

No suspicious patterns detected.