Security audit

WeChat Official Account Prohibited-Word Detection (公众号违禁词检测)

Security checks across malware telemetry and agentic risk

Overview

This skill's behavior is mostly consistent with its stated purpose of prohibited-word checking, but it is flagged for review for three reasons: it sends user content to RedFox, it scans shell startup files for an API key, and it creates local output files despite privacy claims that say otherwise.

Install only if you are comfortable sending checked copy, extracted file text, and fetched webpage text to redfox.hk for analysis. Prefer setting REDFOX_API_KEY through a dedicated environment/secret mechanism, avoid putting it in shell startup files, and do not submit confidential or regulated material unless your organization approves that data flow. Expect optimized copy to be written to a local text file after detections.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (13)

Intent-Code Divergence

Medium
Confidence
84% confidence
Finding
The document claims user content is not stored locally, yet elsewhere says it generates or writes output text files. Contradictory data-handling claims can cause users to submit sensitive content under false assumptions, increasing the risk of unintended local persistence and exposure through logs, temp files, or saved outputs.

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The workflow explicitly tells the agent to read shell startup files like ~/.bashrc and ~/.zshrc to recover API credentials if the environment variable is unset. That expands access from a single needed secret to broad reading of user-local configuration files, which may contain unrelated tokens, private commands, or other sensitive data; for a prohibited-word scanning skill, this is unnecessary and creates avoidable credential-exposure and over-collection risk.
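
A narrower alternative is to read the key from the process environment only and fail with a clear message when it is missing. The following is a minimal sketch of that idea, not the skill's actual code: `load_api_key` and `MissingApiKeyError` are hypothetical names, and only the `REDFOX_API_KEY` variable name comes from this report.

```python
import os
from typing import Mapping, Optional


class MissingApiKeyError(RuntimeError):
    """Raised when REDFOX_API_KEY is not present in the environment."""


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment mapping only.

    There is deliberately no fallback that opens ~/.bashrc, ~/.zshrc,
    or any other shell startup file.
    """
    source = os.environ if env is None else env
    key = source.get("REDFOX_API_KEY", "").strip()
    if not key:
        raise MissingApiKeyError(
            "REDFOX_API_KEY is not set; export it for this session "
            "or inject it from a secret manager."
        )
    return key
```

With this shape, a missing key surfaces as an explicit error the user can act on, and no unrelated configuration file is ever opened.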

Description-Behavior Mismatch

Medium
Confidence
83% confidence
Finding
The workflow mandates writing the optimized content to a local file and sending that file to the user, even though the skill's primary function is scanning and suggesting safer wording. This increases data handling surface by persisting potentially sensitive user text on disk and creating an extra exfiltration/output channel that may retain content longer than necessary.
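
One way to reduce this surface is to make persistence opt-in and to keep any file that is written private to the current user. The sketch below illustrates that approach under stated assumptions: `save_optimized_copy` and its `persist` flag are hypothetical and do not describe the skill's real interface.

```python
import os
import tempfile
from typing import Optional


def save_optimized_copy(text: str, persist: bool = False) -> Optional[str]:
    """Write optimized copy to disk only when the caller opts in.

    Returns the file path when a file was written, otherwise None.
    tempfile.mkstemp creates the file readable and writable only by
    the creating user on POSIX systems.
    """
    if not persist:
        # Default: nothing touches the disk; the caller shows text inline.
        return None
    fd, path = tempfile.mkstemp(prefix="optimized_copy_", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

The caller remains responsible for deleting the file once the user has retrieved it.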

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The document states that user content is sent via HTTPS POST to an external API and also supports webpage fetching, but the skill description does not clearly disclose these outbound data flows. Users may reasonably expect local scanning from the manifest, so undisclosed transmission of pasted text, file contents, or fetched page text to third parties creates privacy and consent risk.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The script searches the user's shell startup files for an API key, which expands credential access beyond what is necessary for normal operation. A content-checking tool should not inspect broad personal config files: doing so can expose unrelated secrets, and it amounts to hidden credential harvesting that the user has no reason to expect.

Vague Triggers

Medium
Confidence
84% confidence
Finding
The README tells users to 'describe what you need in plain language' with no constrained command format or explicit trust boundaries. In a skill that can ingest URLs, files, and images and send content to an external detection service, overly broad invocation guidance increases the chance of prompt confusion, unintended tool use, or processing attacker-supplied content beyond the user's intended scope.
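
A constrained command format could look like the sketch below, which activates only on an explicit prefix and treats everything else as ordinary conversation. The command name `/check-prohibited-words` and the function `extract_scan_request` are hypothetical, chosen for illustration; the skill defines no such command.

```python
from typing import Optional

# Hypothetical explicit command; not part of the audited skill.
TRIGGER_PREFIX = "/check-prohibited-words"


def extract_scan_request(message: str) -> Optional[str]:
    """Return the text to scan, or None if the message is not an explicit request."""
    stripped = message.strip()
    if not stripped.startswith(TRIGGER_PREFIX):
        return None
    rest = stripped[len(TRIGGER_PREFIX):]
    # Reject look-alike commands such as "/check-prohibited-wordsX".
    if rest and not rest[0].isspace():
        return None
    payload = rest.strip()
    # A bare command with no content is not a scan request either.
    return payload or None
```

Under this design, a message such as "Is this ad compliant?" never triggers a scan on its own; the user has to ask for one explicitly.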

Vague Triggers

Medium
Confidence
89% confidence
Finding
The README allows broad natural-language invocation without clear activation boundaries, which can cause the skill to trigger on loosely related user requests and process content the user did not intend to send for prohibited-word scanning. In a skill that accepts pasted text, files, images, and URLs, overly broad invocation increases the chance of unintended data handling and misuse.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The README mentions encrypted transmission and no local storage, but it does not prominently and explicitly tell users that uploaded documents, image-extracted text, and fetched webpage content are sent to a third-party detection service. Users may reasonably assume analysis happens locally, leading to accidental disclosure of confidential marketing drafts or internal documents.

Vague Triggers

Medium
Confidence
76% confidence
Finding
An overly broad trigger phrase can cause accidental invocation during ordinary conversation, which is risky here because the skill may read files or send user-provided content to a remote API. In this context, unintended activation increases the chance of unreviewed data disclosure and confusing agent behavior.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The skill explicitly sends user content to a remote API but does not clearly warn users not to submit confidential, regulated, or sensitive material. Because the skill is marketed for scanning documents, files, webpages, and images, users may reasonably provide proprietary or personal content, leading to privacy, compliance, or contractual exposure if transmitted off-platform.

Vague Triggers

Medium
Confidence
72% confidence
Finding
The trigger conditions are broad and somewhat ambiguous, covering general phrases like compliance, advertising-law prohibited words, or any uploaded link/file mentioning public-account-related checks. Overbroad activation can cause the skill to engage unexpectedly, leading to unsolicited external processing of user content and increasing privacy and scope-creep risk.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The tool sends user-provided content, file contents, or extracted webpage text to a third-party remote API for analysis without an explicit warning or consent at execution time. Because the input may contain drafts, private documents, or internal data, silent transmission creates a confidentiality and compliance risk.
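
An execution-time consent step could be as small as the following sketch, which names the destination and the amount of text before anything leaves the machine. `confirm_remote_send` is a hypothetical helper, and the injectable `ask` parameter is an assumption made so the prompt can be tested; only the host `redfox.hk` comes from this report.

```python
from typing import Callable

REMOTE_HOST = "redfox.hk"  # destination named in the overview of this report


def confirm_remote_send(text: str, ask: Callable[[str], str] = input) -> bool:
    """Ask the user before transmitting text; default to not sending."""
    prompt = (
        f"About to send {len(text)} characters to {REMOTE_HOST} for analysis. "
        "Proceed? [y/N] "
    )
    answer = ask(prompt).strip().lower()
    # Only an explicit yes counts as consent; empty input means no.
    return answer in {"y", "yes"}
```

The caller would skip the upload whenever this returns `False`, so silence or an unclear answer never results in transmission.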

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The script reads shell configuration files for credentials without clearly informing the user at the point of execution. Even if the intent is convenience, silently inspecting personal startup files is unexpected and can expose sensitive information or normalize unsafe credential handling.

VirusTotal

65/65 vendors flagged this skill as clean.

Static analysis

No suspicious patterns detected.