Openclaw Mlx Audio

Security checks across malware telemetry and agentic risk

Overview

The local audio features are mostly coherent, but the package includes unverified installer execution and unrelated autonomous-code workflow documentation that users should review carefully.

Install only if you are comfortable with host-level dependency changes and local audio/model processing. Review install.sh before running it, avoid curl-to-shell where possible, use only voice samples you own or have permission to clone, and do not feed private text or audio unless you are comfortable with it appearing in local logs.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Tool Misuse: Tool Parameter Abuse, Chaining Abuse, Unsafe Defaults
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger

Findings (17)

Tainted flow: 'cmd' from os.getenv (line 113, credential/environment) → subprocess.run (code execution)

Medium

Category: Data Flow
Content:
    cmd.extend(["--language", language])
    logger.info(f"Running: {' '.join(cmd)}")
    subprocess.run(cmd, check=True, capture_output=True)
    # Read result
    txt_path = Path(f"{output_base}.txt")
Confidence: 84%
Finding: subprocess.run(cmd, check=True, capture_output=True)
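
One way to narrow this flow is to treat the environment value as a lookup key into a fixed allowlist rather than as a program to execute. The following is a minimal sketch, not the skill's actual code: the variable name `STT_BIN`, the placeholder program `example-stt-cli`, and the positional audio argument are assumptions made for illustration; only the `--language` flag comes from the scanned snippet.

```python
import os

# Hypothetical allowlist; "example-stt-cli" is a placeholder, not a real tool name.
ALLOWED_PROGRAMS = frozenset({"example-stt-cli"})


def program_from_env(env_var, default, environ=None):
    """Return an allowlisted program name taken from an environment variable."""
    environ = os.environ if environ is None else environ
    candidate = environ.get(env_var, default)
    # Anything outside the allowlist (paths, extra arguments, typos) is rejected.
    if candidate not in ALLOWED_PROGRAMS:
        raise ValueError(f"refusing program from {env_var}: {candidate!r}")
    return candidate


def build_command(program, audio_path, language=None):
    """Build an argument list; a list passed without shell=True is never shell-parsed."""
    cmd = [program, str(audio_path)]
    if language:
        cmd.extend(["--language", language])
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True, capture_output=True)` as in the flagged line; the difference is that the first element can no longer be an arbitrary string supplied through the environment.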

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: This file documents installation and operation of an unrelated autonomous code-improvement skill inside a local TTS/STT skill package. That expands the apparent capability and trust boundary of the skill, creating a supply-chain and scope-confusion risk where users may invoke self-modifying or repo-affecting workflows they did not expect from an audio integration skill.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The documented workflows include autonomous code edits, iterative verification, rollback, logging, and release preparation, all of which materially exceed the expected behavior of a local TTS/STT integration. In this context, such guidance can normalize dangerous agent behavior that modifies source code or repository state, increasing the chance of unintended destructive changes or abuse if a user follows the documented commands.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The installer performs network retrieval and system-level package installation, which expands its capability beyond a narrowly scoped local TTS/STT plugin and increases supply-chain and host-modification risk. While dependency installation is common for setup scripts, this script installs software from external sources and modifies the system without strong provenance checks or confinement.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: Piping a remotely fetched script directly into the shell executes unreviewed code from the network immediately, creating a classic supply-chain and remote code execution risk. If the upstream host, connection, or script content is compromised, arbitrary code will run in the user's environment during installation.
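
A common mitigation is to download the script to disk, compare its SHA-256 digest against a value pinned out of band, and only then run it. The sketch below shows the comparison step only, under the assumption that a trusted digest is available; the function name and the way the pinned digest is obtained are illustrative and not part of the scanned package.

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True only if the payload's SHA-256 digest matches the pinned value."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

In use, the installer bytes would be read from the saved file, passed to `verify_sha256` together with the pinned digest, and executed only when the function returns `True`. A mismatch should abort the installation rather than fall back to running the script.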

Missing User Warnings

Medium

Confidence: 92%
Finding: The voice-cloning workflow explicitly instructs users to submit reference audio and extract voice features, but it provides no consent, authorization, or privacy guidance. This is risky because voice data is highly sensitive biometric information, and cloning another person's voice without clear permission can enable impersonation, fraud, and privacy violations.

Missing User Warnings

Low

Confidence: 84%
Finding: The STT test flow tells users to send voice messages for transcription without warning that audio may contain sensitive personal or confidential information. Even in a local tool context, users should be informed that spoken content is being processed and may be logged, retained, or exposed in Discord test channels.

Missing User Warnings

Medium

Confidence: 86%
Finding: The README instructs users to run installation and system-modifying commands, including piping a remote script directly into a shell, without warning about trust and system-change implications. In a skill ecosystem, users may copy-paste these commands blindly, increasing the risk of supply-chain compromise or unintended host modification.

Vague Triggers

Medium

Confidence: 77%
Finding: Broad triggers like generic 'TTS ...', 'STT ...', and natural-language Chinese phrases increase the chance of accidental invocation during normal conversation. In a skill that may perform file operations, shell-backed processing, or model execution, unintended triggering can cause unwanted processing of user content or execution of expensive or sensitive actions.
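
One possible way to reduce accidental invocation is to require an explicit command prefix at the start of the message instead of matching the keyword anywhere in conversation. The sketch below assumes a hypothetical `/tts` and `/stt` slash-command syntax; the skill's real trigger format is not shown in this report.

```python
import re

# Hypothetical syntax: "/tts <text>" or "/stt <text>", anchored at the start.
TRIGGER = re.compile(r"^/(tts|stt)\s+(\S.*)$", re.IGNORECASE | re.DOTALL)


def parse_trigger(message):
    """Return (command, payload) for an explicit trigger, or None otherwise."""
    match = TRIGGER.match(message.strip())
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)
```

With this shape, a message such as "I used TTS yesterday" does not invoke the skill, while "/tts hello world" does. A command with no payload is also ignored, which avoids starting model execution on empty input.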

Missing User Warnings

Medium

Confidence: 96%
Finding: The test plan instructs users to run a destructive package-management command (`brew uninstall ffmpeg`) to simulate a missing dependency, but it does so without an explicit warning, rollback guidance, or safer alternative. In a markdown plan for an installable skill, operators may copy commands directly, causing unnecessary system changes and breaking unrelated workflows that depend on ffmpeg.
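
A missing dependency can usually be simulated without uninstalling anything, by launching the process under test with a `PATH` that omits the directories containing the tool. The helper below is a sketch of that idea; the function name is hypothetical, and it assumes the code under test locates ffmpeg through `PATH`.

```python
import os
import shutil


def path_without(program: str, path: str) -> str:
    """Return a PATH string with every directory that provides `program` removed."""
    kept = [
        directory
        for directory in path.split(os.pathsep)
        # Empty entries mean "current directory" on POSIX; drop them as well.
        if directory and shutil.which(program, path=directory) is None
    ]
    return os.pathsep.join(kept)
```

A test run could then pass `env=dict(os.environ, PATH=path_without("ffmpeg", os.environ["PATH"]))` to `subprocess.run`, leaving the host installation untouched and requiring no rollback afterwards.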

Missing User Warnings

Medium

Confidence: 94%
Finding: The markdown instructs users to run workflows that can change files, run verification commands, log iterations, and prepare releases, but it does not prominently warn that repository state, commits, or local files may be modified. For an agent skill, omission of these side effects is dangerous because users may treat the commands as harmless assistance rather than state-changing automation.

Missing User Warnings

Medium

Confidence: 96%
Finding: The script fetches and executes external code without an explicit warning or consent checkpoint, which can mislead users into running dangerous setup steps they have not reviewed. This increases the likelihood of unsafe execution and masks the true trust boundary involved in installation.

Missing User Warnings

Medium

Confidence: 97%
Finding: The server logs the full command line, which includes user-supplied text from the input field. That can expose sensitive prompts, personal data, or secrets in logs and monitoring systems, and TTS/STT workloads often process exactly the kind of private content users do not expect to be retained in plaintext.
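
A small redaction step before logging would keep the command structure visible while hiding the user-supplied values. The sketch below assumes the text is passed after a flag named `--text`; that flag name is a guess for illustration, and the real argument layout of the server may differ.

```python
def redact_command(cmd, sensitive_flags=("--text",)):
    """Return a copy of cmd with the value after each sensitive flag masked."""
    redacted = []
    hide_next = False
    for arg in cmd:
        if hide_next:
            # Keep only the length so logs stay useful for debugging.
            redacted.append(f"<redacted {len(arg)} chars>")
            hide_next = False
        else:
            redacted.append(arg)
            hide_next = arg in sensitive_flags
    return redacted
```

The logging call would then format `redact_command(cmd)` instead of `cmd`, while `subprocess.run` still receives the original, unmodified list.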

External Script Fetching

Low

Category: Supply Chain
Content:
    brew install ffmpeg
    # 2. Install uv
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # 3. Install mlx-audio
    uv tool install --force mlx-audio --prerelease=allow
Confidence: 97%
Finding: curl -LsSf https://astral.sh/uv/install.sh | sh

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
    Run: ./install.sh
    # Clear the cache and retry
    rm -rf ~/.cache/huggingface/hub/models--mlx-community--*
    # Check the configuration
    openclaw doctor
Confidence: 92%
Finding: rm -rf ~

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
    Run: ./install.sh
    # Clear the cache and retry
    rm -rf ~/.cache/huggingface/hub/models--mlx-community--*
    # Check the configuration
    openclaw doctor
Confidence: 92%
Finding: rm -rf ~/.cache/huggingface/hub/
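
A recursive delete driven by a shell glob can be replaced with a narrower routine that lists what it would remove before removing it. The following is a sketch under stated assumptions: the function name and the dry-run default are illustrative, and the prefix is taken from the command quoted in the finding.

```python
import shutil
from pathlib import Path


def clear_model_cache(hub_dir, prefix="models--mlx-community--", dry_run=True):
    """List (and optionally delete) cache directories directly under hub_dir."""
    hub_dir = Path(hub_dir).resolve()
    matched = []
    for entry in sorted(hub_dir.glob(prefix + "*")):
        # Skip symlinks and plain files, and anything not directly inside hub_dir.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if entry.resolve().parent != hub_dir:
            continue
        if not dry_run:
            shutil.rmtree(entry)
        matched.append(entry)
    return matched
```

Calling it first with the default `dry_run=True` shows the operator exactly which directories match; a second call with `dry_run=False` performs the deletion. If `hub_dir` does not exist, the function returns an empty list and deletes nothing.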

Chaining Abuse

High

Category: Tool Misuse
Content:
    brew install ffmpeg
    # 2. Install uv
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # 3. Install mlx-audio
    uv tool install --force mlx-audio --prerelease=allow
Confidence: 97%
Finding: | sh

VirusTotal

63 of 63 vendors reported this skill as clean.
