Openclaw Mlx Audio

Security checks across malware telemetry and agentic risk

Overview

The local audio features are mostly coherent, but the package includes unverified installer execution and unrelated autonomous-code workflow documentation that users should review carefully.

Install only if you are comfortable with host-level dependency changes and local audio/model processing. Review install.sh before running it, avoid curl-to-shell where possible, use only voice samples you own or have permission to clone, and do not feed private text or audio unless you are comfortable with it appearing in local logs.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Tool Misuse: Tool Parameter Abuse, Chaining Abuse, Unsafe Defaults
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Findings (17)

Tainted flow: 'cmd' from os.getenv (line 113, credential/environment) → subprocess.run (code execution)

Medium
Category
Data Flow
Content
cmd.extend(["--language", language])

            logger.info(f"Running: {' '.join(cmd)}")
            subprocess.run(cmd, check=True, capture_output=True)

            # Read result
            txt_path = Path(f"{output_base}.txt")
Confidence
84% confidence
Finding
subprocess.run(cmd, check=True, capture_output=True)
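One way to narrow this flow is to validate the environment-derived value against an allowlist before it reaches subprocess.run. The report does not show which variable is read at line 113, so the sketch below is an assumption throughout: the variable name STT_BIN, the allowlist contents, the argument layout, and the helpers binary_from_env and build_command are all hypothetical and not taken from the package.

```python
import os

# Hypothetical allowlist: bare executable names the skill is expected to run.
ALLOWED_BINARIES = frozenset({"mlx_whisper", "ffmpeg"})


def binary_from_env(env_var: str, default: str) -> str:
    """Read an executable name from the environment and reject anything unexpected."""
    value = os.getenv(env_var, default)
    if value not in ALLOWED_BINARIES:
        raise ValueError(f"{env_var}={value!r} is not an allowed executable")
    return value


def build_command(audio_path: str, language: str = "") -> list:
    """Build an argument list (no shell) whose first element has been validated."""
    # STT_BIN and the positional audio argument are illustrative assumptions.
    cmd = [binary_from_env("STT_BIN", "mlx_whisper"), audio_path]
    if language:
        cmd.extend(["--language", language])
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True, capture_output=True) as in the excerpt. Because the arguments stay in a list and no shell is involved, a hostile value in the environment can at most fail validation rather than inject extra commands.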

Description-Behavior Mismatch

Medium
Confidence
95% confidence
Finding
This file documents installation and operation of an unrelated autonomous code-improvement skill inside a local TTS/STT skill package. That expands the apparent capability and trust boundary of the skill, creating a supply-chain and scope-confusion risk where users may invoke self-modifying or repo-affecting workflows they did not expect from an audio integration skill.

Context-Inappropriate Capability

High
Confidence
97% confidence
Finding
The documented workflows include autonomous code edits, iterative verification, rollback, logging, and release preparation, all of which materially exceed the expected behavior of a local TTS/STT integration. In this context, such guidance can normalize dangerous agent behavior that modifies source code or repository state, increasing the chance of unintended destructive changes or abuse if a user follows the documented commands.

Description-Behavior Mismatch

Medium
Confidence
93% confidence
Finding
The installer performs network retrieval and system-level package installation, which expands its capability beyond a narrowly scoped local TTS/STT plugin and increases supply-chain and host-modification risk. While dependency installation is common for setup scripts, this script installs software from external sources and modifies the system without strong provenance checks or confinement.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
Piping a remotely fetched script directly into the shell executes unreviewed code from the network immediately, creating a classic supply-chain and remote code execution risk. If the upstream host, connection, or script content is compromised, arbitrary code will run in the user's environment during installation.
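A common mitigation is to download the script to a file, review it, and run it only if its hash matches a digest recorded at review time. The sketch below shows only the verification step. It assumes the operator pins the digest after reviewing a copy of the script; the helper names sha256_matches and load_verified_script are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_matches(payload: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 of payload against a pinned hex digest."""
    actual = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual, expected_hex.strip().lower())


def load_verified_script(path: Path, expected_hex: str) -> bytes:
    """Return the script bytes only if they match the pinned digest."""
    payload = path.read_bytes()
    if not sha256_matches(payload, expected_hex):
        raise ValueError(f"{path} does not match the pinned SHA-256 digest")
    return payload
```

In practice the script would first be saved to disk rather than piped, then executed from that file only after load_verified_script returns without raising.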

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The voice-cloning workflow explicitly instructs users to submit reference audio and extract voice features, but it provides no consent, authorization, or privacy guidance. This is risky because voice data is highly sensitive biometric information, and cloning another person's voice without clear permission can enable impersonation, fraud, and privacy violations.

Missing User Warnings

Low
Confidence
84% confidence
Finding
The STT test flow tells users to send voice messages for transcription without warning that audio may contain sensitive personal or confidential information. Even in a local tool context, users should be informed that spoken content is being processed and may be logged, retained, or exposed in Discord test channels.

Missing User Warnings

Medium
Confidence
86% confidence
Finding
The README instructs users to run installation and system-modifying commands, including piping a remote script directly into a shell, without warning about trust and system-change implications. In a skill ecosystem, users may copy-paste these commands blindly, increasing the risk of supply-chain compromise or unintended host modification.

Vague Triggers

Medium
Confidence
77% confidence
Finding
Broad triggers like generic 'TTS ...', 'STT ...', and natural-language Chinese phrases increase the chance of accidental invocation during normal conversation. In a skill that may perform file operations, shell-backed processing, or model execution, unintended triggering can cause unwanted processing of user content or execution of expensive or sensitive actions.
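A narrower trigger could require an explicit command prefix at the start of the message instead of matching keywords anywhere in conversation. The sketch below is illustrative only: the /tts and /stt prefixes and the parse_trigger helper are hypothetical and do not reflect how the skill actually registers its triggers.

```python
import re
from typing import Optional, Tuple

# Hypothetical explicit prefixes: the message must start with /tts or /stt.
TRIGGER_RE = re.compile(r"^/(tts|stt)\s+(\S.*)$", re.IGNORECASE | re.DOTALL)


def parse_trigger(message: str) -> Optional[Tuple[str, str]]:
    """Return (command, payload) for an explicit trigger, or None otherwise."""
    match = TRIGGER_RE.match(message.strip())
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)
```

With this shape, a message that merely mentions TTS or STT in passing returns None and triggers nothing.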

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The test plan instructs users to run a destructive package-management command (`brew uninstall ffmpeg`) to simulate a missing dependency, but it does so without an explicit warning, rollback guidance, or safer alternative. In a markdown plan for an installable skill, operators may copy commands directly, causing unnecessary system changes and breaking unrelated workflows that depend on ffmpeg.
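A less invasive way to simulate the missing dependency is to run the check with a PATH that hides ffmpeg, leaving the installed package untouched. The sketch below assumes the skill locates ffmpeg through PATH; the helper path_without is hypothetical.

```python
import os
import shutil
from typing import Optional


def path_without(binary: str, path: Optional[str] = None) -> str:
    """Return a PATH string with every directory that provides `binary` dropped."""
    raw = path if path is not None else os.environ.get("PATH", "")
    kept = [
        entry
        for entry in raw.split(os.pathsep)
        # Keep a directory only if `binary` cannot be found inside it.
        if entry and shutil.which(binary, path=entry) is None
    ]
    return os.pathsep.join(kept)
```

The result can be supplied as the PATH entry of the env argument to subprocess.run for the process under test. Note that dropping a whole directory also hides every other tool in it, so the process under test may need absolute paths for anything else it calls.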

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The markdown instructs users to run workflows that can change files, run verification commands, log iterations, and prepare releases, but it does not prominently warn that repository state, commits, or local files may be modified. For an agent skill, omission of these side effects is dangerous because users may treat the commands as harmless assistance rather than state-changing automation.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The script fetches and executes external code without an explicit warning or consent checkpoint, which can mislead users into running dangerous setup steps they have not reviewed. This increases the likelihood of unsafe execution and masks the true trust boundary involved in installation.

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The server logs the full command line, which includes user-supplied text from the input field. That can expose sensitive prompts, personal data, or secrets in logs and monitoring systems, and TTS/STT workloads often process exactly the kind of private content users do not expect to be retained in plaintext.
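A minimal sketch of one mitigation: redact the values of text-bearing arguments before the command line is logged. The flag names in SENSITIVE_FLAGS and the helper redact_command are hypothetical, since the report does not show which arguments carry the user's text.

```python
# Hypothetical flag names whose following argument holds user-supplied text.
SENSITIVE_FLAGS = frozenset({"--text", "--prompt"})


def redact_command(cmd: list) -> list:
    """Return a copy of cmd with the value after each sensitive flag masked."""
    redacted = []
    hide_next = False
    for arg in cmd:
        if hide_next:
            # Keep only the length so logs stay useful for debugging.
            redacted.append(f"<redacted {len(arg)} chars>")
            hide_next = False
        else:
            redacted.append(arg)
            hide_next = arg in SENSITIVE_FLAGS
    return redacted
```

The logging call from the earlier excerpt would then join redact_command(cmd) instead of cmd, while subprocess.run still receives the original, unredacted list.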

External Script Fetching

Low
Category
Supply Chain
Content
brew install ffmpeg

# 2. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# 3. Install mlx-audio
uv tool install --force mlx-audio --prerelease=allow
Confidence
97% confidence
Finding
curl -LsSf https://astral.sh/uv/install.sh | sh

Tool Parameter Abuse

High
Category
Tool Misuse
Content
Run: ./install.sh

# Clear the cache and retry
rm -rf ~/.cache/huggingface/hub/models--mlx-community--*

# Check the configuration
openclaw doctor
Confidence
92% confidence
Finding
rm -rf ~

Tool Parameter Abuse

High
Category
Tool Misuse
Content
Run: ./install.sh

# Clear the cache and retry
rm -rf ~/.cache/huggingface/hub/models--mlx-community--*

# Check the configuration
openclaw doctor
Confidence
92% confidence
Finding
rm -rf ~/.cache/huggingface/hub/
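The cache-clearing step flagged in the two findings above can be made less error-prone by confining deletion to direct children of the hub directory that match the expected prefix, so a mistyped or truncated path cannot reach the home directory. This is a sketch under stated assumptions: the helper clear_model_cache is hypothetical, and the prefix is copied from the quoted command.

```python
import shutil
from pathlib import Path


def clear_model_cache(hub_dir: Path, prefix: str = "models--mlx-community--") -> list:
    """Delete matching model directories directly under hub_dir; return their names."""
    hub_dir = hub_dir.resolve()
    removed = []
    for entry in hub_dir.glob(prefix + "*"):
        target = entry.resolve()
        # Skip symlinks, plain files, and anything that resolves outside hub_dir.
        if entry.is_symlink() or target.parent != hub_dir or not target.is_dir():
            continue
        shutil.rmtree(target)
        removed.append(target.name)
    return sorted(removed)
```

A caller would pass the hub directory explicitly, for example Path.home() / ".cache" / "huggingface" / "hub", and could print the returned names to confirm what was removed.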

Chaining Abuse

High
Category
Tool Misuse
Content
brew install ffmpeg

# 2. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# 3. Install mlx-audio
uv tool install --force mlx-audio --prerelease=allow
Confidence
97% confidence
Finding
| sh

VirusTotal

63/63 vendors flagged this skill as clean.
