Security audit

Qwen3/Free Text-to-Speech and Voice Cloning

Security checks across malware telemetry and agentic risk

Overview

This appears to be a legitimate local text-to-speech and voice-cloning skill, with important setup and consent cautions.

Before installing, read the install script and be aware that setup may download Homebrew, Python packages, and a large model from external sources. Use the cloning feature only with voices you own or have explicit permission to use, and disclose synthetic audio where appropriate.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The manifest advertises fully offline operation, but later sections state that the model is downloaded automatically and even recommend a Hugging Face mirror. This misrepresents the skill's network use and data handling, which matters especially for a voice-cloning tool that presents itself as privacy-focused.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The install script downloads a remote installer and executes it by piping curl output directly into a shell. This is a supply-chain risk: if the upstream content, the transport, or any party in the trust chain is compromised, arbitrary code runs on the user's machine. Because the metadata emphasizes local, offline privacy, an undisclosed requirement for network access and remote code execution is especially misleading and makes it more likely that users will run the script without appropriate scrutiny.
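As a rough illustration of what this finding looks for, the Python sketch below scans an install script for two common download-and-execute forms: `curl`/`wget` output piped into a shell, and `bash -c "$(curl ...)"` command substitution (the form used by Homebrew's documented one-line installer). The helper name `find_remote_exec` and both regular expressions are hypothetical; this is a simple heuristic, not SkillSpector's actual rule, and it will miss obfuscated variants.

```python
import re

# Hypothetical heuristics (not SkillSpector's real rules):
#   1. curl/wget output piped into sh, bash, zsh, or dash (optionally via sudo)
#   2. a shell run with -c on the output of $(curl ...) or $(wget ...)
REMOTE_EXEC_PATTERNS = (
    re.compile(r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:sudo\s+)?(?:ba|z|da)?sh\b"),
    re.compile(r"\b(?:ba|z|da)?sh\s+-c\s+[\"']?\$\(\s*(?:curl|wget)\b"),
)


def find_remote_exec(script_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like download-and-execute."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        # Skip comment-only lines (including the shebang) so notes don't match.
        if line.lstrip().startswith("#"):
            continue
        if any(pattern.search(line) for pattern in REMOTE_EXEC_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits
```

Running a check like this over an install script before setup produces a short list of lines to read closely; an empty result does not mean the script is safe.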

Missing User Warnings

Medium

Confidence: 93%
Finding: The README advertises voice cloning from a 10–30 second reference clip but provides no warning about obtaining consent, avoiding impersonation, or handling reference audio responsibly. For a skill centered on cloning voices, this omission materially increases the risk of misuse for deception, harassment, or unauthorized reproduction of a person's voice, especially because the feature is presented as easy and free to use.

Missing User Warnings

Medium

Confidence: 90%
Finding: The sample code mentions the auto-download behavior, but the skill does not surface it as a prominent warning, even though it emphasizes offline, privacy-safe operation. Users may unknowingly trigger a large download from third-party infrastructure, with consequences for privacy expectations, bandwidth, and supply-chain trust.

Missing User Warnings

Medium

Confidence: 96%
Finding: Running a remote shell command without an explicit warning or consent prompt deprives users of the chance to assess the risk of downloading and executing third-party code. This is dangerous because it normalizes blind trust in network-fetched installers and can lead to full host compromise if the remote script is malicious or tampered with.
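A safer pattern than running a fetched installer blind is to save it, check it against a digest obtained from a trusted source, and ask the user before executing it. The sketch below assumes the installer bytes have already been downloaded and that a pinned SHA-256 digest is available; `verify_installer` and `run_installer_with_consent` are hypothetical helpers, not part of this skill.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def verify_installer(script: bytes, expected_sha256: str) -> bool:
    """Compare the installer's SHA-256 digest with a pinned (assumed) value."""
    return hashlib.sha256(script).hexdigest() == expected_sha256.lower()


def run_installer_with_consent(
    script: bytes,
    expected_sha256: str,
    confirm: Callable[[str], bool] = lambda msg: input(msg).strip().lower() == "y",
) -> int:
    """Run the installer only if its hash matches and the user agrees.

    Returns the installer's exit code, or -1 if it was not run.
    """
    if not verify_installer(script, expected_sha256):
        print("Checksum mismatch: refusing to run installer.")
        return -1
    prompt = (
        "This will execute a third-party installer that may download "
        "Homebrew, Python packages, and a large model. Continue? [y/N] "
    )
    if not confirm(prompt):
        print("Installation cancelled by user.")
        return -1
    # Write the verified bytes to a temporary file and run it with bash.
    # This sketch adds no sudo; the script runs with the caller's privileges
    # unless the script itself escalates.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "install.sh"
        path.write_bytes(script)
        return subprocess.run(["bash", str(path)], check=False).returncode
```

The `confirm` callback defaults to an interactive prompt but can be swapped out for tests or non-interactive setups; the digest check only helps if the pinned value comes from somewhere other than the same server that hosts the script.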

Missing User Warnings

Medium

Confidence: 91%
Finding: The manifest explicitly advertises voice cloning capabilities but provides no warning about consent, impersonation, or privacy risks. Because voice cloning can be used to synthesize another person's voice for deception or misuse, omitting these warnings increases the chance of unsafe or non-consensual use even if the software is local-only.

VirusTotal

All 65 vendors (65/65) reported this skill as clean.


Static analysis

No suspicious patterns detected.