Security audit

qwen-audio-lab

Security checks across malware telemetry and agentic risk

Overview

This skill does what it claims: it creates speech and voice-cloning outputs using local macOS tools and Aliyun Qwen, with sensitive behavior mostly disclosed.

Install only if you are comfortable using Aliyun DashScope for cloud TTS and voice cloning. Use mac-say for local-only speech, avoid cloning voices without consent, limit the DashScope API key where possible, and review or delete remembered voices if you do not want reusable voice state kept locally.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (3)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    else:
        concat = per_slide_dir / f"slide-{slide_no:02d}-concat.txt"
        concat.write_text("".join([f"file '{f.as_posix()}'\n" for f in chunk_files]), encoding="utf-8")
        subprocess.run(
            ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(concat), "-c", "copy", str(final)],
            check=True,
            stdout=subprocess.DEVNULL,
Confidence: 88%
Finding: subprocess.run( ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(concat), "-c", "copy", str(final)], check=True, stdout=subprocess.D
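As far as the excerpt shows, the call passes ffmpeg an argument list rather than a shell string, so the remaining exposure is mainly the concat list file, where a chunk path containing a single quote would break the `file '...'` line. The following is a minimal sketch of one way to escape such paths, assuming ffmpeg's documented quoting rule that a literal quote inside a quoted string is written as `'\''`. The helper names `concat_entry` and `build_concat_list` are hypothetical and do not come from the skill.

```python
from pathlib import Path


def concat_entry(path: Path) -> str:
    """Return one `file` directive line for an ffmpeg concat list (hypothetical helper)."""
    # Assumption: ffmpeg quoting treats text inside single quotes literally,
    # so a literal quote is written by closing the quote, adding \', and reopening.
    escaped = path.as_posix().replace("'", "'\\''")
    return f"file '{escaped}'\n"


def build_concat_list(chunk_files: list[Path]) -> str:
    """Join the escaped entries into the full concat list text."""
    return "".join(concat_entry(f) for f in chunk_files)
```

The result of `build_concat_list(chunk_files)` could be written to the concat file in place of the unescaped f-string join shown in the excerpt.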

Missing User Warnings

Medium

Confidence: 90%
Finding: The clone-voice flow base64-encodes a local reference recording and transmits it to Aliyun's remote customization endpoint, but the command itself provides no explicit runtime warning or consent checkpoint. Because voice samples are biometric and potentially highly sensitive, silent upload increases privacy and compliance risk if users invoke the skill without realizing data leaves the machine.
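One possible shape for the missing consent checkpoint is sketched below. It is an illustration only: the function name `confirm_voice_upload` and the prompt wording are assumptions, not part of the skill, and the `ask` parameter exists so the prompt can be replaced in non-interactive runs.

```python
from pathlib import Path
from typing import Callable


def confirm_voice_upload(sample: Path, ask: Callable[[str], str] = input) -> bool:
    """Ask for explicit consent before a voice sample leaves the machine.

    Hypothetical helper: returns True only on an explicit "y" or "yes".
    """
    prompt = (
        f"About to upload the voice sample '{sample.name}' to Aliyun for voice cloning.\n"
        "Voice recordings are biometric data. Continue? [y/N] "
    )
    # Anything other than an explicit yes (including an empty answer) declines.
    return ask(prompt).strip().lower() in {"y", "yes"}
```

A clone-voice command could call this before encoding the recording and abort when it returns False, so the default answer keeps the sample local.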

Missing User Warnings

Medium

Confidence: 89%
Finding: This command sends user-supplied text to a third-party cloud TTS service without an explicit notice at execution time. In a narration skill, text may contain confidential documents, speaker notes, or proprietary content, so undisclosed remote transfer creates a real data-exposure risk even though it is functionally necessary for the feature.
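Because the remote transfer is functionally necessary here, a lightweight execution-time notice may be enough rather than a blocking prompt. The sketch below assumes hypothetical helpers `remote_tts_notice` and `announce_remote_tts`; neither exists in the skill, and the message wording is illustrative.

```python
import sys


def remote_tts_notice(text: str, provider: str = "Aliyun DashScope") -> str:
    """Build a one-line notice describing what is about to leave the machine."""
    return f"Notice: sending {len(text)} characters of text to {provider} for cloud synthesis."


def announce_remote_tts(text: str, stream=sys.stderr) -> None:
    """Print the notice to stderr so it does not mix with the command's normal output."""
    print(remote_tts_notice(text), file=stream)
```

Writing to stderr keeps the notice visible in a terminal while leaving stdout clean for any file paths or JSON the command emits.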

VirusTotal

66 of 66 vendors reported this skill as clean.
