Qwen Audio

Security checks across malware telemetry and agentic risk

Overview

The skill’s audio features match its stated purpose, but it can automatically install a prerelease Python package (mlx-audio, on macOS) during use, and it stores reusable voice samples locally.

Review this skill before installing if you do not want runtime package installation. Use an isolated Python environment, prefer pinned dependencies, and only provide voice samples you have permission to use and are comfortable storing locally.

VirusTotal

VirusTotal findings are pending for this skill version.


Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

ASI05: Unexpected Code Execution
Medium
What this means

On macOS, a text-to-speech (TTS) request can download and install package code automatically if the mlx-audio dependency is missing.

Why it was flagged

A normal runtime path can invoke a shell command that modifies the Python environment and installs a prerelease dependency, rather than limiting installation to an explicit setup step.

Skill content
except ImportError:
    # Message: "mlx-audio is not installed, installing..."
    print("mlx-audio 未安装,正在安装...", file=sys.stderr)
    os.system("uv add mlx-audio --prerelease=allow")
Recommendation

Move dependency installation to an explicit install step, ask the user before installing packages at runtime, and pin or lock dependency versions.
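Below is a minimal sketch of the recommended pattern, written in Python to match the skill's code. The helper names (ensure_dependency, ask_user, run_command) are hypothetical and are not part of the skill. The sketch uses importlib.util.find_spec to check whether the module is already importable. It asks for confirmation before installing anything. It then runs the install command as an argument list through subprocess, rather than passing a shell string to os.system.

```python
import importlib
import importlib.util
import subprocess
import sys
from typing import Callable, Sequence


def ask_user(prompt: str) -> bool:
    """Hypothetical helper: interactive confirmation; only y/yes means 'yes'."""
    return input(prompt).strip().lower() in {"y", "yes"}


def run_command(cmd: Sequence[str]) -> int:
    """Hypothetical helper: run cmd as an argument list (no shell), return exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def ensure_dependency(
    module_name: str,
    install_cmd: Sequence[str],
    confirm: Callable[[str], bool] = ask_user,
    run: Callable[[Sequence[str]], int] = run_command,
) -> bool:
    """Hypothetical helper: True if module_name is importable, installing only with consent."""
    if importlib.util.find_spec(module_name) is not None:
        return True  # already available: no prompt, no install
    prompt = f"{module_name} is missing. Run '{' '.join(install_cmd)}'? [y/N] "
    if not confirm(prompt):
        print(f"Not installing {module_name}; install it manually.", file=sys.stderr)
        return False
    if run(install_cmd) != 0:
        return False  # install failed; the caller decides how to degrade
    importlib.invalidate_caches()  # let the import system see newly installed packages
    return importlib.util.find_spec(module_name) is not None
```

For example, you could call ensure_dependency("mlx_audio", ["uv", "add", "mlx-audio==<version>"]). Here <version> stands for a release you have tested, and mlx_audio is assumed to be the package's import name. The confirm and run parameters are injectable, so the consent path can be tested without an interactive prompt. Moving the check into an explicit setup step removes the runtime install entirely.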

ASI04: Agentic Supply Chain Vulnerabilities
Low
What this means

Different installs may resolve to different package versions, and users rely on the upstream package sources being trustworthy.

Why it was flagged

These external ML dependencies are expected for the skill, but none is pinned to an exact version. qwen-asr and torch have no version constraint, and mlx-audio and qwen-tts set only a minimum version. This increases supply-chain variability.

Skill content
dependencies = [
    "mlx-audio>=0.3.1; platform_system == 'Darwin'",
    "qwen-asr; platform_system != 'Darwin'",
    "qwen-tts>=0.1.1; platform_system != 'Darwin'",
    "torch; platform_system != 'Darwin'"
]
Recommendation

Use a lockfile or pinned versions for reproducible installs, and install in an isolated environment.
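One way to act on this recommendation is to flag requirements that are not pinned before installing. The sketch below does this for a dependency list like the one quoted above. The function name find_unpinned is hypothetical. The parsing is a deliberately simplified, standard-library-only check rather than a full PEP 508 parser.

```python
import re
from typing import List

# Leading distribution name of a requirement string, e.g. "mlx-audio".
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def find_unpinned(dependencies: List[str]) -> List[str]:
    """Hypothetical helper: names of requirements not pinned to one exact version.

    Simplified check: a requirement counts as pinned only if its version
    specifier uses '==' (or '===') with no wildcard. Environment markers
    after ';' are dropped first, so "platform_system == 'Darwin'" is not
    mistaken for a pin. Direct URL references are reported as unpinned.
    """
    unpinned = []
    for dep in dependencies:
        requirement = dep.split(";", 1)[0]  # drop the environment marker
        match = _NAME.match(requirement)
        if match is None:
            raise ValueError(f"cannot parse requirement: {dep!r}")
        specifier = requirement[match.end():]
        if "==" not in specifier or "*" in specifier:
            unpinned.append(match.group(1))
    return unpinned
```

Applied to the skill's four dependencies, this check reports all of them. For real projects, the packaging library's Requirement class gives complete PEP 508 parsing. With uv, running uv lock records the exact resolved versions in uv.lock, so later installs resolve to the same versions.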

ASI06: Memory and Context Poisoning
Low
What this means

Voice samples and transcripts may remain on disk after use and could be reused in later tasks.

Why it was flagged

The skill persists reusable voice profiles, including reference audio and transcripts, for later use.

Skill content
Voices are stored in the `./voices/` directory at the skill root level. Each voice has its own folder containing:
- `ref_audio.wav`
- `ref_text.txt`
- `ref_instruct.txt`
Recommendation

Only store voice samples you are comfortable keeping locally, delete unused voice profiles, and avoid using sensitive recordings unless needed.
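To delete unused voice profiles, you first need to see what is stored. The sketch below lists and removes profiles, assuming the ./voices/<name>/ layout quoted above. The helper names list_voice_profiles and delete_voice_profile are hypothetical and are not commands provided by the skill. The delete function refuses any name that does not resolve to a direct child of the voices folder.

```python
import shutil
from pathlib import Path
from typing import List

# Files the skill is documented to keep in each ./voices/<name>/ folder.
PROFILE_FILES = ("ref_audio.wav", "ref_text.txt", "ref_instruct.txt")


def list_voice_profiles(voices_dir: Path) -> List[str]:
    """Hypothetical helper: names of voice folders containing any profile file."""
    if not voices_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in voices_dir.iterdir()
        if entry.is_dir() and any((entry / f).is_file() for f in PROFILE_FILES)
    )


def delete_voice_profile(voices_dir: Path, name: str) -> bool:
    """Hypothetical helper: delete ./voices/<name>/ and its files; False if refused."""
    root = voices_dir.resolve()
    target = (root / name).resolve()
    # Only delete a direct child of voices_dir, so names like "../x", "" or a
    # symlink pointing elsewhere can never remove files outside the folder.
    if target.parent != root or not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

Deletion with shutil.rmtree is permanent. Review the output of list_voice_profiles before removing anything.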

ASI09: Human-Agent Trust Exploitation
Medium
What this means

Generated audio could be mistaken for a real person's voice if it is used without that person's consent or without disclosure.

Why it was flagged

Voice cloning is clearly disclosed and purpose-aligned, but it can create convincing synthetic speech that may be misused for impersonation.

Skill content
Clone any voice using a reference audio sample. Provide the wav file and its transcript
Recommendation

Use voice cloning only with permission, keep source recordings private, and label generated audio as synthetic when sharing it.
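One lightweight way to label generated audio is to write a disclosure file next to it. The sketch below does this and refuses to run unless a consent note is supplied. The function name write_synthetic_label, the .synthetic.json sidecar convention, and the field names are all hypothetical; the skill defines none of them.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_synthetic_label(audio_path: Path, voice_name: str, consent_note: str) -> Path:
    """Hypothetical helper: write <audio>.synthetic.json next to a generated file."""
    if not consent_note.strip():
        raise ValueError("record who gave permission to clone this voice")
    label = {
        "file": audio_path.name,
        "synthetic": True,
        "method": "voice cloning",
        "voice_profile": voice_name,
        "consent": consent_note,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = audio_path.with_name(audio_path.name + ".synthetic.json")
    sidecar.write_text(json.dumps(label, indent=2), encoding="utf-8")
    return sidecar
```

A sidecar file is easily separated from the audio it describes. Treat it as a disclosure aid, not a tamper-proof marker, and still state that the audio is synthetic wherever you share it.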