CosyVoice3 macOS

Pass. Audited by ClawScan on May 1, 2026.

Overview

This appears to be a coherent local text-to-speech and voice-cloning skill. However, its manual installer downloads external software and models, and its voice-cloning feature should be used with care.

Before installing, review the shell script: it downloads and executes external setup components and fetches large model files. If you use voice cloning, provide only reference audio you are authorized to use, and clearly label generated speech when you share it.

Findings (3)

This is an artifact-based, informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI04: Agentic Supply Chain Vulnerabilities

What this means

Running the installer modifies the local Python environment and places trust in external packages, repositories, and model downloads.

Why it was flagged

The setup downloads the latest Miniconda installer, clones an external GitHub repository with its submodules, and installs Python packages and model assets. This is expected for a local TTS model install, but it depends on external supply-chain sources.

Skill content

curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh ... git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git ... pip install transformers==4.51.3 modelscope onnxruntime soundfile librosa

Recommendation

Review the installer, verify the upstream project and model source, and prefer pinned/checksummed dependencies if you need stronger reproducibility.
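A minimal sketch of one way to pin the upstream repository, assuming `git` is installed: clone it, check out a specific commit you have reviewed, and sync submodules to the revisions that commit records. The helper name `pin_repo` and its arguments are hypothetical and not part of the skill's installer; the commit hash is yours to supply. For Python packages, pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`, with an exact `==` version and a `--hash=sha256:...` entry per requirement) provides a comparable guarantee.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not part of the skill's installer):
# clone a repository and pin it, including submodules, to a reviewed commit.
# Usage: pin_repo <repo-url> <commit-sha> <destination-dir>
pin_repo() {
  url=$1
  commit=$2
  dest=$3
  # Clone with submodules, mirroring the installer's `git clone --recursive`.
  git clone --recursive "$url" "$dest" || return 1
  # Detach HEAD at the exact commit you reviewed instead of the branch tip.
  git -C "$dest" checkout --quiet "$commit" || return 1
  # Move submodules to the revisions recorded by that commit.
  git -C "$dest" submodule update --init --recursive || return 1
  # Print the resolved commit so it can be logged or compared.
  git -C "$dest" rev-parse HEAD
}

# Example (substitute a commit you have reviewed):
# pin_repo https://github.com/FunAudioLLM/CosyVoice.git <commit-sha> CosyVoice
```

Pinning to a commit means a later push to the upstream default branch cannot silently change what you install.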

Low

#ASI05: Unexpected Code Execution

What this means

If the downloaded installer or upstream source were compromised, malicious code could run in the user's local environment with the user's privileges.

Why it was flagged

The setup script executes a downloaded shell installer. This is a disclosed, user-directed installation step, but it is still local code execution.

Skill content

bash Miniconda3-latest-MacOSX-arm64.sh -b -p $HOME/miniconda3

Recommendation

Run the installer only from a trusted network and source, and consider verifying its checksum before execution.

Low

#ASI09: Human-Agent Trust Exploitation

What this means

Generated audio could sound like another person, creating impersonation or consent concerns if shared.

Why it was flagged

The skill explicitly supports generating speech in a cloned voice. This is disclosed and purpose-aligned, but cloned voices can mislead listeners if used without consent or clear labeling.

Skill content

**Zero-shot voice cloning**: Clone any voice from 3-10 seconds of audio

Recommendation

Use reference audio only with permission and label synthetic or cloned audio when appropriate.