CosyVoice3 macOS
Pass
Audited by ClawScan on May 1, 2026.
Overview
This appears to be a coherent local text-to-speech and voice-cloning skill, but its manual installer downloads external software and models, and the voice-cloning feature should be used only with authorized reference audio.
Before installing, review the shell script because it downloads and runs external setup components and large model files. If you use voice cloning, only provide reference audio you are authorized to use and clearly label generated speech when sharing it.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running the installer changes the local Python environment and trusts external packages, repositories, and model downloads.
The setup pulls a latest Miniconda installer, an external GitHub repository with submodules, Python packages, and model assets. This is expected for a local TTS model install, but it relies on external supply-chain sources.
curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh ... git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git ... pip install transformers==4.51.3 modelscope onnxruntime soundfile librosa
Review the installer, verify the upstream project and model source, and prefer pinned/checksummed dependencies if you need stronger reproducibility.
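As a rough illustration of the pinning recommendation, the sketch below flags which packages in the quoted pip command lack an exact version pin. The helper name `unpinned` and the `PINNED` pattern are hypothetical and written for this example; they are not part of the skill or of pip, and the pattern covers only simple `name==version` requirements.

```python
import re

# Matches a requirement pinned to an exact version, e.g. "transformers==4.51.3".
# Simplified on purpose: extras, markers, and URLs are not handled.
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9.*+!_-]+$")


def unpinned(requirements):
    """Return the requirement strings that lack an exact '==' version pin."""
    return [req for req in requirements if not PINNED.match(req)]


# Package list taken from the pip command quoted in the finding above.
packages = ["transformers==4.51.3", "modelscope", "onnxruntime", "soundfile", "librosa"]
print(unpinned(packages))
# → ['modelscope', 'onnxruntime', 'soundfile', 'librosa']
```

Only `transformers` is pinned in the quoted command, so the other four packages resolve to whatever version is current at install time.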
If the downloaded installer or upstream source were compromised, it could run arbitrary code in the user’s local environment.
The installer executes a downloaded shell installer as part of setup. This is a disclosed, user-directed installation step, but it is still local code execution.
bash Miniconda3-latest-MacOSX-arm64.sh -b -p $HOME/miniconda3
Only run the installer from a trusted network/source and consider verifying the installer checksum before execution.
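One way to do the checksum check is sketched below using only the Python standard library. The function names `sha256_of` and `matches_expected` are hypothetical, and the expected digest is an assumption: it must be copied from the publisher's official hash listing, since this report does not state one.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large installers are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA-256 digest equals the published value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call (the expected value is a placeholder to be filled in by the user):
# ok = matches_expected("Miniconda3-latest-MacOSX-arm64.sh", "<published sha256>")
# Only run the installer if ok is True.
```

Because the download URL points at a "latest" installer, the published digest changes with each release, so the expected value has to be looked up at download time.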
Generated audio could sound like another person, creating impersonation or consent concerns if shared.
The skill explicitly supports generating speech in a cloned voice. This is disclosed and purpose-aligned, but cloned voices can mislead listeners if used without consent or clear labeling.
**Zero-shot voice cloning**: Clone any voice from 3-10 seconds of audio
Use reference audio only with permission and label synthetic or cloned audio when appropriate.
