Clonev

Warn

Audited by ClawScan on May 10, 2026.

Overview

This skill performs the advertised voice cloning, but it enables realistic impersonation, encourages sending cloned audio externally, and keeps voice samples in a hard-coded local folder.

Install only if you trust the source and intend to use it with consent. Review or pin the Docker image, confirm that Docker and ffmpeg are installed and that you expect the skill to run them, require confirmation before sending cloned audio anywhere, and periodically delete retained voice samples from the skill's hard-coded voice-samples folder.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Concern

ASI09: Human-Agent Trust Exploitation

What this means

A user or agent could generate speech that appears to come from a real person who did not consent.

Why it was flagged

The primary description frames cloning other people's or celebrity voices as an intended use, which can create audio that listeners may trust as authentic.

Skill content

Clone their voice or someone else's voice... Works with: Any voice! Yours, a celebrity, a character, etc.

Recommendation

Use only with explicit consent, add clear labeling/watermarking guidance, and require the agent to verify consent before cloning a third party's voice.

Concern

ASI02: Tool Misuse and Exploitation

What this means

Cloned voice audio could be sent externally before the user reviews the exact file, recipient, or impersonation risk.

Why it was flagged

The quick-reference workflow makes sending the generated cloned voice to Telegram part of the agent action sequence, without a separate confirmation or recipient-scoping step.

Skill content

→ Run: VOICE=$(...clonev.sh "hello" "/path/to/sample.wav" en)
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"

Recommendation

Default to saving the generated audio locally and require explicit user approval, destination, and context before sending it through any messaging channel.
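One way to picture the approval step is a small gate that runs before any send action. This is a minimal sketch, not part of the skill: `confirm_send` is a hypothetical helper, and it assumes the agent can pass the file path and destination as plain arguments and read the user's answer from standard input.

```shell
# Sketch only: confirm_send is a hypothetical helper, not part of clonev.sh.
# It succeeds only when the user types the full word "yes".
confirm_send() {
    local file dest answer
    file="$1"
    dest="$2"
    # Show exactly what would be sent and where; prompt on stderr so that
    # stdout stays free for callers that capture output.
    printf 'Send cloned audio %s to %s? Type "yes" to confirm: ' "$file" "$dest" >&2
    # Treat end-of-input or a read error as a refusal.
    IFS= read -r answer || return 1
    [ "$answer" = "yes" ]
}
```

A caller would run the send step only when `confirm_send "$VOICE" telegram` succeeds, and otherwise leave the generated file on disk for the user to review.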

Concern

ASI06: Memory and Context Poisoning

What this means

Sensitive voice samples may remain on disk and could be reused in later runs or confused with another person's sample by filename collision.

Why it was flagged

The script copies the user-provided voice sample into a persistent, hard-coded directory and never deletes it. If a file with the same basename already exists there, the script skips the copy and reuses the old file instead of the current sample.

Skill content

if [ ! -f "${COQUI_DIR}/voice-samples/${SAMPLE_BASENAME}" ]; then
    cp "$VOICE_SAMPLE" "${COQUI_DIR}/voice-samples/"
fi

Recommendation

Use per-run temporary sample paths, clean up voice samples by default, avoid basename reuse, and clearly disclose any retention option to the user.
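The recommendation above could be sketched roughly as follows. The helper names `stage_sample` and `cleanup_sample` and the `clonev-sample.` directory prefix are assumptions made for illustration; they do not appear in the reviewed script. Each run gets its own directory from `mktemp -d`, so two samples with the same basename never share a path.

```shell
# Sketch only: stage_sample and cleanup_sample are hypothetical helpers.

# Copy the sample into a fresh per-run directory and print the staged path.
stage_sample() {
    local src run_dir
    src="$1"
    run_dir="$(mktemp -d "${TMPDIR:-/tmp}/clonev-sample.XXXXXX")" || return 1
    cp -- "$src" "$run_dir/" || return 1
    printf '%s\n' "$run_dir/$(basename -- "$src")"
}

# Delete a staged sample together with its per-run directory.
# Refuses to touch anything that was not created by stage_sample.
cleanup_sample() {
    local dir
    dir="$(dirname -- "$1")"
    case "$(basename -- "$dir")" in
        clonev-sample.*) rm -rf -- "$dir" ;;
        *) return 1 ;;
    esac
}
```

In a wrapper script, registering `cleanup_sample` with `trap ... EXIT` right after staging would make deletion the default even when synthesis fails partway through.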

Note

ASI04: Agentic Supply Chain Vulnerabilities

What this means

Future runs may execute different container code than the code reviewed here.

Why it was flagged

The skill runs an external Docker image tagged latest, so the executed runtime can change over time; this is purpose-aligned for XTTS but should be pinned and disclosed.

Skill content

docker run --rm --entrypoint "" ... ghcr.io/coqui-ai/tts:latest ...

Recommendation

Pin the container image by version or digest and declare Docker, ffmpeg, model download size, and trusted image provenance in the install requirements.
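A pre-flight check along these lines could refuse to run an unpinned image. This is a sketch under assumptions: `is_pinned_image` is a hypothetical helper, and it accepts only the stricter digest form (`name@sha256:` followed by 64 hexadecimal characters), not version tags. No real digest for the Coqui image is given here, because the reviewed material does not contain one.

```shell
# Sketch only: is_pinned_image is a hypothetical helper.
# Succeeds only for references that end in @sha256:<64 lowercase hex chars>.
is_pinned_image() {
    local ref
    ref="$1"
    printf '%s\n' "$ref" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# After pulling and reviewing an image, its digest reference can be read with:
#   docker image inspect --format '{{index .RepoDigests 0}}' <image>
```

A wrapper could call `is_pinned_image "$IMAGE"` before `docker run` and stop with an error when the reference is a mutable tag such as `latest`.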