Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Voice Clone Bot

Design and build a fully local Telegram voice-clone bot that replies in a chosen speaker voice, including model selection, ASR/LLM/TTS pipeline design, long-...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 16 · 0 current installs · 0 all-time installs
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Suspicious
high confidence
Purpose & Capability
The skill's purpose (a Telegram voice-clone bot) is plausible, but the metadata declares no required environment variables or binaries even though the instructions explicitly rely on a Telegram bot token, ffmpeg, Python bot libraries, and local model files. At minimum a TELEGRAM_BOT_TOKEN (or equivalent) and runtime dependencies should be declared.
Instruction Scope
SKILL.md stays on-topic (ASR → LLM → TTS → Telegram) and explicitly recommends keeping everything local. However, it omits essential operational steps: where to place the Telegram token, how to obtain or verify local model artifacts and their licenses, exact ffmpeg usage, and any consent safeguards for voice cloning. These omissions give the implementer broad discretion and leave operational gaps.
Install Mechanism
This is instruction-only (no install spec), which is low-risk in isolation. But the instructions require installing heavy binaries/models (ffmpeg, faster-whisper, quantized Qwen models, Qwen3-TTS/MOSS-TTS/OpenVoice) with no guidance about sources or verification. That missing install/source guidance raises supply-chain risk (unexpected downloads, untrusted model binaries).
Credentials
The skill declares no required env vars or config paths, yet operating a Telegram bot requires a bot token and typical deployments need paths for model files and possibly API keys or license files for certain TTS/LLM distributions. The declared zero-env configuration is not proportional to the described runtime needs.
Persistence & Privilege
The skill is not marked always:true, requests no special persistent privileges, and does not attempt to modify other skills or system-wide agent settings in the provided instructions.
What to consider before installing
Before installing or using this skill, ask the author for missing operational details and a safe runbook: which environment variables are required (e.g., TELEGRAM_BOT_TOKEN), exactly which binaries and versions to install (ffmpeg, Python libs), and where to obtain the local model files (and their licenses). Treat model downloads as potentially risky — prefer official release hosts and verify checksums. Because this skill enables voice cloning, consider legal and ethical risks (obtain consent from speakers, avoid misuse), store the Telegram token and any model licenses securely, and test in an isolated environment (VM/container) before deploying into production or a personal account.
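Since the skill declares no environment variables, a minimal pattern for supplying the token yourself is to read it from the environment and fail fast when it is missing. This is a sketch, assuming the conventional variable name `TELEGRAM_BOT_TOKEN` (the skill never specifies one):

```python
import os

def load_bot_token(var: str = "TELEGRAM_BOT_TOKEN") -> str:
    """Read the Telegram bot token from the environment; fail fast if absent."""
    token = os.environ.get(var)
    if not token:
        raise RuntimeError(
            f"{var} is not set; export it before starting the bot "
            "instead of hard-coding the token in source or config files."
        )
    return token
```

Failing at startup (rather than at first API call) makes a misconfigured deployment obvious immediately and keeps the token out of version control.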

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
Download zip
latest: vk97a6grk2em33f2p8333y7py2s83zhcc

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

Voice Clone Bot

Overview

Build a local Telegram bot that turns text or voice input into replies spoken in the target speaker's voice. Keep the pipeline fully local unless the user explicitly asks otherwise.

Core pipeline

  1. Receive a Telegram message.
  2. If the message is voice, transcribe it locally with ASR.
  3. Generate the reply text with a local LLM.
  4. Synthesize speech with a local voice-clone TTS model.
  5. Return the result as Telegram voice (.ogg/Opus) or audio.
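The five steps above can be sketched as one handler that composes pluggable local components. The `asr`, `llm`, and `tts` callables below are stand-ins for whichever local models you choose; the actual Telegram send happens in the caller:

```python
from typing import Callable, Optional, Tuple

def handle_update(
    text: Optional[str],
    voice_path: Optional[str],
    *,
    asr: Callable[[str], str],   # local ASR (e.g. faster-whisper)
    llm: Callable[[str], str],   # local LLM producing reply text
    tts: Callable[[str], str],   # local voice-clone TTS, returns an audio path
) -> Tuple[str, str]:
    """Steps 1-5: take a Telegram update, return (reply_text, audio_path)."""
    if voice_path is not None:
        text = asr(voice_path)        # step 2: transcribe voice locally
    if text is None:
        raise ValueError("update carried neither text nor voice")
    reply_text = llm(text)            # step 3: generate the reply text
    audio_path = tts(reply_text)      # step 4: synthesize in the clone voice
    return reply_text, audio_path     # step 5: caller sends as a Telegram voice note
```

Keeping the pipeline as plain callables makes it easy to swap any stage (e.g. Qwen3-TTS for OpenVoice V2) without touching the bot framework code.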

Model selection

Use this order unless the hardware or language target changes it:

  1. Qwen3-TTS — default choice for balanced quality, speed, and local deployment.
  2. MOSS-TTS Local / Realtime — choose for longer, more stable dialogue generation.
  3. OpenVoice V2 — use as the lighter fallback when resources are tight.

For English-only low-resource setups, NeuTTS can be a fast fallback, but it is not the default for Chinese-first use.

Build workflow

1. Define the target behavior

  • Decide whether the bot answers with text only, voice only, or both.
  • Keep the bot identity explicit; do not imply it is the user's personal account.
  • Decide the reply length policy: short chat, medium chat, or long-form narration.

2. Prepare voice material

  • Start with 3–5 minutes of clean reference audio.
  • Prefer 10–20 minutes for better stability.
  • Use the same microphone and a quiet room.
  • Include short, long, formal, and casual sentences.

3. Implement generation

  • Transcribe voice input with faster-whisper or a similar local ASR.
  • Use a local LLM for reply text.
  • Split long replies into short segments before TTS.
  • Reuse one reference voice prompt across segments.
  • Insert short pauses between segments for natural cadence.

4. Deliver to Telegram

  • Prefer Telegram voice messages for the most natural experience.
  • Convert final audio to OGG/Opus.
  • Fall back to MP3/WAV for debugging or long-form output.
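The OGG/Opus conversion can be done with a single ffmpeg call. A sketch via `subprocess` follows; 48 kHz mono with the `libopus` encoder is conventional for Telegram voice notes, and the 32 kbps bitrate is a tunable assumption, not a requirement from the skill:

```python
import subprocess

def to_telegram_voice(src: str, dst: str, bitrate: str = "32k",
                      run: bool = True) -> list[str]:
    """Build (and optionally run) the ffmpeg command converting audio to OGG/Opus."""
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-c:a", "libopus",   # Telegram voice notes expect Opus in an OGG container
        "-b:a", bitrate,
        "-ar", "48000",      # Opus operates natively at 48 kHz
        "-ac", "1",          # mono is sufficient for speech
        dst,
    ]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```

Returning the argv list (with `run=False`) makes the command easy to log or unit-test without requiring ffmpeg on the machine.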

Long-form synthesis rules

  • Do not feed a 2-minute reply as one huge block unless the model explicitly supports it well.
  • Break by sentence or clause.
  • Keep punctuation intact.
  • Reassemble audio after synthesis.
  • Use the same speaker reference for every segment.
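The rules above can be implemented with a small regex-based splitter that breaks at sentence boundaries while keeping punctuation intact. This is a minimal sketch; real text may need smarter handling of abbreviations and decimal numbers:

```python
import re

def split_for_tts(text: str, max_chars: int = 200) -> list[str]:
    """Split a long reply at sentence boundaries, keeping the punctuation."""
    # Split after ., !, ? (and their CJK counterparts), keeping the delimiter.
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?。！？])\s*", text)
                 if s.strip()]
    segments: list[str] = []
    for sent in sentences:
        # Merge short sentences into the previous segment, up to max_chars.
        if segments and len(segments[-1]) + len(sent) + 1 <= max_chars:
            segments[-1] = segments[-1] + " " + sent
        else:
            segments.append(sent)
    return segments
```

Each returned segment is then synthesized with the same speaker reference prompt, and the resulting audio clips are concatenated with short silences between them.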

Practical defaults

  • Prefer a small local LLM first; upgrade only if reply quality is weak.
  • Prefer Qwen3-TTS first; switch to MOSS-TTS when long-form stability matters more than simplicity.
  • Use OpenVoice V2 when the machine is underpowered.

Output quality checklist

  • Voice sounds like the target speaker.
  • Reply latency is acceptable in chat.
  • Long replies do not drift or become robotic.
  • Telegram playback works as a normal voice note.
  • No remote TTS/LLM API is required unless the user asks for it.

References

Read references/architecture.md for the recommended system layout and fallback choices.

Files

2 total
