Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

qwenspeak

v1.5.0

Text-to-speech generation via Qwen3-TTS over SSH. Preset voices, voice cloning, voice design. Use when the user wants to generate speech audio, clone voices,...

0· 897· 6 versions· 3 current· 3 all-time· Updated 9h ago· MIT-0
by Ciprian Mandache @psyb0t

Install

openclaw skills install qwenspeak

qwenspeak

YAML-driven text-to-speech over SSH using Qwen3-TTS models.

For installation and deployment, see references/setup.md.

SSH Wrapper

Use scripts/qwenspeak.sh for all commands. It reads the host and port from the QWENSPEAK_HOST and QWENSPEAK_PORT env vars and handles host key acceptance.

scripts/qwenspeak.sh <command> [args]
scripts/qwenspeak.sh <command> < input_file
scripts/qwenspeak.sh <command> > output_file
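Before the first call it can help to confirm the two env vars are present. The following is a minimal sketch, not part of the skill: `require_qwenspeak_env` is a hypothetical helper, and it assumes you want both variables set explicitly (whether the wrapper falls back to defaults is not stated here).

```shell
# Hypothetical preflight check; not shipped with the skill.
# ASSUMPTION: both variables should be set explicitly before calling the wrapper.
require_qwenspeak_env() {
  if [ -z "${QWENSPEAK_HOST:-}" ]; then
    echo "QWENSPEAK_HOST is not set" >&2
    return 1
  fi
  if [ -z "${QWENSPEAK_PORT:-}" ]; then
    echo "QWENSPEAK_PORT is not set" >&2
    return 1
  fi
  # Reject ports that contain anything other than digits.
  case "$QWENSPEAK_PORT" in
    *[!0-9]*) echo "QWENSPEAK_PORT must be numeric" >&2; return 1 ;;
  esac
}
```

Calling `require_qwenspeak_env || exit 1` at the top of a script fails fast instead of surfacing an SSH error later.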

TTS Generation

Submit YAML, get a job UUID back immediately, then poll for progress. Jobs run sequentially: one at a time, the rest queue up.

# Get the YAML template
scripts/qwenspeak.sh "tts print-yaml" > job.yaml

# Submit job
scripts/qwenspeak.sh "tts" < job.yaml
# {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}

# Check progress
scripts/qwenspeak.sh "tts get-job 550e8400"

# Follow job log
scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"

# Download result
scripts/qwenspeak.sh "get hello.wav" > hello.wav
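To script the submit step, the job id can be pulled out of the response. This is a rough sketch that assumes the response is single-line JSON shaped like the example above; `json_field`, `submit_job`, and the `QWENSPEAK` override variable are hypothetical names introduced here, not part of the skill.

```shell
# ASSUMPTION: QWENSPEAK is a hypothetical override so the wrapper path can be swapped.
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# Print the string value of one field from single-line JSON on stdin.
# A real JSON parser is more robust; sed keeps the sketch dependency-free.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Submit a YAML job file and print the job id from the response.
submit_job() {
  "$QWENSPEAK" "tts" < "$1" | json_field id
}
```

With this in place, `id="$(submit_job job.yaml)"` captures the UUID for the follow-up `tts get-job` and `tts get-job-log` calls.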

YAML Structure

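Global settings plus a list of steps. Each step loads a model, runs all its generations, then unloads. Settings cascade from global to step to generation, and the more specific level overrides the broader one (in the example, the second generation's speaker overrides the step's).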

steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav
      - text: "I cannot believe this!"
        speaker: Vivian
        instruct: "Speak angrily"
        output: angry.wav

  - mode: voice-design
    generate:
      - text: "Welcome to our store."
        instruct: "A warm, friendly young female voice with a cheerful tone"
        output: welcome.wav

  - mode: voice-clone
    model_size: 1.7b
    ref_audio: ref.wav
    ref_text: "Transcript of reference"
    generate:
      - text: "First line in cloned voice"
        output: clone1.wav
      - text: "Second line"
        output: clone2.wav
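The cascade can be pictured as "most specific non-empty value wins". The sketch below only illustrates that rule as the example suggests it; `resolve_setting` is a hypothetical function and is not how the tool itself is implemented.

```shell
# Print the most specific non-empty value: generation, else step, else global.
resolve_setting() {  # resolve_setting <global> <step> <generation>
  if [ -n "$3" ]; then
    printf '%s\n' "$3"
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$1"
  fi
}

resolve_setting "" Ryan ""       # first generation inherits the step speaker: Ryan
resolve_setting "" Ryan Vivian   # second generation sets its own speaker: Vivian
```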

Modes

custom-voice — Pick from 9 preset speakers. 1.7B supports emotion/style via instruct.

voice-design — Describe the voice in natural language via instruct. 1.7B only.

voice-clone — Clone from reference audio. Set ref_audio and ref_text at step level to reuse across generations. x_vector_only: true skips transcript.
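A quick local check can catch an invalid mode or model size before a job is queued. This is a sketch under the constraints stated in this document (three modes, sizes 1.7b and 0.6b, voice-design on 1.7b only); `check_step` is a hypothetical helper and the server may enforce further rules.

```shell
# Hypothetical pre-submit check for a step's mode and model_size.
check_step() {  # check_step <mode> <model_size>
  case "$2" in
    1.7b|0.6b) ;;
    *) echo "unknown model_size: $2" >&2; return 1 ;;
  esac
  case "$1" in
    custom-voice|voice-clone) ;;
    voice-design)
      # voice-design is documented as 1.7B only.
      if [ "$2" != "1.7b" ]; then
        echo "voice-design requires model_size 1.7b" >&2
        return 1
      fi
      ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```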

Emotion trick for cloned voices

Upload references with different emotions, use separate steps:

scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav

steps:
  - mode: voice-clone
    ref_audio: refs/happy.wav
    ref_text: "transcript of happy ref"
    generate:
      - text: "Great news everyone!"
        output: happy1.wav

  - mode: voice-clone
    ref_audio: refs/angry.wav
    ref_text: "transcript of angry ref"
    generate:
      - text: "This is unacceptable"
        output: angry1.wav
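With several emotion references, the per-emotion steps can be generated instead of written by hand. The sketch below prints YAML in the same shape as the example above; `clone_step` and `build_emotion_job` are hypothetical helpers, and the naive quoting assumes the transcript and text contain no double quotes.

```shell
# Print one voice-clone step with a single generation.
# ASSUMPTION: arguments contain no double quotes or newlines (no YAML escaping is done).
clone_step() {  # clone_step <ref_audio> <ref_text> <text> <output>
  cat <<EOF
  - mode: voice-clone
    ref_audio: $1
    ref_text: "$2"
    generate:
      - text: "$3"
        output: $4
EOF
}

# Assemble a full job document from one step per emotion.
build_emotion_job() {
  echo "steps:"
  clone_step refs/happy.wav "transcript of happy ref" "Great news everyone!" happy1.wav
  clone_step refs/angry.wav "transcript of angry ref" "This is unacceptable" angry1.wav
}
```

Redirecting the output, for example `build_emotion_job > job.yaml`, gives a file that can be submitted like any other job.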

Job Management

scripts/qwenspeak.sh "tts list-jobs"              # list all
scripts/qwenspeak.sh "tts list-jobs --json"        # JSON output
scripts/qwenspeak.sh "tts get-job <id>"            # job details
scripts/qwenspeak.sh "tts get-job-log <id>"        # view log
scripts/qwenspeak.sh "tts get-job-log <id> -f"     # follow log
scripts/qwenspeak.sh "tts cancel-job <id>"         # cancel

Statuses: queued → running → completed | failed | cancelled

Completed jobs are cleaned up automatically after 1 day, and all jobs after 1 week. UUID prefixes work as job IDs (e.g. the first 8 characters).
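Polling until a job reaches one of the final statuses can be sketched as below. It assumes `tts get-job` prints JSON containing a "status" string field, which this document does not confirm; `is_terminal`, `job_status`, `wait_for_job`, and the `QWENSPEAK` override are hypothetical names.

```shell
# ASSUMPTION: QWENSPEAK is a hypothetical override so the wrapper path can be swapped.
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# True once a job can no longer change state.
is_terminal() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# ASSUMPTION: `tts get-job <id>` prints JSON with a "status" string field.
job_status() {
  "$QWENSPEAK" "tts get-job $1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    head -n 1
}

# Poll until the job finishes; print the final status.
# Returns 0 for completed, 1 for failed/cancelled, 2 if no status could be read.
wait_for_job() {  # wait_for_job <id> [poll_seconds]
  while :; do
    current="$(job_status "$1")"
    if [ -z "$current" ]; then
      echo "could not read status for job $1" >&2
      return 2
    fi
    if is_terminal "$current"; then
      echo "$current"
      [ "$current" = "completed" ]
      return $?
    fi
    sleep "${2:-5}"
  done
}
```

For interactive use, `tts get-job-log <id> -f` is simpler; a loop like this is mainly useful in scripts that need an exit code.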

File Operations

All paths are relative to the work directory. Path traversal is blocked.

| Command | Description |
| --- | --- |
| `put <path>` | Upload file from stdin |
| `get <path>` | Download file to stdout |
| `list-files [--json]` | List directory |
| `remove-file <path>` | Delete a file |
| `create-dir <path>` | Create directory |
| `remove-dir <path>` | Remove empty directory |
| `move-file <src> <dst>` | Move or rename |
| `copy-file <src> <dst>` | Copy a file |
| `file-exists <path>` | Check if file exists (true/false) |
| `search-files <glob>` | Glob search (`**` recursive) |
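The file commands compose into small helpers, for instance uploading a reference only when it is not already on the server. This sketch assumes `file-exists` prints exactly `true` or `false` on stdout and that remote paths contain no spaces; `upload_if_missing` and the `QWENSPEAK` override are hypothetical names.

```shell
# ASSUMPTION: QWENSPEAK is a hypothetical override so the wrapper path can be swapped.
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# Upload a local file unless the remote path already exists.
# ASSUMPTION: `file-exists` prints exactly "true" or "false".
upload_if_missing() {  # upload_if_missing <local_file> <remote_path>
  if [ "$("$QWENSPEAK" "file-exists $2")" = "true" ]; then
    echo "skip $2"
  else
    "$QWENSPEAK" "put $2" < "$1" && echo "uploaded $2"
  fi
}
```

For example, `upload_if_missing me_happy.wav refs/happy.wav` avoids re-sending a reference clip on every run.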

Speakers

| Speaker | Gender | Language | Description |
| --- | --- | --- | --- |
| Vivian | Female | Chinese | Bright, slightly edgy young voice |
| Serena | Female | Chinese | Warm, gentle young voice |
| Uncle_Fu | Male | Chinese | Seasoned, low mellow timbre |
| Dylan | Male | Chinese | Youthful Beijing dialect, clear natural timbre |
| Eric | Male | Chinese | Lively Chengdu/Sichuan dialect, slightly husky |
| Ryan | Male | English | Dynamic with strong rhythmic drive |
| Aiden | Male | English | Sunny American, clear midrange |
| Ono_Anna | Female | Japanese | Playful, light nimble timbre |
| Sohee | Female | Korean | Warm with rich emotion |
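If you generate job files from a script, a lookup like the one below can validate a speaker name and return the language listed for it in the table. `speaker_language` is a hypothetical helper; whether `language` should always match the speaker's listed language is not stated here, so treat the result as a default only.

```shell
# Map a preset speaker name to the language listed for it in the table above.
speaker_language() {
  case "$1" in
    Vivian|Serena|Uncle_Fu|Dylan|Eric) echo Chinese ;;
    Ryan|Aiden) echo English ;;
    Ono_Anna) echo Japanese ;;
    Sohee) echo Korean ;;
    *) echo "unknown speaker: $1" >&2; return 1 ;;
  esac
}
```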

YAML Options

All settings cascade: global > step > generation.

| Field | Default | Description |
| --- | --- | --- |
| dtype | float32 | float32, float16, bfloat16 (float16/bfloat16 GPU only) |
| flash_attn | auto | FlashAttention-2: auto-detects, auto-switches float32→bfloat16 |
| temperature | 0.9 | Sampling temperature |
| top_k | 50 | Top-k sampling |
| top_p | 1.0 | Top-p / nucleus sampling |
| repetition_penalty | 1.05 | Repetition penalty |
| max_new_tokens | 2048 | Max codec tokens to generate |
| no_sample | false | Greedy decoding |
| streaming | false | Streaming mode (lower latency) |
| mode | required | Step only: custom-voice, voice-design, or voice-clone |
| model_size | 1.7b | Step only: 1.7b or 0.6b |
| text | required | Text to synthesize |
| output | required | Output file path |
| speaker | Vivian | custom-voice: speaker name |
| language | Auto | Language for synthesis |
| instruct | - | custom-voice: emotion/style; voice-design: voice description |
| ref_audio | - | voice-clone: reference audio file path |
| ref_text | - | voice-clone: transcript of reference audio |
| x_vector_only | false | voice-clone: use speaker embedding only |
