qwenspeak

v1.5.0

Text-to-speech generation via Qwen3-TTS over SSH. Preset voices, voice cloning, voice design. Use when the user wants to generate speech audio, clone voices,...

by Ciprian Mandache @psyb0t

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for psyb0t/qwenspeak.


Install the skill "qwenspeak" (psyb0t/qwenspeak) from ClawHub.
Skill page: https://clawhub.ai/psyb0t/qwenspeak
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install qwenspeak

ClawHub CLI

npx clawhub@latest install qwenspeak

Security Scan

VirusTotal

Suspicious

OpenClaw

Suspicious

medium confidence

Purpose & Capability

The name/description (Qwen3-TTS over SSH) matches the included script and commands. However, registry metadata claims no required env vars while SKILL.md and scripts clearly require QWENSPEAK_HOST and QWENSPEAK_PORT and rely on SSH keys; these requirements should have been declared in the metadata.

Instruction Scope

Runtime instructions direct the agent to interact with a remote host over SSH (tts@host), upload/download arbitrary files (put/get), and create reference audio. That is coherent for a TTS client, but it implicitly requires access to the user's SSH private key(s) and network access to the target host. The setup instructions also advise appending your public key to the server's authorized_keys. The skill permits file transfers which, if misused or combined with an untrusted remote host, could exfiltrate local data.

Install Mechanism

There is no formal install spec for the skill, but references/setup.md recommends running a remote installer via curl -fsSL https://raw.githubusercontent.com/psyb0t/docker-qwenspeak/main/install.sh | sudo bash. Download-and-pipe-to-sudo is high-risk: it writes files, manages authorized_keys, and installs a system command. The URL is a raw GitHub URL (better than an unknown personal server) but running arbitrary remote scripts as root should be reviewed manually before execution.

Credentials

Although the registry lists no required env vars, the SKILL.md and scripts require QWENSPEAK_HOST and QWENSPEAK_PORT; the setup references many QWENSPEAK_* env vars and persistence to ~/.qwenspeak/. The skill also implicitly requires the user's SSH private key (e.g., ~/.ssh) to authenticate to the remote service. Sensitive accesses (private keys, potential SSH agent use) are not declared in the registry metadata.

Persistence & Privilege

The skill does not request always:true and does not appear to modify other skills or global agent configuration. It is user-invocable and allows autonomous invocation (the platform default), which is expected for skills.

What to consider before installing

This skill is coherent with being an SSH-based TTS client, but it has several red flags you should address before using it:

- It requires network access and an SSH identity: the included wrapper will call ssh tts@QWENSPEAK_HOST and therefore needs access to your SSH private key or agent. Consider creating a dedicated SSH keypair for this service and restricting that key on the server (command restrictions, limited account).
- The registry metadata omits required env vars (QWENSPEAK_HOST, QWENSPEAK_PORT); treat those as required. Expect other QWENSPEAK_* settings on the server side.
- references/setup.md suggests running a remote install script via curl | sudo bash. Do NOT run that as-is without reviewing the script content. Prefer to inspect the repository, clone it locally, and run only the commands you understand. Avoid piping unknown scripts to sudo.
- The skill exposes put/get file operations. If you give an agent this skill plus file system access, it could upload local files to the remote host. Limit the agent's file-access scope and ensure the remote host is trusted and isolated.

If you decide to proceed: review the GitHub install script before running, use a dedicated SSH key with restricted server-side permissions, host the QWENSPEAK instance on infrastructure you control or trust, and confirm that the registry metadata is updated to declare required env vars and any needed config paths.
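
The dedicated-key advice above could be followed along these lines. This is a minimal sketch, not part of the skill: print_ssh_stanza is a hypothetical helper, the key path is a placeholder, and it assumes the wrapper's ssh call still reads ~/.ssh/config. Only the tts user and the install script URL come from the review text.

```shell
# Print an ~/.ssh/config stanza that pins one dedicated key to the TTS host.
# Hypothetical helper: host, port and key path are caller-supplied placeholders.
print_ssh_stanza() {
    # $1 = host, $2 = port, $3 = private key path
    printf 'Host %s\n' "$1"
    printf '    User tts\n'
    printf '    Port %s\n' "$2"
    printf '    IdentityFile %s\n' "$3"
    printf '    IdentitiesOnly yes\n'   # offer only this key, not every agent key
}

# Example usage (run manually, after reading what each command does):
#   ssh-keygen -t ed25519 -f ~/.ssh/qwenspeak_ed25519 -C qwenspeak
#   print_ssh_stanza "$QWENSPEAK_HOST" "$QWENSPEAK_PORT" ~/.ssh/qwenspeak_ed25519 >> ~/.ssh/config
#   curl -fsSL https://raw.githubusercontent.com/psyb0t/docker-qwenspeak/main/install.sh -o install.sh
#   less install.sh    # review the script before deciding whether to run it
```

Downloading the installer to a file first, instead of piping it to sudo bash, keeps the review step separate from execution.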

Like a lobster shell, security has layers — review code before you run it.

878 downloads

0 stars

6 versions

Updated 13h ago

MIT-0

qwenspeak

YAML-driven text-to-speech over SSH using Qwen3-TTS models.

For installation and deployment, see references/setup.md.

SSH Wrapper

Use scripts/qwenspeak.sh for all commands. It reads the host and port from the QWENSPEAK_HOST and QWENSPEAK_PORT env vars and handles host key acceptance.

scripts/qwenspeak.sh <command> [args]
scripts/qwenspeak.sh <command> < input_file
scripts/qwenspeak.sh <command> > output_file
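
Because the wrapper depends on both variables, a pre-flight check can catch a missing value before any SSH call is made. The sketch below is illustrative only: qs_check_env is a hypothetical helper, and the host and port in the usage comment are placeholders.

```shell
# Hypothetical pre-flight helper (not part of the skill): verify that the two
# variables the wrapper reads are set before attempting any SSH call.
qs_check_env() {
    qs_missing=""
    [ -n "${QWENSPEAK_HOST:-}" ] || qs_missing="$qs_missing QWENSPEAK_HOST"
    [ -n "${QWENSPEAK_PORT:-}" ] || qs_missing="$qs_missing QWENSPEAK_PORT"
    if [ -n "$qs_missing" ]; then
        echo "missing env vars:$qs_missing" >&2
        return 1
    fi
    return 0
}

# Example usage (host and port values are placeholders):
#   export QWENSPEAK_HOST=tts.example.internal
#   export QWENSPEAK_PORT=2222
#   qs_check_env && scripts/qwenspeak.sh "list-files"
```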

TTS Generation

Submit YAML, get a job UUID back immediately, poll for progress. Jobs run sequentially — one at a time, the rest queue up.

# Get the YAML template
scripts/qwenspeak.sh "tts print-yaml" > job.yaml

# Submit job
scripts/qwenspeak.sh "tts" < job.yaml
# {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}

# Check progress
scripts/qwenspeak.sh "tts get-job 550e8400"

# Follow job log
scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"

# Download result
scripts/qwenspeak.sh "get hello.wav" > hello.wav
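
When scripting the flow above, the job ID has to be pulled out of the submission response. The following is a minimal sketch under the assumption that the response is a flat JSON object like the one shown; json_field is a hypothetical helper, and a real JSON parser such as jq is more robust where available.

```shell
# Extract a quoted string field from a flat JSON object using sed.
# Sketch only: handles string values such as "id" and "status", not numbers
# or nested objects.
json_field() {
    # $1 = field name; JSON is read from stdin
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example usage:
#   job_id=$(scripts/qwenspeak.sh "tts" < job.yaml | json_field id)
#   scripts/qwenspeak.sh "tts get-job $job_id"
```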

YAML Structure

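A job file holds global settings plus a list of steps. Each step loads a model, runs all its generations, then unloads it. Settings cascade from global to step to generation, and a value set at a more specific level overrides the broader one (in the example below, the second generation's speaker: Vivian overrides the step's speaker: Ryan).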

steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav
      - text: "I cannot believe this!"
        speaker: Vivian
        instruct: "Speak angrily"
        output: angry.wav

  - mode: voice-design
    generate:
      - text: "Welcome to our store."
        instruct: "A warm, friendly young female voice with a cheerful tone"
        output: welcome.wav

  - mode: voice-clone
    model_size: 1.7b
    ref_audio: ref.wav
    ref_text: "Transcript of reference"
    generate:
      - text: "First line in cloned voice"
        output: clone1.wav
      - text: "Second line"
        output: clone2.wav
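
To illustrate the cascade, the sketch below writes a small job file in which the same setting appears at all three levels. It assumes that global settings are top-level keys next to steps:, which is not confirmed here; check the output of tts print-yaml for the actual layout. write_cascade_example is a hypothetical helper name.

```shell
# Write an illustrative job file showing global, step and generation settings.
# ASSUMPTION: global settings are top-level keys beside `steps:`; confirm
# against the template printed by `tts print-yaml`.
write_cascade_example() {
    # $1 = output path (default: job.yaml)
    cat > "${1:-job.yaml}" <<'EOF'
temperature: 0.9          # global: applies to every step
language: English

steps:
  - mode: custom-voice
    speaker: Ryan         # step: applies to each generation in this step
    temperature: 0.7      # step value overrides the global one
    generate:
      - text: "Uses Ryan at temperature 0.7"
        output: a.wav
      - text: "Uses Vivian at temperature 0.5"
        speaker: Vivian   # generation value overrides the step one
        temperature: 0.5
        output: b.wav
EOF
}

# Example usage:
#   write_cascade_example job.yaml
#   scripts/qwenspeak.sh "tts" < job.yaml
```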

Modes

custom-voice — Pick from 9 preset speakers. 1.7B supports emotion/style via instruct.

voice-design — Describe the voice in natural language via instruct. 1.7B only.

voice-clone — Clone from reference audio. Set ref_audio and ref_text at step level to reuse across generations. x_vector_only: true skips transcript.

Emotion trick for cloned voices

Upload a reference recording for each emotion, then use a separate step per reference:

scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav

steps:
  - mode: voice-clone
    ref_audio: refs/happy.wav
    ref_text: "transcript of happy ref"
    generate:
      - text: "Great news everyone!"
        output: happy1.wav

  - mode: voice-clone
    ref_audio: refs/angry.wav
    ref_text: "transcript of angry ref"
    generate:
      - text: "This is unacceptable"
        output: angry1.wav
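
With more than a couple of references, the uploads can be looped. This is a sketch rather than part of the skill: qs and upload_refs are hypothetical helpers, and it assumes file names without spaces, since each command is passed to the wrapper as a single string.

```shell
# Thin indirection around the wrapper so the loop below stays readable.
qs() { scripts/qwenspeak.sh "$@"; }

# Upload every .wav file in a local directory to a remote directory.
# $1 = local directory, $2 = remote directory (default: refs)
upload_refs() {
    ur_remote="${2:-refs}"
    # How create-dir reacts to an existing directory is not documented here,
    # so its result is ignored; a real problem will surface on the first put.
    qs "create-dir $ur_remote" || :
    for ur_file in "$1"/*.wav; do
        [ -f "$ur_file" ] || continue   # skip the literal glob when nothing matches
        qs "put $ur_remote/$(basename "$ur_file")" < "$ur_file" || return 1
    done
}

# Example usage:
#   upload_refs ./my_recordings refs
```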

Job Management

scripts/qwenspeak.sh "tts list-jobs"              # list all
scripts/qwenspeak.sh "tts list-jobs --json"        # JSON output
scripts/qwenspeak.sh "tts get-job <id>"            # job details
scripts/qwenspeak.sh "tts get-job-log <id>"        # view log
scripts/qwenspeak.sh "tts get-job-log <id> -f"     # follow log
scripts/qwenspeak.sh "tts cancel-job <id>"         # cancel

Statuses: queued → running → completed | failed | cancelled

Completed jobs are cleaned up automatically after 1 day, and all jobs after 1 week. UUID prefixes work in place of full IDs (e.g. the first 8 characters).
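
The status lifecycle lends itself to a simple polling loop. The sketch below assumes that tts get-job prints JSON containing a "status" string field, like the submission response; that output format is not shown in this document, so adjust the parsing if it differs. qs, is_terminal_status and wait_for_job are hypothetical helpers.

```shell
qs() { scripts/qwenspeak.sh "$@"; }

# Return 0 if a status is terminal according to the lifecycle above.
is_terminal_status() {
    case "$1" in
        completed|failed|cancelled) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll a job until it reaches a terminal status, then print that status.
# Returns 0 for completed, 1 for failed/cancelled, 2 if the wrapper call fails.
# ASSUMPTION: `tts get-job` output includes a JSON "status" string field.
# $1 = job id or prefix, $2 = seconds between polls (default: 5)
wait_for_job() {
    while :; do
        wj_out=$(qs "tts get-job $1") || return 2
        wj_status=$(printf '%s\n' "$wj_out" |
            sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
            head -n 1)
        if is_terminal_status "$wj_status"; then
            echo "$wj_status"
            [ "$wj_status" = completed ]
            return $?
        fi
        sleep "${2:-5}"
    done
}

# Example usage:
#   wait_for_job 550e8400 && scripts/qwenspeak.sh "get hello.wav" > hello.wav
```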

File Operations

All paths are relative to the work directory. Path traversal is blocked.

Command	Description
`put <path>`	Upload file from stdin
`get <path>`	Download file to stdout
`list-files [--json]`	List directory
`remove-file <path>`	Delete a file
`create-dir <path>`	Create directory
`remove-dir <path>`	Remove empty directory
`move-file <src> <dst>`	Move or rename
`copy-file <src> <dst>`	Copy a file
`file-exists <path>`	Check if file exists (true/false)
`search-files <glob>`	Glob search (`**` recursive)
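
Since the server rejects absolute and traversing paths, a client-side check can fail fast before a transfer is attempted. This is a hypothetical convenience, not a replacement for the server-side rule, and is_safe_relpath is an illustrative name.

```shell
# Return 0 if a path is relative and has no ".." component, else 1.
is_safe_relpath() {
    case "$1" in
        ""|/*) return 1 ;;                  # empty or absolute
        ..|../*|*/..|*/../*) return 1 ;;    # any ".." path component
        *) return 0 ;;
    esac
}

# Example usage:
#   is_safe_relpath "refs/happy.wav" && scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
```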

Speakers

Speaker	Gender	Language	Description
Vivian	Female	Chinese	Bright, slightly edgy young voice
Serena	Female	Chinese	Warm, gentle young voice
Uncle_Fu	Male	Chinese	Seasoned, low mellow timbre
Dylan	Male	Chinese	Youthful Beijing dialect, clear natural timbre
Eric	Male	Chinese	Lively Chengdu/Sichuan dialect, slightly husky
Ryan	Male	English	Dynamic with strong rhythmic drive
Aiden	Male	English	Sunny American, clear midrange
Ono_Anna	Female	Japanese	Playful, light nimble timbre
Sohee	Female	Korean	Warm with rich emotion

YAML Options

Settings cascade from global to step to generation, with the more specific level taking precedence. Fields marked "Step only" are set on steps.

Field	Default	Description
`dtype`	`float32`	float32, float16, bfloat16 (float16/bfloat16 GPU only)
`flash_attn`	`auto`	FlashAttention-2: auto-detects, auto-switches float32→bfloat16
`temperature`	`0.9`	Sampling temperature
`top_k`	`50`	Top-k sampling
`top_p`	`1.0`	Top-p / nucleus sampling
`repetition_penalty`	`1.05`	Repetition penalty
`max_new_tokens`	`2048`	Max codec tokens to generate
`no_sample`	`false`	Greedy decoding
`streaming`	`false`	Streaming mode (lower latency)
`mode`	required	Step only: `custom-voice`, `voice-design`, or `voice-clone`
`model_size`	`1.7b`	Step only: `1.7b` or `0.6b`
`text`	required	Text to synthesize
`output`	required	Output file path
`speaker`	`Vivian`	custom-voice: speaker name
`language`	`Auto`	Language for synthesis
`instruct`	-	custom-voice: emotion/style; voice-design: voice description
`ref_audio`	-	voice-clone: reference audio file path
`ref_text`	-	voice-clone: transcript of reference audio
`x_vector_only`	`false`	voice-clone: use speaker embedding only
