Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

qwen-audio-lab

v0.0.1

Hybrid text-to-speech, reusable voice cloning, and narrated audio generation for macOS plus Aliyun Qwen. Use when the user wants to convert text into speech,...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for aliyx/qwen-audio-lab.

Prompt Preview: Install & Setup
Install the skill "qwen-audio-lab" (aliyx/qwen-audio-lab) from ClawHub.
Skill page: https://clawhub.ai/aliyx/qwen-audio-lab
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install qwen-audio-lab

ClawHub CLI


npx clawhub@latest install qwen-audio-lab
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Suspicious
high confidence
Purpose & Capability
The name/description (macOS + Aliyun Qwen TTS, voice cloning, narrated PPTs) matches what the code and SKILL.md implement: local 'say' playback, Qwen TTS calls, voice cloning/design endpoints, and local storage of outputs and remembered voices. However, the registry metadata lists no required environment variables or primary credential while both SKILL.md and the code require DASHSCOPE_API_KEY; this metadata omission is an inconsistency worth noting.
Instruction Scope
The SKILL.md instructions and the included script remain focused on TTS/voice workflows. They reference only task-relevant files/paths (user home ~/.openclaw/data/qwen-audio-lab for outputs/state), optional ffmpeg for trimming, and network calls to DashScope (Aliyun) APIs. There is no instruction to read unrelated system files, shell history, or to exfiltrate arbitrary data.
Install Mechanism
This is an instruction-only skill with an included Python script and no install spec; nothing is downloaded from external URLs during install. Runtime will execute local scripts and may call external network endpoints. No archive downloads or remote installers were specified.
Credentials
The code and SKILL.md require DASHSCOPE_API_KEY (plus optional QWEN_AUDIO_REGION, QWEN_AUDIO_OUTPUT_DIR, QWEN_AUDIO_STATE_DIR), but the registry metadata declared no required env vars or primary credential. This mismatch is concerning because the skill needs an API key to access remote TTS/voice-cloning services; the package should declare that requirement explicitly. Aside from the missing declaration, the environment access requested by the script (API key + optional dirs) is proportionate to the stated purpose.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or global configs. It writes state and outputs under ~/.openclaw/data/qwen-audio-lab (its own directory) which is normal for persistent skill state.
What to consider before installing
  • The skill does what it claims (local macOS 'say' plus remote Qwen/DashScope TTS and voice cloning). However, the package metadata did NOT declare the required DASHSCOPE_API_KEY even though SKILL.md and the script require it; treat that as a red flag, since metadata should match runtime requirements.
  • The script will make network calls to DashScope endpoints (https://dashscope.aliyuncs.com and https://dashscope-intl.aliyuncs.com). Only provide an API key if you trust the endpoint and the skill source.
  • The skill stores outputs and remembered-voice state under ~/.openclaw/data/qwen-audio-lab; verify you are comfortable with that directory being created and written to.
  • Some operations (audio trimming) require ffmpeg, and local playback uses macOS 'say'; these are normal but will invoke subprocesses.
  • Voice cloning can have legal and consent implications. The SKILL.md recommends asking for permission; enforce that policy yourself before cloning third-party voices.
  • Because the skill source is 'unknown' and the registry metadata is inconsistent, prefer to inspect the full script locally (ensure the truncated portion contains only TTS/manage-voice logic) or obtain the skill from a trusted publisher before supplying credentials. If you proceed, limit the scope and permissions of the API key where possible and monitor network activity.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97693aaer1yyn13vbnrnm9x4h8361hw
205 downloads
0 stars
1 version
Updated 3h ago
v0.0.1
MIT-0

Qwen Audio Lab

Use this skill for text-to-speech on macOS or with Aliyun Qwen.

Choose the backend

  • Use mac-say for fast local playback, notifications, and low-friction speech on a Mac.
  • Use qwen-tts when the user wants better naturalness, reusable output files, custom voices, or voice cloning.
  • If DASHSCOPE_API_KEY is missing, fall back to mac-say for local playback.
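The fallback rule above can be sketched as a small shell helper. `choose_backend` is a hypothetical name for illustration, not something the skill provides:

```shell
# Hypothetical helper: pick the backend the way the rule above describes.
# qwen-tts needs DASHSCOPE_API_KEY; otherwise fall back to local mac-say.
choose_backend() {
  if [ -n "${DASHSCOPE_API_KEY:-}" ]; then
    echo "qwen-tts"
  else
    echo "mac-say"
  fi
}
```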

Environment

  • DASHSCOPE_API_KEY: required for Qwen synthesis and voice cloning.
  • QWEN_AUDIO_REGION: optional, cn (default) or intl.
  • QWEN_AUDIO_OUTPUT_DIR: optional directory for generated audio files. Defaults to ~/.openclaw/data/qwen-audio-lab/output.
  • QWEN_AUDIO_STATE_DIR: optional directory for local state such as remembered voices. Defaults to ~/.openclaw/data/qwen-audio-lab/state.
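Assuming the defaults listed above, the effective settings resolve as follows; the variable names on the left are illustrative, not ones the script exports:

```shell
# Resolve the documented environment variables to their effective values,
# applying the defaults stated in the Environment section above.
REGION="${QWEN_AUDIO_REGION:-cn}"
OUTPUT_DIR="${QWEN_AUDIO_OUTPUT_DIR:-$HOME/.openclaw/data/qwen-audio-lab/output}"
STATE_DIR="${QWEN_AUDIO_STATE_DIR:-$HOME/.openclaw/data/qwen-audio-lab/state}"
```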

Commands

Run all commands through:

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py <command> [...]
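Since every command shares the same long prefix, a throwaway wrapper can cut the typing; `qal` is an arbitrary name chosen here, not part of the skill:

```shell
# Illustrative wrapper around the shared command prefix shown above,
# so later examples could be shortened to e.g. `qal narrate-text ...`.
qal() {
  python3 "$HOME/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py" "$@"
}
```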

Preferred high-level commands

Use these first for most user-facing narration tasks:

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text --text "这是要转成语音的正文"
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file --text-file /path/to/script.txt
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt /path/to/file.pptx

Use the older commands only when you specifically want the legacy workflow names. Generated audio and remembered voice state now default to ~/.openclaw/data/qwen-audio-lab/ instead of the skill folder.

Local macOS speech

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py mac-say \
  --text "开会了,别忘了带电脑" \
  --voice Tingting

Qwen TTS from inline text

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --text "你好,我是你的语音助手。" \
  --voice Cherry \
  --model qwen3-tts-flash \
  --language-type Chinese \
  --download

Qwen TTS from a text file

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --text-file /path/to/script.txt \
  --voice Cherry \
  --download

Qwen TTS from stdin

cat /path/to/script.txt | python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --stdin \
  --voice Cherry \
  --download

Clone a voice

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py clone-voice \
  --audio /path/to/reference.mp3 \
  --name claw-voice-01 \
  --target-model qwen3-tts-vc-2026-01-22
  • Keep the cloning target-model aligned with the synthesis model family.
  • Use a clean speech sample with minimal background noise.
  • Ask before cloning a third party voice when consent is unclear.

Design a voice from a text prompt

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py design-voice \
  --prompt "沉稳的中年男性播音员,音色低沉浑厚,适合纪录片旁白。" \
  --name doc-voice-01 \
  --target-model qwen3-tts-vd-2026-01-26 \
  --preview-format wav

Legacy command: reuse the latest cloned voice

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py speak-last-cloned \
  --text "你好,这是我的声音测试。" \
  --download

High-level narration from any text source

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text \
  --text "这是要转成语音的正文" \
  --output narration.wav

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file \
  --text-file /path/to/script.txt
  • Default voice source is last-cloned.
  • Use --voice-source last-designed to use the latest designed voice instead.
  • Use --voice and optionally --model to force a specific voice id and synthesis model.

Legacy command: narrate PPT speaker notes with the latest cloned voice

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py ppt-own-voice --ppt "/path/to/file.pptx"

High-level PPT narration

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt "/path/to/file.pptx"
  • Default voice source is last-cloned.
  • Use --voice-source last-designed to switch to the latest designed voice.
  • Use --voice and optionally --model to force a specific voice id and synthesis model.
  • Keep ppt-own-voice as the backward-compatible alias for the original workflow.

Inspect or manage remembered voices

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py list-voices
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py show-last-voice --kind cloned
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py delete-voice --voice claw-voice-01

Workflow rules

  • Reuse an existing cloned voice before asking for a new sample.
  • Ask for a reference recording if the user wants their own voice and no cloned voice exists yet.
  • Prefer the narrate-* commands as the primary high-level interface for narration tasks.
  • Keep speak-last-cloned and ppt-own-voice for backward compatibility with older workflows.
  • Keep only final outputs by default after segmented synthesis unless the user explicitly asks to keep fragments.
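The last rule, keeping only final outputs after segmented synthesis, might look like this in shell; the `narration.part-*.wav` naming is an assumption made for illustration, not the skill's actual fragment scheme:

```shell
# Hypothetical cleanup: remove per-segment fragments, keep the merged file.
# Assumes fragments sit next to the final file as <stem>.part-*.wav.
cleanup_fragments() {
  final="$1"
  stem="${final%.*}"
  rm -f "$stem".part-*.wav
}
```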
