qwencloud-audio-tts

v0.2.0

[QwenCloud] Synthesize speech from text with Qwen TTS models. TRIGGER when: user wants to convert text to speech, create voiceovers, generate audio narration...

by @cuixiaoyang123

Security Scan

Capability signals

Crypto, Can make purchases, Requires sensitive credentials

These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

Purpose & Capability

The skill's stated purpose (Qwen TTS) matches the included scripts and documentation: tts.py and tts_cosyvoice.py implement HTTP and WebSocket TTS flows and expect a Qwen/DashScope API key. However, the registry metadata claims 'Required env vars: none' and 'Primary credential: none', while SKILL.md and the scripts clearly require DASHSCOPE_API_KEY or QWEN_API_KEY. This omission makes the metadata inconsistent with the skill and could mislead users about its secret requirements.

Instruction Scope

SKILL.md and the scripts instruct the agent to load .env files, check environment variables, read bundled reference files, and (via the agent-compatibility guidance) scan parent project files for other skills and optionally append entries to user config files (with user consent). The code also emits update-check signals and can invoke local scripts via subprocess. These file reads/writes and repository scans are within a TTS onboarding scope, but they broaden the agent's access to repository state and potentially to secrets. The guidance to never print API keys reduces the risk, but the skill still loads secrets and may access them.

Install Mechanism

This is an instruction-only skill with bundled Python scripts (no remote download/install spec). No external arbitrary archives or unknown URLs are fetched by an installer. Some scripts instruct the user to pip install dashscope for CosyVoice, which is normal; there is no packaged install that pulls code from an unknown host.

Credentials

The scripts legitimately require an API key (DASHSCOPE_API_KEY / QWEN_API_KEY) to access the TTS provider — that is proportional to the stated purpose. The problem is the registry metadata not declaring these required env vars or a primary credential. Additionally, load_dotenv injects .env into os.environ (it does not overwrite existing vars) and the skill reads environment variables and may write small state files under a repo .agents directory; users should be aware this will touch local repo state and could access secrets stored in .env.

Persistence & Privilege

always:false and agent-autonomy defaults are preserved (no forced always-on). The skill will write small local files (e.g., response.json, audio files under output/, and .agents/state.json via gossamer) and offers guidance to append entries to project config files (only after asking the user, per the doc). These repository writes are within the skill's scope but do persist data in the repo and may create or modify files under .agents or skills-related config.

What to consider before installing

- The skill needs your Qwen/DashScope API key (DASHSCOPE_API_KEY or QWEN_API_KEY) to function, but the registry metadata incorrectly states 'none'. Do not paste your API key into chat; set it as an environment variable or in a local .env file and keep the key private.
- The bundled scripts read .env and search your repository for config/skill files and may write small files (audio outputs, response.json, and .agents/state.json). If you install this in a sensitive repo, review the code first or run it in an isolated environment.
- The code may run local subprocesses (used for update checks) and can execute other local scripts if present; verify those scripts before allowing operations that run them.
- If you trust the source and want to use the skill: (1) inspect scripts/qwencloud_lib.py, scripts/tts.py and scripts/tts_cosyvoice.py to confirm they match your expectations; (2) provide the API key via environment or .env (not via chat); (3) consider running the scripts in a dedicated virtualenv or sandbox to avoid accidental exposure of repo secrets.
- If you do not trust the origin or cannot verify the code, do not install. At minimum, request that the publisher update the registry metadata to list the required env vars and primary credential so it accurately reflects the skill's needs.

Like a lobster shell, security has layers — review code before you run it.

latest vk97d6tfc3z5dv90qm4ska31sy1850vwd

99 downloads

0 stars

2 versions

Updated 2d ago

MIT-0

Agent setup: If your agent doesn't auto-load skills (e.g. Claude Code), see agent-compatibility.md once per session.

Qwen Audio TTS (Text-to-Speech)

Synthesize natural speech from text using Qwen TTS models. This skill is part of qwencloud/qwencloud-ai.

Skill directory

Use this skill's internal files to execute and learn. Load reference files on demand when the default path fails or you need details.

Location	Purpose
`scripts/tts.py`	Qwen TTS (HTTP API) — qwen3-tts-flash, qwen3-tts-instruct-flash
`scripts/tts_cosyvoice.py`	CosyVoice (WebSocket API) — requires `dashscope` SDK
`references/cosyvoice-guide.md`	CosyVoice setup, voices, examples, errors
`references/execution-guide.md`	Fallback: curl (standard, instruct, streaming), code generation
`references/prompt-guide.md`	Text formatting for speech, instructions templates, voice selection
`references/api-guide.md`	API supplement
`references/sources.md`	Official documentation URLs
`references/agent-compatibility.md`	Agent self-check: register skills in project config for agents that don't auto-load

Security

NEVER output any API key or credential in plaintext. Always use variable references ($DASHSCOPE_API_KEY in shell, os.environ["DASHSCOPE_API_KEY"] in Python). Any check or detection of credentials must be non-plaintext: report only status (e.g. "set" / "not set", "valid" / "invalid"), never the value. Never display contents of .env or config files that may contain secrets.

When the API key is not configured, NEVER ask the user to provide it directly. Instead, help create a .env file with a placeholder (DASHSCOPE_API_KEY=sk-your-key-here) and instruct the user to replace it with their actual key from the QwenCloud Console. Only write the actual key value if the user explicitly requests it.
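
The non-plaintext check described above could look like the following minimal sketch. The helper name `key_status` is hypothetical and not part of the bundled scripts; it reports only whether one of the two documented variables is populated and never returns or prints the value.

```python
import os

# Hypothetical helper for illustration; not part of the bundled scripts.
KEY_VARS = ("DASHSCOPE_API_KEY", "QWEN_API_KEY")


def key_status(env=None):
    """Return 'set' or 'not set' without ever exposing the key value."""
    env = os.environ if env is None else env
    for name in KEY_VARS:
        # Whitespace-only values are treated as missing.
        if env.get(name, "").strip():
            return "set"
    return "not set"


if __name__ == "__main__":
    # Only the status string is printed, never the key itself.
    print(f"API key: {key_status()}")
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise without touching real credentials.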

Key Compatibility

Scripts require a standard QwenCloud API key (sk-...). Coding Plan keys (sk-sp-...) cannot be used — TTS models are not available on Coding Plan, and Coding Plan does not support the native QwenCloud API. The script detects sk-sp- keys at startup and prints a warning. If qwencloud-ops-auth is installed, see its references/codingplan.md for full details.
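
As a rough illustration of the prefix detection mentioned above, the sketch below classifies a key by its prefix only. The function name `key_kind` and its return labels are assumptions; the bundled script's actual warning logic is not shown in this document.

```python
def key_kind(key: str) -> str:
    """Classify a key by prefix only; the key itself is never returned or logged."""
    # Check the more specific prefix first: "sk-sp-" also starts with "sk-".
    if key.startswith("sk-sp-"):
        return "coding-plan"  # not usable for TTS per the note above
    if key.startswith("sk-"):
        return "standard"
    return "unknown"
```

A caller would warn and stop on `"coding-plan"`, reporting only the label.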

Model Selection

Qwen TTS (HTTP API) — use `scripts/tts.py`

Model	Use Case	Notes
`qwen3-tts-flash`	Recommended (standard) — fast, multi-language	Cost-effective, widely available
`qwen3-tts-instruct-flash`	Instruction-guided style control	Tone/emotion via instructions

CosyVoice (WebSocket API) — use `scripts/tts_cosyvoice.py`

Model	Use Case
`cosyvoice-v3-flash`	High quality, fast
`cosyvoice-v3-plus`	Highest quality

Note: CosyVoice requires dashscope SDK and uses different voices. See cosyvoice-guide.md.

User specified a model → use the appropriate script:
- qwen3-tts-* → scripts/tts.py
- cosyvoice-* → scripts/tts_cosyvoice.py
Consult the qwencloud-model-selector skill when model choice depends on capability, scenario, or pricing.
No signal, clear task → qwen3-tts-flash via tts.py (default for standard tasks).
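
The routing rules above can be sketched as a small lookup. `script_for_model` is a hypothetical helper; the prefixes, script paths, and default model come from the rules above.

```python
DEFAULT_MODEL = "qwen3-tts-flash"


def script_for_model(model=None):
    """Return (model, script path relative to the skill directory)."""
    model = model or DEFAULT_MODEL  # no signal: fall back to the default
    if model.startswith("qwen3-tts-"):
        return model, "scripts/tts.py"
    if model.startswith("cosyvoice-"):
        return model, "scripts/tts_cosyvoice.py"
    # Unknown family: do not guess a script for it.
    raise ValueError(f"no script known for model {model!r}")
```

Raising on an unknown prefix is a deliberate choice here, so that a mistyped model ID is surfaced instead of being silently routed to the default script.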

⚠️ Important: The model list above is a point-in-time snapshot and may be outdated. Model availability changes frequently. Always check the official model list for the authoritative, up-to-date catalog before making model decisions.

Model details: For more information about a specific model, direct the user to its detail page: https://www.qwencloud.com/models/<model-name> (replace <model-name> with the exact model ID, e.g. qwen3-tts-flash → https://www.qwencloud.com/models/qwen3-tts-flash). NEVER modify or guess the model name in the URL.

Available Voices

Voice	Description	Script
Cherry, Ethan, Serena	Qwen TTS system voices	`tts.py`
longanyang, longanhuan, longhuhu_v3	CosyVoice verified voices	`tts_cosyvoice.py`

Full lists: api-guide.md (Qwen TTS) · cosyvoice-guide.md (CosyVoice)

Execution

⚠️ Multiple artifacts: When generating multiple files in a single session, you MUST append a numeric suffix to each filename (e.g. out_1.wav, out_2.wav) to prevent overwrites.
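
One way to pick non-colliding names is sketched below. `next_free_path` is a hypothetical helper, and the `out_<n>.wav` pattern simply follows the example filenames above.

```python
from pathlib import Path


def next_free_path(directory, stem="out", suffix=".wav"):
    """Return the first <stem>_<n><suffix> path in directory that does not exist yet."""
    directory = Path(directory)
    n = 1
    while (directory / f"{stem}_{n}{suffix}").exists():
        n += 1
    return directory / f"{stem}_{n}{suffix}"
```

This only inspects the directory; the caller is still responsible for creating the file before asking for the next name.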

Qwen TTS (HTTP API) — `tts.py`

Prerequisites

API Key: Check that DASHSCOPE_API_KEY (or QWEN_API_KEY) is set using a non-plaintext check only (e.g. in shell: [ -n "$DASHSCOPE_API_KEY" ]; report only "set" or "not set", never the key value). If not set: run the qwencloud-ops-auth skill if available; otherwise guide the user to obtain a key from QwenCloud Console and set it via .env file (echo 'DASHSCOPE_API_KEY=sk-your-key-here' >> .env in project root or current directory) or environment variable. The script searches for .env in the current working directory and the project root. Skills may be installed independently — do not assume qwencloud-ops-auth is present.
Python 3.9+ (stdlib only, no pip install needed)
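
For orientation, a .env loader that never overwrites existing variables might look like this. It is a simplified sketch under the assumption of plain `NAME=value` lines; `load_env_file` is a hypothetical name, and the bundled script's real parsing rules are not shown here.

```python
from pathlib import Path


def load_env_file(path, env):
    """Copy NAME=value pairs from a .env file into env without overwriting existing names."""
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # setdefault keeps any value that is already present in env.
        env.setdefault(name.strip(), value.strip().strip("'\""))
    return env
```

Passing `os.environ` as `env` would apply the values to the current process; a plain dict is enough for testing.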

Environment Check

Before first execution, verify Python is available:

python3 --version  # must be 3.9+

If python3 is not found, try python --version or py -3 --version. If Python is unavailable or below 3.9, skip to Path 2 (curl) in execution-guide.md.
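
The same fallback order can be expressed programmatically. This is a sketch with hypothetical helper names; it assumes the launcher names listed above (`python3`, `python`, `py`) and uses only the standard library.

```python
import shutil
import sys


def find_python(candidates=("python3", "python", "py")):
    """Return the first interpreter name found on PATH, or None."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def version_ok(version_info=sys.version_info, minimum=(3, 9)):
    """True if the (major, minor) pair meets the minimum version."""
    return tuple(version_info[:2]) >= minimum
```

Note that `find_python` only proves a launcher exists; the version still has to be confirmed with `--version` (and `py` needs the `-3` flag), since a bare `python` may point at an older interpreter.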

Default: Run Script

Script path: Scripts are in the scripts/ subdirectory of this skill's directory (the directory containing this SKILL.md). You MUST first locate this skill's installation directory, then ALWAYS use the full absolute path to execute scripts. Do NOT assume scripts are in the current working directory. Do NOT use cd to switch directories before execution.

Execution note: Run all scripts in the foreground — wait for stdout; do not background.

Discovery: Run python3 <this-skill-dir>/scripts/tts.py --help first to see all available arguments.

python3 <this-skill-dir>/scripts/tts.py \
  --request '{"text":"Hello, this is a test.","voice":"Cherry"}' \
  --output output/qwencloud-audio-tts/ \
  --print-response

Argument	Description
`--request '{...}'`	JSON request body
`--file path.json`	Load request from file
`--output dir/`	Save audio and response JSON to directory
`--print-response`	Print response to stdout
`--model ID`	Override model
`--voice NAME`	Override voice
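
When the script is driven from Python rather than a shell, the invocation above could be assembled as follows. This is a sketch: `build_tts_command` and `run_tts` are hypothetical helpers, `sys.executable` stands in for `python3`, and only the flags listed in the table are used.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_tts_command(skill_dir, request, output_dir="output/qwencloud-audio-tts/"):
    """Build the argv list with an absolute script path, as required above."""
    script = Path(skill_dir).resolve() / "scripts" / "tts.py"
    return [
        sys.executable, str(script),
        "--request", json.dumps(request),  # JSON request body
        "--output", output_dir,
        "--print-response",
    ]


def run_tts(skill_dir, request):
    """Run in the foreground and capture stdout/stderr for later checks."""
    return subprocess.run(
        build_tts_command(skill_dir, request),
        capture_output=True, text=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with the JSON body, and no `cd` is needed because the script path is absolute.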

Verify Result

Exit code 0 + output contains valid JSON with output.audio field → success
Non-zero exit, HTTP error, empty response, or error JSON → fail
Post-execution check: Verify the output audio file exists and has non-zero size (ls -la <output_dir>)
MANDATORY — stderr signal check: After confirming the result, scan the command's stderr output for [ACTION_REQUIRED] or [UPDATE_AVAILABLE]. If either signal is present, you MUST follow the instructions in Update Check below before responding to the user.
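
The checks above can be combined into one function, sketched here under two assumptions: that stdout contains only the JSON response, and that "output.audio" means an `audio` key nested under `output`. `verify_result` is a hypothetical helper; the file-size check is left to the caller.

```python
import json

SIGNALS = ("[ACTION_REQUIRED]", "[UPDATE_AVAILABLE]")


def verify_result(returncode, stdout, stderr):
    """Return (ok, signals): success flag plus any update signals seen on stderr."""
    ok = False
    if returncode == 0:
        try:
            body = json.loads(stdout)
        except json.JSONDecodeError:
            body = None  # empty or non-JSON output counts as failure
        ok = (
            isinstance(body, dict)
            and isinstance(body.get("output"), dict)
            and "audio" in body["output"]
        )
    # Signals are reported even when the run itself failed.
    signals = [s for s in SIGNALS if s in stderr]
    return ok, signals
```

If `signals` is non-empty, the Update Check instructions apply before responding to the user.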

On Failure

If the script fails, match the error output against the diagnostic table below to determine the resolution. If no match, read execution-guide.md for alternative paths: curl commands (Path 2 — standard, instruct, streaming), code generation (Path 3), and autonomous resolution (Path 5).

If Python is not available at all → skip directly to Path 2 (curl) in execution-guide.md.

CosyVoice — `tts_cosyvoice.py`

CosyVoice requires dashscope SDK. Quick start:

pip install "dashscope>=1.24.6"
python3 <this-skill-dir>/scripts/tts_cosyvoice.py --text "Hello"

Full guide: cosyvoice-guide.md (setup, voices, examples, errors)

Error Pattern	Diagnosis	Resolution
`command not found: python3`	Python not on PATH	Try `python` or `py -3`; install Python 3.9+ if missing
`Python 3.9+ required`	Script version check failed	Upgrade Python to 3.9+
`SyntaxError` near type hints	Python < 3.9	Upgrade Python to 3.9+
`QWEN_API_KEY/DASHSCOPE_API_KEY not found`	Missing API key	Obtain key from QwenCloud Console; add to `.env`: `echo 'DASHSCOPE_API_KEY=sk-...' >> .env`; or run qwencloud-ops-auth if available
`HTTP 401`	Invalid or mismatched key	Run qwencloud-ops-auth (non-plaintext check only); verify key is valid
`SSL: CERTIFICATE_VERIFY_FAILED`	SSL cert issue (proxy/corporate)	macOS: run `Install Certificates.command`; else set `SSL_CERT_FILE` env var
`URLError` / `ConnectionError`	Network unreachable	Check internet; set `HTTPS_PROXY` if behind proxy
`HTTP 429`	Rate limited	Wait and retry with backoff
`HTTP 5xx`	Server error	Retry with backoff
`PermissionError`	Can't write output	Use `--output` to specify writable directory
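
Matching error output against the table can be automated roughly as below. The patterns are taken from the table, but treating them as regular expressions over stderr is an assumption, the resolutions are abbreviated, and the exact wording the script prints may differ.

```python
import re

# (pattern, resolution) pairs abbreviated from the diagnostic table above.
DIAGNOSTICS = [
    (r"command not found: python3", "Try `python` or `py -3`; install Python 3.9+ if missing"),
    (r"Python 3\.9\+ required", "Upgrade Python to 3.9+"),
    (r"QWEN_API_KEY/DASHSCOPE_API_KEY not found", "Add the key to .env or run qwencloud-ops-auth if available"),
    (r"HTTP 401", "Verify the key is valid (non-plaintext check only)"),
    (r"CERTIFICATE_VERIFY_FAILED", "Fix SSL certificates or set SSL_CERT_FILE"),
    (r"URLError|ConnectionError", "Check internet; set HTTPS_PROXY if behind proxy"),
    (r"HTTP 429|HTTP 5\d\d", "Retry with backoff"),
    (r"PermissionError", "Use --output to specify a writable directory"),
]


def diagnose(stderr):
    """Return the first matching resolution, or None if nothing matches."""
    for pattern, resolution in DIAGNOSTICS:
        if re.search(pattern, stderr):
            return resolution
    return None
```

A `None` result corresponds to the "no match" case, where execution-guide.md is the next stop.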

Quick Reference

Request Fields

Field	Type	Description
`text`	string	Required — text to synthesize (max 600 chars)
`voice`	string	Required — voice ID (e.g. `Cherry`, `Ethan`)
`model`	string	Model ID (default: `qwen3-tts-flash`)
`language_type`	string	`Auto`, `Chinese`, `English`, `Japanese`, `Korean`, `French`, `German`, etc.
`instructions`	string	Tone/style instructions (instruct model only)
`stream`	bool	Enable streaming (Base64 chunks)
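
A request body can be sanity-checked against the fields above before it is sent. This is an illustrative sketch: `validate_request` is a hypothetical helper, and counting characters with `len()` is an assumption about how the 600-character limit is measured.

```python
MAX_TEXT_CHARS = 600
DEFAULT_MODEL = "qwen3-tts-flash"
INSTRUCT_MODEL = "qwen3-tts-instruct-flash"


def validate_request(req):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    text = req.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    if not req.get("voice"):
        problems.append("voice is required")
    # instructions are only honored by the instruct model.
    if "instructions" in req and req.get("model", DEFAULT_MODEL) != INSTRUCT_MODEL:
        problems.append(f"instructions only work with {INSTRUCT_MODEL}")
    return problems
```

For example, `validate_request({"text": "Hello", "voice": "Cherry"})` returns an empty list.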

Response Fields

Field	Description
`audio_url`	URL of generated audio (valid 24h)
`audio_format`	Format (e.g. wav)
`sample_rate`	Sample rate (e.g. 24000)
`usage`	Character usage

Important Notes

text: Max 600 characters per request.
instructions: Only works with qwen3-tts-instruct-flash.
language_type: Auto for mixed language; specify for better pronunciation.
audio_url: Valid for 24 hours — download promptly.
Real-time/streaming TTS: WebSocket-based real-time TTS (CosyVoice, qwen3-tts-flash-realtime) requires a WebSocket client. tts.py covers only the HTTP-based non-real-time API; CosyVoice is handled by tts_cosyvoice.py. For other real-time streaming use cases, refer to the official docs in sources.md.
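
Because of the 600-character limit, longer text has to be sent as several requests. The splitter below is one possible approach, not something the skill prescribes: `split_for_tts` is a hypothetical helper that prefers sentence boundaries and hard-splits only when a single sentence is itself too long.

```python
import re


def split_for_tts(text, limit=600):
    """Split text into chunks of at most `limit` characters."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for piece in pieces:
        # Hard-split any single sentence that is itself over the limit.
        while len(piece) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:limit])
            piece = piece[limit:]
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then becomes its own request, and the resulting files need distinct names (e.g. out_1.wav, out_2.wav) so that they do not overwrite one another.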

Cross-Skill Chaining

When passing generated audio to another skill (e.g., video-gen audio overlay):

Pass audio_url directly — scripts detect URL prefixes and pass through without re-upload
Use audio_file only for local playback or non-API operations
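
The URL-prefix detection mentioned above can be approximated with the standard library. `is_remote_audio` is a hypothetical name, and restricting the check to `http`/`https` is an assumption about which prefixes the scripts recognize.

```python
from urllib.parse import urlparse


def is_remote_audio(ref: str) -> bool:
    """True if ref looks like an http(s) URL that can be passed through as-is."""
    return urlparse(ref).scheme in ("http", "https")
```

Local paths, including Windows paths such as `C:\out\a.wav`, fall through to `False` because their parsed scheme is not `http` or `https`.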

Error Handling

Error	Cause	Action
`401 Unauthorized`	Invalid or missing API key	Run qwencloud-ops-auth if available; else prompt user to set key (non-plaintext check only)
`400`	Invalid parameters (missing text/voice)	Validate request body
`429` / `5xx`	Rate limit or server error	Retry with backoff
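
"Retry with backoff" is sketched below as exponential backoff around a caller-supplied function. The names `call_with_backoff` and `send`, the attempt count, and the delays are illustrative assumptions, not values documented by the API.

```python
import time

RETRYABLE = {429} | set(range(500, 600))


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, body), retrying on 429/5xx with doubling delays."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        # Sleep between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

A 400 or 401 is returned immediately, since retrying will not fix invalid parameters or a bad key. The `sleep` parameter exists so the delay can be replaced in tests.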

Usage & billing: Use the qwencloud-usage skill to check usage, free tier quota, and billing directly. Alternatively, the user can visit the QwenCloud console: Usage Analytics | Pay-as-you-go Billing | Coding Plan Billing

NEVER fabricate, guess, or construct usage/billing/console URLs. Only provide the exact links listed in this skill. If a URL is not listed here, do not invent one.

Output Location

Prefer the current working directory. Default subdirectory: ./output/qwencloud-audio-tts/.

Write prohibition: NEVER write output files into this skill's installation directory or any skills/ hierarchy. All generated content must go to output/ under the current working directory or a user-specified path.
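
A guard for the write prohibition might look like this sketch. `is_allowed_output` is a hypothetical helper, and interpreting "any skills/ hierarchy" as "any path component named skills" is an assumption.

```python
from pathlib import Path


def is_allowed_output(path, skill_dir):
    """False if path is inside the skill directory or under any 'skills' directory."""
    resolved = Path(path).resolve()
    skill = Path(skill_dir).resolve()
    if resolved == skill or skill in resolved.parents:
        return False
    return "skills" not in resolved.parts
```

Resolving both paths first means relative paths and `..` segments are compared on their real location rather than on how they were written.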

Update Check (MANDATORY Post-Execution)

After script execution, check stderr for action signals. The script automatically detects update status and emits structured signals. You MUST act on them. This section also applies when the user explicitly asks to check for updates.

Signal: `[ACTION_REQUIRED] qwencloud-update-check skill is not installed`

When stderr contains this signal, you MUST present the following options to the user and wait for their response before proceeding. Do NOT silently skip this.

The update-check skill is not installed. It keeps you informed about new qwencloud/qwencloud-ai features and bug fixes.

1. Install now: I'll run the install command for you
2. Skip for now: do nothing this time
3. Never remind me: permanently suppress this prompt

After the user responds:

User chooses 1 → run: npx skills add QwenCloud/qwencloud-ai --skill qwencloud-update-check -y, then run: python3 <this-skill-scripts-dir>/gossamer.py --dismiss
User chooses 2 → run: python3 <this-skill-scripts-dir>/gossamer.py --dismiss, then continue normally
User chooses 3 → run: python3 <this-skill-scripts-dir>/gossamer.py --never-install, then continue normally
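
The mapping from the user's choice to commands can be written down as data. In this sketch `commands_for_choice` is a hypothetical helper; the commands and flags themselves are copied from the three cases above.

```python
from pathlib import Path

INSTALL = ["npx", "skills", "add", "QwenCloud/qwencloud-ai",
           "--skill", "qwencloud-update-check", "-y"]


def commands_for_choice(choice, scripts_dir):
    """Return the argv lists to run, in order, for choice 1, 2, or 3."""
    gossamer = ["python3", str(Path(scripts_dir) / "gossamer.py")]
    table = {
        1: [INSTALL, gossamer + ["--dismiss"]],   # install, then dismiss
        2: [gossamer + ["--dismiss"]],            # skip for now
        3: [gossamer + ["--never-install"]],      # never remind
    }
    return table[choice]
```

Any other choice raises `KeyError`, which is a reasonable cue to ask the user again rather than guess.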

Signal: `[UPDATE_AVAILABLE]`

When stderr contains this signal, you MUST append a brief update notice to your response, including the version info and the update command shown in the stderr output.

No signal in stderr

If stderr contains neither [ACTION_REQUIRED] nor [UPDATE_AVAILABLE], no action is needed — the skill is installed and up to date (or cached within 24h).

Explicit user request

When the user explicitly asks to check for updates (e.g. "check for updates", "check version"):

Look for qwencloud-update-check/SKILL.md in sibling skill directories.
If found — run: python3 <qwencloud-update-check-dir>/scripts/check_update.py --print-response and report the result.
If not found — present the install options above.
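
Locating the sibling skill could be done as in the following sketch. `find_update_check` is a hypothetical helper, and it assumes "sibling" means a directory next to this skill's own directory.

```python
from pathlib import Path


def find_update_check(skill_dir):
    """Return the sibling qwencloud-update-check directory if it has a SKILL.md, else None."""
    candidate = Path(skill_dir).resolve().parent / "qwencloud-update-check"
    return candidate if (candidate / "SKILL.md").is_file() else None
```

A `None` result corresponds to the "not found" branch, where the install options are presented.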

References

execution-guide.md — Fallback paths (curl standard/instruct/streaming, code generation, autonomous)
api-guide.md — API supplementary guide
sources.md — Official documentation URLs
