Polyphone TTS

Pass. Audited by ClawScan on May 1, 2026.

Overview

This instruction-only skill is coherent for its stated purpose of fixing Chinese TTS pronunciation. Users should note that it sends text to SenseAudio using an API key and writes local audio and response files.

This skill appears benign and purpose-aligned. Before installing or using it, make sure you trust SenseAudio with the text you synthesize, protect your API key, and only use cloned voices you have permission to use.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Running the workflow will contact an external service and create local output files.

Why it was flagged

The skill instructs use of local command-line tools to call the SenseAudio API and write decoded audio locally; this is central to the TTS purpose and is disclosed.

Skill content

curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 ... jq -r '.data.audio' response.json | xxd -r -p > output.mp3

Recommendation

Review the text, voice ID, and output filenames before running the command, especially if the text is sensitive.
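One way to make the decode step safer is to validate the response before writing any audio. The sketch below is illustrative and not part of the skill: `is_hex_payload` and `decode_audio` are hypothetical helper names, and it assumes `jq` and `xxd` are installed and that `.data.audio` holds a hex string, as the quoted command implies.

```shell
# Hypothetical helper: succeeds only if $1 is a non-empty, even-length hex string.
is_hex_payload() {
  payload=$1
  [ -n "$payload" ] || return 1
  # Reject any character outside 0-9, a-f, A-F.
  case $payload in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  # Hex-encoded bytes come in pairs, so the length must be even.
  [ $(( ${#payload} % 2 )) -eq 0 ]
}

# Hypothetical wrapper around the decode step quoted from the skill.
# Assumes jq and xxd are available and the response file already exists.
decode_audio() {
  response=$1
  output=$2
  # Never silently overwrite an existing output file.
  if [ -e "$output" ]; then
    echo "refusing to overwrite $output" >&2
    return 1
  fi
  # '// empty' yields no output when the field is missing or null.
  audio_hex=$(jq -r '.data.audio // empty' "$response") || return 1
  if ! is_hex_payload "$audio_hex"; then
    echo "no valid hex audio in $response" >&2
    return 1
  fi
  printf '%s' "$audio_hex" | xxd -r -p > "$output"
}
```

A call such as `decode_audio response.json output.mp3` would then fail loudly on an error response instead of producing an empty or corrupt file.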

What this means

The skill can use the user's SenseAudio account quota or permissions to synthesize speech with a provided cloned voice.

Why it was flagged

The skill requires a SenseAudio API key and a cloned voice ID to access the TTS service; this is expected for the integration and is clearly documented.

Skill content

requires: env: - SENSEAUDIO_API_KEY ... Authorization: Bearer $SENSEAUDIO_API_KEY ... "voice_id": "<CLONED_VOICE_ID>"

Recommendation

Use an appropriately scoped API key if available, keep it secret, and only use cloned voice IDs you are authorized to use.
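As a rough illustration of keeping the key out of logs and terminal history, the sketch below checks that `SENSEAUDIO_API_KEY` is set without ever printing it, and masks it for diagnostic output. Both function names are hypothetical and do not come from the skill; the masking assumes an ASCII key.

```shell
# Hypothetical guard: fail early if the key is missing, without printing it.
require_api_key() {
  if [ -z "${SENSEAUDIO_API_KEY:-}" ]; then
    echo "SENSEAUDIO_API_KEY is not set" >&2
    return 1
  fi
}

# Hypothetical helper: reveal only the last 4 characters, for log lines.
mask_key() {
  key=$1
  if [ "${#key}" -le 4 ]; then
    # Too short to reveal anything safely.
    printf '****\n'
  else
    printf '****%s\n' "$(printf '%s' "$key" | tail -c 4)"
  fi
}
```

Exporting the key from a secrets manager or a permission-restricted file, rather than typing it inline on the command line, keeps it out of shell history.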

What this means

Text submitted for synthesis and related pronunciation annotations are shared with the external TTS provider.

Why it was flagged

The API request sends the user's text, voice identifier, and pronunciation dictionary to the SenseAudio provider; this external data flow is necessary for the stated TTS function.

Skill content

"text": "<TEXT>", ... "voice_id": "<CLONED_VOICE_ID>", ... "dictionary": <DICTIONARY_ARRAY>

Recommendation

Avoid submitting confidential or regulated text unless SenseAudio's handling and retention policies are acceptable for your use case.
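A lightweight pre-submission screen can catch the most obvious cases before text leaves the machine. The sketch below is a heuristic only, not a compliance control: `looks_sensitive` is a hypothetical name, and the two patterns (email-like tokens and runs of 11 or more digits) are assumptions chosen for illustration.

```shell
# Hypothetical screen: returns 0 (flagged) if the text contains an
# email-like token or a run of 11 or more digits. Heuristic only.
looks_sensitive() {
  printf '%s\n' "$1" |
    grep -Eq '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}|[0-9]{11,}'
}

# Example gate before calling the TTS API:
#   if looks_sensitive "$text"; then
#     echo "text looks sensitive; review before sending" >&2
#     exit 1
#   fi
```

A clean result from this check does not mean the text is safe to share; it only filters a few easy patterns.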