Polyphone TTS
Pass
Audited by VirusTotal on May 11, 2026.
Overview
Type: OpenClaw Skill
Name: polyphone
Version: 1.0.0
The skill corrects Chinese polyphone pronunciation using the SenseAudio API (api.senseaudio.cn). It is classified as suspicious because of a potential shell injection vulnerability in SKILL.md: user-controlled text and dictionary arrays are inserted directly into a single-quoted curl command template, with no input sanitization. The logic matches the stated purpose and no malicious intent was found, but the unsanitized shell instructions pose a security risk. The skill also uses xxd to convert hex-encoded API responses to binary; xxd is a powerful utility, but its use here appears legitimate.
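The injection risk arises when user text is pasted into a quoted shell string. One way to avoid it, sketched here in Python, is to serialize the request body with a JSON encoder so the text never passes through shell quoting. This sketch assumes a flat object with the `text`, `voice_id`, and `dictionary` fields quoted in this report; the real schema is elided in the report and may nest these fields or require others. The function name `build_request_body` is hypothetical.

```python
import json


def build_request_body(text: str, voice_id: str, dictionary: list) -> bytes:
    """Serialize the request fields as JSON without any shell quoting.

    Assumption: a flat object with these three keys. The real API schema
    may nest them differently or need additional fields.
    """
    if not isinstance(dictionary, list):
        raise TypeError("dictionary must be a list")
    payload = {"text": text, "voice_id": voice_id, "dictionary": dictionary}
    # The encoder escapes quotes and backslashes in the text;
    # ensure_ascii=False keeps Chinese characters readable in the output.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


# A text that would break out of a single-quoted shell template.
hostile = "你好'; rm -rf ~ #"
body = build_request_body(hostile, "<CLONED_VOICE_ID>", [])
```

The resulting bytes can be handed to an HTTP client as the request body, or written to a file and sent with curl's `--data-binary @file`, so the text is never interpolated into a command line.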
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running the workflow will contact an external service and create local output files.
The skill's instructions call local command-line tools to query the SenseAudio API and write the decoded audio to a local file; this is central to its TTS purpose and is disclosed.
curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 ... jq -r '.data.audio' response.json | xxd -r -p > output.mp3
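The decode step in the quoted pipeline (read `.data.audio`, convert hex to binary) can also be done without external tools. The following is a minimal sketch, assuming the response is a JSON object whose `data.audio` field is a hex string, as the `jq` and `xxd -r -p` usage implies; the function name `decode_hex_audio` is hypothetical.

```python
import json


def decode_hex_audio(response_text: str) -> bytes:
    """Return raw audio bytes from a JSON response whose data.audio is hex."""
    doc = json.loads(response_text)
    hex_audio = doc["data"]["audio"]
    if not isinstance(hex_audio, str):
        raise ValueError("data.audio is missing or not a string")
    # bytes.fromhex raises ValueError if the string contains non-hex characters.
    return bytes.fromhex(hex_audio)


# Stand-in response for illustration; "494433" is the hex encoding of b"ID3".
sample = json.dumps({"data": {"audio": "494433"}})
audio = decode_hex_audio(sample)
```

Failing with an exception on a missing or malformed field makes it less likely that an error response is silently written out as an audio file.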
Review the text, voice ID, and output filenames before running the command, especially if the text is sensitive.
The skill can use the user's SenseAudio account quota or permissions to synthesize speech with a provided cloned voice.
The skill requires a SenseAudio API key and a cloned voice ID to access the TTS service; this is expected for the integration and is clearly documented.
requires: env: - SENSEAUDIO_API_KEY ... Authorization: Bearer $SENSEAUDIO_API_KEY ... "voice_id": "<CLONED_VOICE_ID>"
Use an appropriately scoped API key if available, keep it secret, and only use cloned voice IDs you are authorized to use.
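Keeping the key secret is easier when it is read from the environment at call time and never written into a command or log. A minimal sketch follows, assuming the `SENSEAUDIO_API_KEY` variable and Bearer scheme quoted in this report; the helper names and the `Content-Type` header are assumptions for illustration.

```python
import os


def auth_headers(environ=os.environ) -> dict:
    """Build request headers from SENSEAUDIO_API_KEY, failing if it is unset."""
    key = environ.get("SENSEAUDIO_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SENSEAUDIO_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",  # assumed, not confirmed by the report
    }


def redacted(headers: dict) -> dict:
    """Return a copy of the headers that is safe to log (key value hidden)."""
    safe = dict(headers)
    if "Authorization" in safe:
        safe["Authorization"] = "Bearer ***"
    return safe


# Demonstration with a stand-in environment mapping, not a real key.
headers = auth_headers({"SENSEAUDIO_API_KEY": "example-key"})
```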
Text submitted for synthesis and related pronunciation annotations are shared with the external TTS provider.
The API request sends the user's text, voice identifier, and pronunciation dictionary to the SenseAudio provider; this external data flow is necessary for the stated TTS function.
"text": "<TEXT>", ... "voice_id": "<CLONED_VOICE_ID>", ... "dictionary": <DICTIONARY_ARRAY>
Avoid submitting confidential or regulated text unless SenseAudio's handling and retention policies are acceptable for your use case.
