Polyphone TTS

Pass. Audited by VirusTotal on May 11, 2026.

Overview

Type: OpenClaw Skill
Name: polyphone
Version: 1.0.0

The skill corrects the pronunciation of Chinese polyphones using the SenseAudio API (api.senseaudio.cn). It is classified as suspicious because of a potential shell injection vulnerability in SKILL.md: user-controlled text and dictionary arrays are inserted directly into a single-quoted curl command template with no input sanitization, so a crafted input containing a single quote could break out of the quoted string. The logic is otherwise consistent with the stated purpose, and no malicious intent was found. The skill also uses xxd to convert hex-encoded API responses to binary; xxd is a powerful utility, but its use here appears legitimate.
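The injection risk described above arises when text is pasted between the single quotes of a `curl -d '...'` template. One way to avoid it, sketched here under the assumption that `jq` is installed, is to let `jq` build the JSON body from shell variables. The function name `build_payload` is hypothetical and not part of the skill. The body contains only the three fields visible in the excerpts quoted in this report (`text`, `voice_id`, `dictionary`); the real request schema may nest them differently or require additional fields.

```shell
#!/usr/bin/env bash
# Sketch: build the request body with jq instead of pasting text into a
# single-quoted curl template. build_payload is a hypothetical helper; the
# flat field layout is an assumption based on the excerpts in this report.

build_payload() {
  local text="$1" voice_id="$2" dictionary_json="${3:-[]}"
  # --arg passes a value as a JSON string, so quotes, backslashes, and
  # $(...) sequences in the text are escaped rather than interpreted.
  # --argjson parses its value as JSON and fails if it is malformed.
  jq -cn \
    --arg text "$text" \
    --arg voice_id "$voice_id" \
    --argjson dictionary "$dictionary_json" \
    '{text: $text, voice_id: $voice_id, dictionary: $dictionary}'
}
```

The resulting body can be piped to curl with `--data-binary @-`, which reads the request body from standard input, so the user's text never becomes part of a shell command string.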

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Running the workflow will contact an external service and create local output files.

Why it was flagged

The skill instructs the user to run local command-line tools that call the SenseAudio API and write the decoded audio to a local file; this is central to the TTS purpose and is disclosed.

Skill content

curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 ... jq -r '.data.audio' response.json | xxd -r -p > output.mp3

Recommendation

Review the text, voice ID, and output filenames before running the command, especially if the text is sensitive.
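The decode step quoted above pipes the `.data.audio` field straight into `xxd -r -p`. As a hedged sketch of the kind of review this recommendation asks for, the helpers below check that the field really is hexadecimal and that the output name is a plain filename before anything is written. The function names `is_hex`, `is_plain_filename`, and `decode_audio` are hypothetical and not part of the skill; the `.data.audio` path and `response.json` come from the excerpt.

```shell
#!/usr/bin/env bash
# Sketch: validate before decoding. All function names are hypothetical.

# Succeeds only for a non-empty, even-length string of hex digits.
is_hex() {
  local s="$1"
  [ -n "$s" ] || return 1
  [ $(( ${#s} % 2 )) -eq 0 ] || return 1
  case "$s" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}

# Succeeds only for a plain filename: not empty, no path separator,
# no leading dash (option-like) and no leading dot (hidden file or "..").
is_plain_filename() {
  case "$1" in
    ''|*/*|-*|.*) return 1 ;;
  esac
  return 0
}

# Reads .data.audio from a saved response and writes the decoded bytes.
decode_audio() {
  local response="$1" out="$2" hex
  is_plain_filename "$out" || { echo "unsafe output name: $out" >&2; return 1; }
  hex=$(jq -r '.data.audio // empty' "$response") || return 1
  is_hex "$hex" || { echo "audio field is not valid hex" >&2; return 1; }
  printf '%s' "$hex" | xxd -r -p > "$out"
}
```

A call such as `decode_audio response.json output.mp3` then refuses to write anything when the API returned an error object instead of audio, rather than producing an empty or corrupt file.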

What this means

The skill can use the user's SenseAudio account quota or permissions to synthesize speech with a provided cloned voice.

Why it was flagged

The skill requires a SenseAudio API key and a cloned voice ID to access the TTS service; this is expected for the integration and is clearly documented.

Skill content

requires: env: - SENSEAUDIO_API_KEY ... Authorization: Bearer $SENSEAUDIO_API_KEY ... "voice_id": "<CLONED_VOICE_ID>"

Recommendation

Use an appropriately scoped API key if available, keep it secret, and only use cloned voice IDs you are authorized to use.
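One way to follow this recommendation, sketched under the assumption that curl's `--config` option is acceptable in the workflow, is to fail fast when the key is missing and to pass the `Authorization` header through a config stream on standard input rather than on the command line, where it could appear in process listings. `auth_config` is a hypothetical helper, not part of the skill, and the sketch rejects keys containing double quotes or backslashes because those would need escaping in curl's config syntax.

```shell
#!/usr/bin/env bash
# Sketch: keep the API key off the command line. auth_config is hypothetical.

auth_config() {
  # Refuse to continue when the key is unset or empty.
  if [ -z "${SENSEAUDIO_API_KEY:-}" ]; then
    echo "SENSEAUDIO_API_KEY is not set" >&2
    return 1
  fi
  # Refuse characters that would need escaping inside a quoted config value.
  case "$SENSEAUDIO_API_KEY" in
    *'"'*|*'\'*)
      echo "SENSEAUDIO_API_KEY contains a quote or backslash" >&2
      return 1
      ;;
  esac
  # curl config syntax: long option name, '=', quoted value.
  printf 'header = "Authorization: Bearer %s"\n' "$SENSEAUDIO_API_KEY"
}
```

The output is meant to be piped into `curl --config -`. Because standard input is then taken by the config stream, the request body would have to come from a file, for example `--data-binary @payload.json`.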

What this means

Text submitted for synthesis and related pronunciation annotations are shared with the external TTS provider.

Why it was flagged

The API request sends the user's text, voice identifier, and pronunciation dictionary to the SenseAudio provider; this external data flow is necessary for the stated TTS function.

Skill content

"text": "<TEXT>", ... "voice_id": "<CLONED_VOICE_ID>", ... "dictionary": <DICTIONARY_ARRAY>

Recommendation

Avoid submitting confidential or regulated text unless SenseAudio's handling and retention policies are acceptable for your use case.