Install
openclaw skills install polyphoneFix Chinese polyphone (多音字) mispronunciation in TTS by auto-detecting ambiguous characters and applying pinyin annotations. Use when users complain about wrong pronunciation, need precise tone control, or are synthesizing text with characters like 行/干/量/好/了/得/地/的/着/过. Triggers on "读音不对", "这个字读错了", "多音字", "标注拼音", "银行行长", "绕口令", or any request to correct TTS pronunciation.
openclaw skills install polyphonePrecise pronunciation control for Chinese TTS via pinyin annotation. The dictionary parameter lets you override how specific characters are read — essential for polyphones (多音字) that the model might guess wrong.
The
dictionaryparameter only works with cloned voices and modelSenseAudio-TTS-1.5. System voices (male_0004_a etc.) do not support it.
When the user provides text, scan it for these common polyphones and flag any that appear:
| Character | Readings | Context clues |
|---|---|---|
| 行 | háng (行业/银行/行列) / xíng (行走/行动/可行) | 银行、行长、行业 → háng |
| 干 | gān (干净/干燥) / gàn (干活/干部) | 干部、干活 → gàn |
| 量 | liáng (量体温/测量) / liàng (数量/重量) | 数量、质量 → liàng |
| 铺 | pū (铺床/铺路) / pù (店铺/铺子) | 店铺、铺面 → pù |
| 好 | hǎo (好的/很好) / hào (好奇/爱好) | 爱好、好学 → hào |
| 了 | le (吃了/来了) / liǎo (了解/了结) | 了解、了不起 → liǎo |
| 得 | de (跑得快) / dé (得到) / děi (得去) | 得到 → dé;必须 → děi |
| 地 | de (慢慢地) / dì (土地/地方) | 副词用法 → de |
| 的 | de (我的) / dí (的确) / dì (目的) | 目的、的确 → dì/dí |
| 着 | zhe (看着) / zháo (着火) / zhuó (着装) | 着火、着急 → zháo;着装 → zhuó |
| 长 | cháng (长度/很长) / zhǎng (成长/行长) | 行长、生长 → zhǎng |
| 重 | zhòng (重量/重要) / chóng (重复/重新) | 重复、重新 → chóng |
| 中 | zhōng (中间/中国) / zhòng (中奖/中毒) | 中奖、中毒 → zhòng |
| 还 | hái (还有/还是) / huán (还钱/归还) | 还钱、偿还 → huán |
| 发 | fā (发现/发展) / fà (头发/理发) | 头发、理发 → fà |
| 数 | shù (数字/数量) / shǔ (数数/数一数) | 数数、数落 → shǔ |
| 参 | cān (参加/参考) / shēn (人参/党参) | 人参、党参 → shēn |
| 差 | chā (差别/差距) / chà (差不多) / chāi (出差) | 出差 → chāi;差不多 → chà |
Show the user which polyphones were found and your best guess at the intended reading, then ask them to confirm or correct before synthesizing.
Example:
检测到多音字:
- "行" (第2个): 银行 → 建议读 háng [hang2] ✓ 还是 xíng [xing2]?
- "行" (第4个): 行长 → 建议读 zhǎng [zhang3] ✓ 还是 cháng [chang2]?
Convert confirmed readings into the dictionary array. Each entry covers one phrase containing the polyphone:
原文片段 → replacement 格式:在多音字前加 [pinyin],其余字保持原样
Pinyin format: [声母韵母声调数字] — e.g., [hang2]、[xing2]、[zhang3]
Example:
银行行长银[hang2]行[zhang3]长Build the full dictionary array:
"dictionary": [
{"original": "银行行长", "replacement": "银[hang2]行[zhang3]长"},
{"original": "好奇心", "replacement": "[hao4]奇心"}
]
Each original should be a short phrase (3–8 chars) that uniquely identifies the occurrence in context. Avoid single-character originals — they may match unintended occurrences.
The user must provide a cloned voice ID. If they don't have one, remind them that dictionary requires a cloned voice and suggest using the senseaudio-voice-cloner skill first.
curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
-H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "SenseAudio-TTS-1.5",
"text": "<TEXT>",
"stream": false,
"voice_setting": {
"voice_id": "<CLONED_VOICE_ID>"
},
"audio_setting": {
"format": "mp3"
},
"dictionary": <DICTIONARY_ARRAY>
}' -o response.json
jq -r '.data.audio' response.json | xxd -r -p > output.mp3
Check base_resp.status_code == 0 before decoding.
After the user listens, they may find additional mispronunciations. Update the dictionary array and re-synthesize. Keep the previous response.json until the new one succeeds.
Report: file path, duration (jq '.extra_info.audio_length' response.json ms), character count, and which dictionary entries were applied.