Polyphone TTS

Fix Chinese polyphone (多音字) mispronunciation in TTS by auto-detecting ambiguous characters and applying pinyin annotations. Use when users complain about wrong pronunciation, need precise tone control, or are synthesizing text with characters like 行/干/量/好/了/得/地/的/着/过. Triggers on "读音不对", "这个字读错了", "多音字", "标注拼音", "银行行长", "绕口令", or any request to correct TTS pronunciation.

Install

openclaw skills install polyphone

SenseAudio Polyphone TTS (多音字)

Precise pronunciation control for Chinese TTS via pinyin annotation. The dictionary parameter lets you override how specific characters are read, which is essential for polyphones (多音字) that the model might otherwise misread.

The dictionary parameter works only with cloned voices on the SenseAudio-TTS-1.5 model. System voices (such as male_0004_a) do not support it.

Step 1: Scan for Polyphones

When the user provides text, scan it for these common polyphones and flag any that appear:

| Character | Readings | Context clues |
|---|---|---|
| 行 | háng (行业/银行/行列) / xíng (行走/行动/可行) | 银行、行长、行业 → háng |
| 干 | gān (干净/干燥) / gàn (干活/干部) | 干部、干活 → gàn |
| 量 | liáng (量体温/测量) / liàng (数量/重量) | 数量、质量 → liàng |
| 铺 | pū (铺床/铺路) / pù (店铺/铺子) | 店铺、铺面 → pù |
| 好 | hǎo (好的/很好) / hào (好奇/爱好) | 爱好、好学 → hào |
| 了 | le (吃了/来了) / liǎo (了解/了结) | 了解、了不起 → liǎo |
| 得 | de (跑得快) / dé (得到) / děi (得去) | 得到 → dé; meaning "must" (必须) → děi |
| 地 | de (慢慢地) / dì (土地/地方) | adverbial marker (慢慢地) → de |
| 的 | de (我的) / dí (的确) / dì (目的) | 目的 → dì; 的确 → dí |
| 着 | zhe (看着) / zháo (着火) / zhuó (着装) | 着火、着急 → zháo; 着装 → zhuó |
| 长 | cháng (长度/很长) / zhǎng (成长/行长) | 行长、生长 → zhǎng |
| 重 | zhòng (重量/重要) / chóng (重复/重新) | 重复、重新 → chóng |
| 中 | zhōng (中间/中国) / zhòng (中奖/中毒) | 中奖、中毒 → zhòng |
| 还 | hái (还有/还是) / huán (还钱/归还) | 还钱、偿还 → huán |
| 发 | fā (发现/发展) / fà (头发/理发) | 头发、理发 → fà |
| 数 | shù (数字/数量) / shǔ (数数/数一数) | 数数、数落 → shǔ |
| 参 | cān (参加/参考) / shēn (人参/党参) | 人参、党参 → shēn |
| 差 | chā (差别/差距) / chà (差不多) / chāi (出差) | 出差 → chāi; 差不多 → chà |
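The scan can be sketched as a small bash function that walks the text character by character and flags anything from the table above. This is a minimal sketch, not part of the skill: `scan_polyphones` and `POLYPHONES` are hypothetical names, and it assumes bash with a `C.UTF-8` locale so that string lengths and offsets count characters rather than bytes.

```bash
#!/usr/bin/env bash
# Minimal sketch: flag every character from the Step 1 table that appears in
# the input, with its 1-based position and one character of context per side.
# Assumes bash and an available C.UTF-8 locale, so ${#s} and ${s:i:1} count
# characters rather than bytes.
export LC_ALL=C.UTF-8

# The 18 characters listed in the Step 1 table (extend as needed).
POLYPHONES="行干量铺好了得地的着长重中还发数参差"

# Prints one tab-separated line per hit: character, position, context window.
scan_polyphones() {
  local text=$1 i ch start
  for (( i = 0; i < ${#text}; i++ )); do
    ch=${text:i:1}
    if [[ $POLYPHONES == *"$ch"* ]]; then
      # Clamp the window start at 0; a negative offset would count from the end.
      start=$(( i > 0 ? i - 1 : 0 ))
      printf '%s\t%d\t%s\n' "$ch" $(( i + 1 )) "${text:start:i-start+2}"
    fi
  done
}
```

For 银行行长, `scan_polyphones "银行行长"` prints three tab-separated lines: 行 at position 2 (context 银行行), 行 at position 3 (行行长) and 长 at position 4 (行长). It only finds candidates; picking the reading still needs the context clues and the user's confirmation.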

Show the user which polyphones were found and your best guess at the intended reading, then ask them to confirm or correct before synthesizing.

Example, for the input 银行行长:

检测到多音字:
- "行" (第2个): 银行 → 建议读 háng [hang2] ✓ 还是 xíng [xing2]?
- "行" (第3个): 行长 → 建议读 háng [hang2] ✓ 还是 xíng [xing2]?
- "长" (第4个): 行长 → 建议读 zhǎng [zhang3] ✓ 还是 cháng [chang2]?
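A best guess for each flagged character can come from the two-character words around it, matched against the context clues in the table above. The sketch below is illustrative only: `suggest_readings` is a hypothetical helper, its `CLUES` map is a small hand-picked subset of the table, it assumes bash 4+ (associative arrays) and a `C.UTF-8` locale, and for brevity it omits the alternative readings that the example prompt lists.

```bash
#!/usr/bin/env bash
# Minimal sketch: pick a best-guess reading for each flagged character from
# two-character context words and print a prompt in the style of the example.
# Assumes bash 4+ (associative arrays) and a C.UTF-8 locale. CLUES is a small
# illustrative subset of the Step 1 table, keyed "word:character" so that one
# word (行长) can carry clues for two different characters.
export LC_ALL=C.UTF-8

declare -A CLUES=(
  ["银行:行"]="háng [hang2]"
  ["行长:行"]="háng [hang2]"
  ["行长:长"]="zhǎng [zhang3]"
  ["爱好:好"]="hào [hao4]"
  ["出差:差"]="chāi [chai1]"
)
POLYPHONES="行长好差"  # characters covered by CLUES

suggest_readings() {
  local text=$1 i ch word guess
  local -a candidates
  for (( i = 0; i < ${#text}; i++ )); do
    ch=${text:i:1}
    [[ $POLYPHONES == *"$ch"* ]] || continue
    # Candidate words: the pair ending at this character, then the pair starting at it.
    candidates=()
    (( i > 0 )) && candidates+=("${text:i-1:2}")
    candidates+=("${text:i:2}")
    guess=""
    for word in "${candidates[@]}"; do
      if [[ -n ${CLUES["$word:$ch"]+set} ]]; then
        guess=${CLUES["$word:$ch"]}
        break
      fi
    done
    if [[ -n $guess ]]; then
      printf -- '- "%s" (第%d个): %s → 建议读 %s\n' "$ch" $(( i + 1 )) "$word" "$guess"
    else
      # No clue matched: leave the decision to the user.
      printf -- '- "%s" (第%d个): 未找到上下文线索,请确认读音\n' "$ch" $(( i + 1 ))
    fi
  done
}
```

Characters with no matching clue are still listed, so nothing is silently skipped; the user confirms or corrects every line before synthesis.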

Step 2: Build the Dictionary

Convert the confirmed readings into the dictionary array. Each entry maps a phrase containing one or more polyphones to its annotated form:

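original phrase → replacement, where the replacement is the original with [pinyin] placed immediately before each polyphone and every other character left as-is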

Pinyin format: [initial + final + tone number], e.g., [hang2], [xing2], [zhang3]
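The replacement rule can be expressed as a small bash helper that takes the text plus a list of POS:PINYIN pairs. A sketch under stated assumptions: `annotate` is a hypothetical name, it follows the rule exactly as written (marker placed before the character, nothing else changed), it assumes bash and a `C.UTF-8` locale, and the pinyin check (lowercase letters plus one tone digit) is inferred from the [hang2]/[zhang3] examples rather than from an API specification.

```bash
#!/usr/bin/env bash
# Minimal sketch of the Step 2 replacement rule: put "[pinyin]" immediately
# before each confirmed polyphone and leave every other character unchanged.
# annotate is a hypothetical helper; assumes bash and a C.UTF-8 locale.
export LC_ALL=C.UTF-8

# Usage: annotate TEXT POS:PINYIN...   (POS is a 1-based character position)
annotate() {
  local text=$1; shift
  local -a marks=()
  local spec pos py i out=""
  for spec in "$@"; do
    pos=${spec%%:*}
    py=${spec#*:}
    if [[ ! $pos =~ ^[1-9][0-9]*$ ]] || (( pos > ${#text} )); then
      echo "annotate: bad position in '$spec'" >&2
      return 1
    fi
    # Assumed pinyin shape, based on [hang2]/[zhang3]: lowercase letters + one tone digit.
    if [[ ! $py =~ ^[a-z]+[0-9]$ ]]; then
      echo "annotate: bad pinyin in '$spec'" >&2
      return 1
    fi
    marks[pos]=$py
  done
  # Rebuild the text, emitting the marker (if any) before each character.
  for (( i = 0; i < ${#text}; i++ )); do
    if [[ -n ${marks[i+1]+set} ]]; then
      out+="[${marks[i+1]}]"
    fi
    out+=${text:i:1}
  done
  printf '%s\n' "$out"
}
```

For example, `annotate "出差" 2:chai1` prints `出[chai1]差`; an out-of-range position or malformed pinyin makes it fail instead of producing a half-annotated string.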

Example:

  • original: 银行行长
  • replacement: 银[hang2]行[hang2]行[zhang3]长

Build the full dictionary array:

"dictionary": [
  {"original": "银行行长", "replacement": "银[hang2]行[hang2]行[zhang3]长"},
  {"original": "好奇心", "replacement": "[hao4]好奇心"}
]
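Rather than typing the array by hand, it can be assembled from ORIGINAL=REPLACEMENT pairs. This is a minimal sketch: `build_dictionary` and `json_escape` are hypothetical helpers, the split happens at the first `=`, and only backslash and double quote are escaped, which covers Chinese text with [pinyin] markers but not control characters.

```bash
#!/usr/bin/env bash
# Minimal sketch: assemble the "dictionary" array from ORIGINAL=REPLACEMENT
# pairs so the JSON does not have to be typed by hand.
# Only backslash and double quote are escaped, which is enough for Chinese
# text plus [pinyin] markers; control characters are not handled.

json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}  # \ -> \\
  s=${s//"$q"/"$bs$q"}    # " -> \"
  printf '%s' "$s"
}

# Usage: build_dictionary 'ORIGINAL=REPLACEMENT'...
build_dictionary() {
  local pair sep="" out="["
  for pair in "$@"; do
    # Everything before the first "=" is the original, the rest the replacement.
    out+="$sep{\"original\": \"$(json_escape "${pair%%=*}")\", \"replacement\": \"$(json_escape "${pair#*=}")\"}"
    sep=", "
  done
  printf '%s]\n' "$out"
}
```

For example, `build_dictionary '出差=出[chai1]差'` prints `[{"original": "出差", "replacement": "出[chai1]差"}]`, ready to drop in as the dictionary value.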

Keep each original short (3–8 characters) but specific enough to identify the intended occurrence in context. Avoid single-character originals: they can match other occurrences of the same character that need a different reading.
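The length and uniqueness guidance can be checked mechanically before synthesis. A hedged sketch: `check_original` and `count_occurrences` are hypothetical helpers, counting is by non-overlapping matches, it assumes bash and a `C.UTF-8` locale, and a warning means "review this entry", not that it is necessarily wrong (an original repeated with the same reading everywhere is fine).

```bash
#!/usr/bin/env bash
# Minimal sketch of the guidance on originals: warn when an original is not
# 3-8 characters long or does not occur exactly once in the text to be
# synthesized. Counts non-overlapping matches; check_original and
# count_occurrences are hypothetical helpers. Assumes bash and a C.UTF-8 locale.
export LC_ALL=C.UTF-8

count_occurrences() {
  local text=$1 needle=$2 rest
  if (( ${#needle} == 0 )); then echo 0; return; fi
  # Delete every match; the length lost divided by the needle length is the count.
  rest=${text//"$needle"/}
  echo $(( (${#text} - ${#rest}) / ${#needle} ))
}

# Returns 0 when the original looks fine, 1 (with warnings on stderr) otherwise.
check_original() {
  local text=$1 original=$2 n status=0
  if (( ${#original} < 3 || ${#original} > 8 )); then
    echo "warning: \"$original\" is ${#original} characters long (aim for 3-8)" >&2
    status=1
  fi
  n=$(count_occurrences "$text" "$original")
  if (( n != 1 )); then
    echo "warning: \"$original\" occurs $n times in the text" >&2
    status=1
  fi
  return "$status"
}
```

An original that occurs zero times is also flagged, which catches entries whose phrase was edited out of the text.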

Step 3: Synthesize

The user must provide a cloned voice ID. If they don't have one, remind them that the dictionary parameter requires a cloned voice and suggest creating one first with the senseaudio-voice-cloner skill.
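The curl template below inlines the text inside single quotes, so text containing an apostrophe or a double quote breaks the command or the JSON. One way around that is to build the request body separately and refuse to build it without a voice ID. A minimal sketch: `build_payload` and `json_escape` are hypothetical helpers, the field layout simply mirrors the template, the dictionary argument is passed through as already-valid JSON, and only backslash, double quote, newline, carriage return and tab are escaped.

```bash
#!/usr/bin/env bash
# Minimal sketch: build the Step 3 request body, refusing to continue when no
# cloned voice ID is supplied (the dictionary parameter needs one).
# The field layout mirrors the curl template; DICTIONARY_JSON is trusted as-is.

json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  s=${s//$'\n'/'\n'}
  s=${s//$'\r'/'\r'}
  s=${s//$'\t'/'\t'}
  printf '%s' "$s"
}

# Usage: build_payload TEXT CLONED_VOICE_ID [DICTIONARY_JSON]
build_payload() {
  local text=$1 voice_id=$2 dictionary=${3:-[]}
  if [[ -z $voice_id ]]; then
    echo "No cloned voice ID: the dictionary parameter requires a cloned voice." >&2
    echo "Create one first, e.g. with the senseaudio-voice-cloner skill." >&2
    return 1
  fi
  printf '{"model": "SenseAudio-TTS-1.5", "text": "%s", "stream": false, "voice_setting": {"voice_id": "%s"}, "audio_setting": {"format": "mp3"}, "dictionary": %s}\n' \
    "$(json_escape "$text")" "$(json_escape "$voice_id")" "$dictionary"
}
```

Write the body with `build_payload "$TEXT" "$VOICE_ID" "$DICTIONARY_JSON" > payload.json`, then pass `--data-binary @payload.json` to curl in place of the inline `-d '...'` argument.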

curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SenseAudio-TTS-1.5",
    "text": "<TEXT>",
    "stream": false,
    "voice_setting": {
      "voice_id": "<CLONED_VOICE_ID>"
    },
    "audio_setting": {
      "format": "mp3"
    },
    "dictionary": <DICTIONARY_ARRAY>
  }' -o response.json

jq -r '.data.audio' response.json | xxd -r -p > output.mp3

Check base_resp.status_code == 0 before decoding.
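The decode command above runs regardless of the status, so a failed request can leave a broken output.mp3. A minimal sketch that checks first: `decode_response` is a hypothetical helper, it needs jq and xxd, and like the original command it assumes data.audio is hex-encoded.

```bash
#!/usr/bin/env bash
# Minimal sketch: decode data.audio only when base_resp.status_code is 0.
# Like the Step 3 command, assumes data.audio is hex-encoded; needs jq and xxd.
# decode_response is a hypothetical helper name.

# Usage: decode_response RESPONSE_JSON OUTPUT_FILE
decode_response() {
  local response=$1 out=$2 code
  # "// empty" turns a missing status_code into an empty string (0 is kept).
  code=$(jq -r '.base_resp.status_code // empty' "$response") || return 1
  if [[ $code != 0 ]]; then
    echo "synthesis failed: $(jq -c '.base_resp' "$response")" >&2
    return 1
  fi
  jq -r '.data.audio' "$response" | xxd -r -p > "$out"
}
```

Usage: `decode_response response.json output.mp3 || exit 1`. On failure the output file is not created, and the error object is printed for the user.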

Step 4: Iterate

After the user listens, they may find additional mispronunciations. Update the dictionary array and re-synthesize. Keep the previous response.json until the new one succeeds.
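Because the Step 3 curl command writes straight to response.json, a failed retry would overwrite the last good response. A sketch of the safer order: download to a separate file and promote it only on success. `promote_response` is a hypothetical helper and needs jq.

```bash
#!/usr/bin/env bash
# Minimal sketch: replace the current response file only when the new response
# reports base_resp.status_code 0, so the last good response survives a failed
# retry. promote_response is a hypothetical helper; needs jq.

# Usage: promote_response NEW_RESPONSE CURRENT_RESPONSE
promote_response() {
  local new=$1 current=$2 code
  code=$(jq -r '.base_resp.status_code // empty' "$new" 2>/dev/null)
  if [[ $code == 0 ]]; then
    mv -f "$new" "$current"
  else
    echo "new response failed (status_code=${code:-missing}); keeping $current" >&2
    rm -f "$new"
    return 1
  fi
}
```

Usage: run curl with `-o response.new.json`, then `promote_response response.new.json response.json` before decoding.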

Report the output file path, the duration in milliseconds (jq '.extra_info.audio_length' response.json), the character count, and which dictionary entries were applied.
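The report can be printed by a small helper once the duration has been read from the response. A hedged sketch: `report` is a hypothetical name, the caller passes the value from jq '.extra_info.audio_length', the character count is simply the length of the synthesized text, and "applied" is approximated locally as "the original occurs in the text", since the response is not assumed to say which entries matched. Assumes bash and a `C.UTF-8` locale.

```bash
#!/usr/bin/env bash
# Minimal sketch of the Step 4 report. The caller supplies the duration it read
# from the response, e.g. with jq '.extra_info.audio_length' response.json.
# "Applied" is approximated locally as "the original occurs in the text"; the
# response is not assumed to say which entries matched. report is a
# hypothetical helper; assumes bash and a C.UTF-8 locale.
export LC_ALL=C.UTF-8

# Usage: report FILE DURATION_MS TEXT [ORIGINAL...]
report() {
  local file=$1 duration_ms=$2 text=$3
  shift 3
  local original
  local -a applied=()
  for original in "$@"; do
    if [[ $text == *"$original"* ]]; then
      applied+=("$original")
    fi
  done
  printf 'file: %s\n' "$file"
  printf 'duration: %s ms\n' "$duration_ms"
  printf 'characters: %d\n' "${#text}"
  printf 'dictionary entries applied: %s\n' "${applied[*]:-none}"
}
```

Usage: `report output.mp3 "$(jq '.extra_info.audio_length' response.json)" "$TEXT" 银行行长 好奇心`.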