Install
openclaw skills install baidu-speech-synthesis
Baidu Intelligent Cloud Speech Synthesis (TTS): generates multi-role dialogue audio, supports both SSML and segment-merge modes, and adjusts speech rate and pitch.
Use this skill when the user mentions:
Chinese triggers (for Chinese users):
This skill calls the Baidu Intelligent Cloud Speech Synthesis API and supports multi-speaker dialogue synthesis (SSML mode, with segment merging as a fallback). It offers a wide selection of voices and adjustable speech rate, pitch, and volume, and it automatically converts text dialogues into audio files in which each character speaks with its own voice.
# Install Python dependencies
pip install requests
# Ensure ffmpeg is installed (required for audio merging)
# Ubuntu/Debian:
sudo apt install ffmpeg
# macOS:
brew install ffmpeg
# Windows: Download from https://ffmpeg.org/download.html
# Optional: If pydub is needed (alternative merging solution)
# pip install pydub
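Segment-merge mode depends on ffmpeg, so it can help to confirm that the binary is on PATH before synthesizing anything. The sketch below uses only the Python standard library. `find_ffmpeg` and `ffmpeg_version` are hypothetical helper names and are not functions from the skill's scripts.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version(path: str) -> str:
    """Return the first line of `ffmpeg -version` output."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0] if result.stdout else ""


if __name__ == "__main__":
    ffmpeg = find_ffmpeg()
    if ffmpeg is None:
        print("ffmpeg not found: install it with apt, brew, or from https://ffmpeg.org/download.html")
    else:
        print(f"Using {ffmpeg}: {ffmpeg_version(ffmpeg)}")
```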
Choose one of three authentication methods:
# Method 1: API Key + Secret Key
export BAIDU_API_KEY="Your API Key (non-bce-v3 format)"
export BAIDU_SECRET_KEY="Your Secret Key"
# Method 2: access token
export BAIDU_API_KEY="YOUR_ACCESS_TOKEN"
# BAIDU_SECRET_KEY not required
# Method 3: IAM key (starts with bce-v3/)
export BAIDU_API_KEY="YOUR_IAM_KEY_HERE"
# BAIDU_SECRET_KEY not required
# Note: Existing bce-v3/ALTAK-... keys may be dedicated to other services (e.g., search).
# If authentication fails, create a dedicated speech synthesis application to get API Key + Secret Key.
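With Method 1, the API Key and Secret Key are exchanged for an access token through Baidu's OAuth 2.0 client-credentials endpoint (`https://aip.baidubce.com/oauth/2.0/token`). Here is a minimal sketch. It uses `urllib` rather than `requests` so that it needs no third-party packages. The helper names are illustrative and are not taken from `baidu_tts.py`.

```python
import json
import os
import urllib.parse
import urllib.request

TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key: str, secret_key: str) -> str:
    """Build the client-credentials token URL for an API Key + Secret Key pair."""
    query = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return f"{TOKEN_URL}?{query}"


def fetch_access_token(api_key: str, secret_key: str, timeout: float = 10.0) -> str:
    """POST the token request and return the access_token field."""
    req = urllib.request.Request(build_token_url(api_key, secret_key), data=b"", method="POST")
    # urlopen raises urllib.error.HTTPError on non-2xx statuses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    if "access_token" not in data:
        # A reply without a token is treated as a failure as well.
        raise RuntimeError(f"Token request failed: {data.get('error_description', data)}")
    return data["access_token"]


if __name__ == "__main__":
    ak, sk = os.environ.get("BAIDU_API_KEY"), os.environ.get("BAIDU_SECRET_KEY")
    if ak and sk:
        print(fetch_access_token(ak, sk)[:12] + "...")
    else:
        print("Set BAIDU_API_KEY and BAIDU_SECRET_KEY first")
```

The response also includes an `expires_in` value. Caching the token until it expires avoids making one token request per synthesis call.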
BAIDU_API_KEY must be set. Whether BAIDU_SECRET_KEY is needed depends on the authentication method:
# Method 1: API Key + Secret Key
BAIDU_API_KEY=Your API Key (non-bce-v3 format)
BAIDU_SECRET_KEY=Your Secret Key
# Method 2: access token
BAIDU_API_KEY=YOUR_ACCESS_TOKEN
# BAIDU_SECRET_KEY not required
# Method 3: IAM key (starts with bce-v3/)
BAIDU_API_KEY=YOUR_IAM_KEY_HERE
# BAIDU_SECRET_KEY not required
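If the keys live in a `.env` file rather than being exported in the shell, a small parser can load them into the environment. This is a minimal standard-library sketch. `load_env_file` is a hypothetical helper, and the skill may read `.env` files differently; `load_dotenv` from python-dotenv is a common alternative.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict:
    """Parse KEY=VALUE lines from a .env file and copy them into os.environ.

    Blank lines and '#' comments are skipped, and surrounding quotes are stripped.
    Variables that are already set win unless override=True.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```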
The skill scripts automatically detect the key format and choose the matching authentication method. If BAIDU_API_KEY is not set, the user is prompted for it.
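The three methods above suggest an approximate detection rule. A `bce-v3/` prefix means an IAM key. A Secret Key supplied alongside the API Key means the OAuth exchange. A lone value is treated as a ready-made access token. This is a hypothetical reconstruction for illustration only: the order of the checks is an assumption, and `baidu_tts.py` may apply different ones.

```python
from enum import Enum
from typing import Optional


class AuthMode(Enum):
    IAM_KEY = "iam"                # BAIDU_API_KEY starts with "bce-v3/"
    CLIENT_CREDENTIALS = "oauth"   # API Key + Secret Key, exchanged for a token
    ACCESS_TOKEN = "token"         # BAIDU_API_KEY already holds an access token


def detect_auth_mode(api_key: Optional[str], secret_key: Optional[str]) -> AuthMode:
    """Pick an authentication method from the shape of the configured keys."""
    if not api_key:
        raise ValueError("BAIDU_API_KEY is not set")
    if api_key.startswith("bce-v3/"):
        return AuthMode.IAM_KEY
    if secret_key:
        return AuthMode.CLIENT_CREDENTIALS
    return AuthMode.ACCESS_TOKEN
```

A typical call passes `os.environ.get("BAIDU_API_KEY")` and `os.environ.get("BAIDU_SECRET_KEY")`. A `ValueError` at that point is where the prompt for a missing key would go.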
# Single dialogue file synthesis
python ~/.openclaw/skills/baidu-speech-synthesis/scripts/baidu_tts.py \
--input dialogue.txt \
--output conversation.mp3
# Specify voice mapping (character name → voice code)
python scripts/baidu_tts.py \
--input script.txt \
--map 小明:1 小红:0 老师:106
# Batch process all .txt files in a directory
python scripts/baidu_tts.py \
--dir ./dialogues \
--format mp3
# Adjust speech rate (--spd), pitch (--pit), volume (--vol), and audio encoding (--aue)
python scripts/baidu_tts.py \
--input text.txt \
--spd 7 --pit 6 --vol 5 \
--aue 3
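The CLI flags above appear to mirror the parameters of Baidu's short-text TTS REST API: `spd`, `pit`, `vol`, `aue`, and `per` for the voice code used in `--map`. The sketch below assumes that mapping. The endpoint `https://tsn.baidu.com/text2audio` and the `err_no`/`err_msg` error fields come from Baidu's public API. `build_tts_body`, `synthesize`, and the default `cuid` are illustrative names.

```python
import json
import urllib.parse
import urllib.request

TTS_URL = "https://tsn.baidu.com/text2audio"
# aue codes: 3 = mp3, 4 = pcm-16k, 5 = pcm-8k, 6 = wav
SUPPORTED_AUE = {3, 4, 5, 6}


def build_tts_body(text: str, token: str, per: int = 0, spd: int = 5, pit: int = 5,
                   vol: int = 5, aue: int = 3, cuid: str = "openclaw-tts-sketch") -> bytes:
    """Build the form-encoded POST body for one text2audio request."""
    for name, value in (("spd", spd), ("pit", pit), ("vol", vol)):
        # Loose guard only: the accepted range can be narrower for some voices.
        if not 0 <= value <= 15:
            raise ValueError(f"{name} must be between 0 and 15, got {value}")
    if aue not in SUPPORTED_AUE:
        raise ValueError(f"unsupported aue {aue}")
    params = {
        "tex": urllib.parse.quote_plus(text),  # first URL-encoding pass
        "tok": token,
        "cuid": cuid,  # any stable client identifier
        "ctp": 1,
        "lan": "zh",
        "per": per, "spd": spd, "pit": pit, "vol": vol, "aue": aue,
    }
    # urlencode() applies a second pass, so tex is encoded twice as Baidu recommends.
    return urllib.parse.urlencode(params).encode("ascii")


def synthesize(text: str, token: str, timeout: float = 30.0, **options) -> bytes:
    """POST one request and return the audio bytes, or raise on an API error."""
    body = build_tts_body(text, token, **options)
    req = urllib.request.Request(TTS_URL, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        payload = resp.read()
    if not content_type.startswith("audio"):
        # Errors come back as JSON with err_no and err_msg fields.
        err = json.loads(payload.decode("utf-8"))
        raise RuntimeError(f"TTS error {err.get('err_no')}: {err.get('err_msg')}")
    return payload
```

For example, `synthesize(text, token, per=106, spd=7, pit=6)` returns MP3 bytes (`aue=3` is the default), which can be written to disk with `open(path, "wb")`.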
When the user triggers the above phrases, the skill will:
baidu-speech-synthesis/
├── SKILL.md # This file
├── scripts/
│ ├── baidu_tts.py # Main API client (token acquisition, SSML requests, segment merging)
│ ├── dialogue_formatter.py # Dialogue text → SSML conversion and voice mapping
│ └── audio_merger.py # ffmpeg audio merging tool (segment merge solution)
└── references/
├── voice_list.md # Voice code table, samples, recommended pairings
├── ssml_guide.md # Baidu SSML tags, limitations, examples
└── api_setup.md # How to obtain keys, free quota (5 million chars/month), authentication details
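`dialogue_formatter.py` is described as turning dialogue text plus a voice map into synthesis input. The sketch below covers only the segment-merge side. It parses `--map` style `NAME:CODE` pairs and splits `Name: line` text into (voice code, utterance) pairs. The dialogue line format is an assumption, including support for the full-width colon `：`, and the sketch generates no SSML.

```python
import re
from typing import Dict, List, Tuple

# "Name: line" with an ASCII or full-width (：) colon; the format is an assumption.
LINE_RE = re.compile(r"^\s*([^:：]+?)\s*[:：]\s*(.+?)\s*$")


def parse_voice_map(pairs: List[str]) -> Dict[str, int]:
    """Turn --map style pairs such as ['小明:1', '老师:106'] into {'小明': 1, '老师': 106}."""
    mapping: Dict[str, int] = {}
    for pair in pairs:
        name, sep, code = pair.rpartition(":")
        if not sep or not name:
            raise ValueError(f"expected NAME:CODE, got {pair!r}")
        mapping[name] = int(code)
    return mapping


def split_dialogue(text: str, voice_map: Dict[str, int],
                   default_voice: int = 0) -> List[Tuple[int, str]]:
    """Return (voice code, utterance) pairs, one per non-blank line.

    Speakers missing from voice_map use default_voice.
    """
    segments: List[Tuple[int, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match:
            speaker, utterance = match.group(1), match.group(2)
            segments.append((voice_map.get(speaker, default_voice), utterance))
        else:
            # Unlabeled lines (e.g. narration) fall back to the default voice.
            segments.append((default_voice, line.strip()))
    return segments
```

Each segment can then be synthesized on its own, and the resulting files merged with ffmpeg.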
(tex_type=3) for complex speech expressions of individual characters.
--map.
ffmpeg installation (the skill detects whether it is present and prompts if it is not). No need to install pydub.
.env files.
references/api_setup.md for help.
Removed bare except: statements in audio_merger.py and replaced them with proper exception handling to improve debugging and error visibility.
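The segment-merge fallback joins per-line audio files with ffmpeg. One common approach is ffmpeg's concat demuxer with stream copy, sketched here on the assumption that `audio_merger.py` does something similar. The sketch catches `subprocess.CalledProcessError` specifically instead of using a bare `except:`. `concat_list_text` and `merge_segments` are hypothetical names.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def concat_list_text(paths: List[str]) -> str:
    """Build a concat-demuxer list: one "file '<absolute path>'" line per segment."""
    lines = []
    for p in paths:
        # Inside single quotes, a literal ' must be written as '\''
        escaped = str(Path(p).resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def merge_segments(paths: List[str], output: str) -> None:
    """Concatenate audio segments into output without re-encoding."""
    if not paths:
        raise ValueError("no segments to merge")
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
        f.write(concat_list_text(paths))
        list_file = f.name
    try:
        subprocess.run(
            ["ffmpeg", "-y", "-loglevel", "error", "-f", "concat", "-safe", "0",
             "-i", list_file, "-c", "copy", output],
            check=True,
        )
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"ffmpeg failed with exit code {exc.returncode}") from exc
    finally:
        Path(list_file).unlink(missing_ok=True)
```

`-c copy` avoids re-encoding. It only works when every segment shares the same codec and sample rate, which holds when all segments come from requests with the same `aue`. Mixed formats would need `-c copy` dropped so that ffmpeg re-encodes. If ffmpeg is missing, `subprocess.run` raises `FileNotFoundError`.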