## Install

```
openclaw skills install voice-clone
```

Guide users through SenseAudio platform voice cloning, then generate TTS with cloned `voice_id` values. Use when users want to clone voices, manage cloned voice slots, or synthesize audio with a cloned voice.
## Requirements

- Set the `SENSEAUDIO_API_KEY` environment variable.
- Authenticate with `Authorization: Bearer <API_KEY>`.
- Dependencies: `python3`, `requests`, and `pydub`. `pydub` is only needed for optional local audio validation.

## Voice cloning

Use the official SenseAudio platform voice-cloning rules summarized below:
The platform trains the sample and returns a `voice_id` for API use. Sample requirements:

- Duration: 3-30 seconds
- Size: <= 50 MB
- Format: MP3, WAV, or AAC

## TTS with a cloned voice

Use the official TTS API on `/v1/t2a_v2` after the user already has a cloned `voice_id`:
Key request parameters:

- `model`: `SenseAudio-TTS-1.0`
- `voice_setting.voice_id`: required and may be a cloned voice ID
- `audio_setting.format`: `mp3`, `wav`, `pcm`, `flac`
- `audio_setting.sample_rate`: 8000, 16000, 22050, 24000, 32000, 44100
- `audio_setting.bitrate`: 32000, 64000, 128000, 256000
- `audio_setting.channel`: 1 or 2
- `dictionary`: only for cloned voices; requires `model=SenseAudio-TTS-1.5`

Model selection:

- `SenseAudio-TTS-1.0` for normal synthesis.
- `SenseAudio-TTS-1.5` only when a pronunciation dictionary is needed.

```python
def guide_voice_cloning():
    return """
To clone a voice on the SenseAudio platform:
1. Open https://senseaudio.cn/platform/voice-clone
2. Prepare a clean speech sample:
   - Duration: 3-30 seconds
   - Format: MP3 / WAV / AAC
   - Size: 50MB or less
   - Environment: quiet, low echo, clear speech
3. Upload or record the sample on the platform
4. Wait for the platform to finish training
5. Copy the resulting voice_id from the voice list
6. Use that voice_id in later TTS API calls
"""
```
Synthesize speech with a cloned `voice_id`:

```python
import binascii
import os

import requests

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/t2a_v2"


def generate_with_cloned_voice(text, voice_id, speed=1.0, vol=1.0, pitch=0):
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.0",
            "text": text,
            "stream": False,
            "voice_setting": {
                "voice_id": voice_id,
                "speed": speed,
                "vol": vol,
                "pitch": pitch,
            },
            "audio_setting": {
                "format": "mp3",
                "sample_rate": 32000,
                "bitrate": 128000,
                "channel": 2,
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    data = response.json()
    # Audio is returned as a hex string; decode it to raw bytes.
    return binascii.unhexlify(data["data"]["audio"]), data.get("trace_id")
```
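The function above returns the decoded audio bytes. A minimal sketch of persisting them to disk; the hex payload and output filename below are illustrative placeholders, not real API data:

```python
import binascii

# Illustrative hex payload standing in for data["data"]["audio"];
# a real response carries the whole MP3 file as one hex string.
hex_audio = "494433"  # hex for b"ID3", the magic bytes of an MP3 ID3 tag
audio_bytes = binascii.unhexlify(hex_audio)

# Hypothetical output path for the synthesized clip.
with open("cloned_voice_output.mp3", "wb") as f:
    f.write(audio_bytes)
```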
Use this only for cloned voices that need explicit polyphone correction.
```python
def generate_with_dictionary(text, voice_id, dictionary):
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.5",
            "text": text,
            "voice_setting": {"voice_id": voice_id},
            "dictionary": dictionary,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()
```
Dictionary items follow the official shape:
- `original`: source text span
- `replacement`: pronunciation override such as `[hao4]`干净

Optionally validate a cloning sample locally with `pydub`:

```python
from pydub import AudioSegment


def validate_cloning_audio(audio_file):
    audio = AudioSegment.from_file(audio_file)
    issues = []
    if not 3000 <= len(audio) <= 30000:
        issues.append("duration_out_of_range")
    if audio.frame_rate < 16000:
        issues.append("sample_rate_low")
    if audio.channels > 2:
        issues.append("too_many_channels")
    if not audio_file.lower().endswith((".mp3", ".wav", ".aac")):
        issues.append("unsupported_extension")
    return {
        "valid": not issues,
        "issues": issues,
        "duration_ms": len(audio),
        "sample_rate": audio.frame_rate,
        "channels": audio.channels,
    }
```
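Putting the dictionary shape to use, a hedged sketch of building a `generate_with_dictionary` payload; the Chinese text, pinyin tag, and voice ID are illustrative placeholders, with item keys following this document's `original`/`replacement` shape:

```python
# Dictionary items in the documented original/replacement shape.
# The span and pinyin override are placeholders, not verified API data.
dictionary = [
    {"original": "干净", "replacement": "干净[jing4]"},
]

payload = {
    "model": "SenseAudio-TTS-1.5",  # dictionary use requires this model
    "text": "房间很干净",
    "voice_setting": {"voice_id": "your-cloned-voice-id"},  # placeholder
    "dictionary": dictionary,
}
```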
## Notes

- Keep `voice_id` labels and local descriptions in sync so cloned voices stay distinguishable.
- Do not mix `API_KEY` and `SENSEAUDIO_API_KEY`; use `SENSEAUDIO_API_KEY` consistently.
- Default to `SenseAudio-TTS-1.0`; reserve `SenseAudio-TTS-1.5` for cloned-voice dictionary use.
- Treat `voice_id` values as user-specific operational identifiers.
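The key-naming point can be sketched as a single canonical lookup (the header shape is the one used throughout this document; falling back to an empty string is an assumption for illustration):

```python
import os

# Read only the canonical variable name; never fall back to a bare API_KEY.
api_key = os.environ.get("SENSEAUDIO_API_KEY", "")

# Header shape used by every request in this skill.
headers = {"Authorization": f"Bearer {api_key}"}
```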