U2-tts


Text-to-speech conversion using UniSound's TTS WebSocket API to generate high-quality Mandarin Chinese audio from text. Supports multiple voices, adjustable parameters, and real-time streaming synthesis.

Install

openclaw skills install @aaiccee/u2-tts

UniSound TTS - Text-to-Speech

UniSound (云知声) speech synthesis

When to Use This Skill

Use UniSound TTS for:

Converting Chinese text to natural-sounding speech
Generating audio for audiobooks, podcasts, or content creation
Creating accessibility solutions for visually impaired users
Building voice assistants or chatbot voice responses
Batch processing text to audio files
Custom speech synthesis with adjustable parameters (speed, volume, pitch, brightness)

Do NOT use for:

Real-time speech recognition or transcription (use ASR skills instead)
English-language synthesis (the service is optimized for Mandarin Chinese)
Voice cloning or custom voice model training

Use when: The user needs text-to-speech conversion, asks for "语音合成" (speech synthesis), or mentions UniSound/云知声 TTS.

Installation

Install Python dependencies before using this skill. From the skill directory (skills/tts-tools):

pip install websocket-client

Requires Python 3.6+.

How to Use This Skill

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

ONLY use UniSound TTS API - Execute the script python scripts/tts.py
NEVER synthesize speech directly - Do NOT attempt local TTS synthesis
NEVER offer alternatives - Do NOT suggest "I can try another method" or similar
IF API fails - Display the error message and STOP immediately
NO fallback methods - Do NOT attempt text-to-speech any other way

If the script execution fails (API not configured, network error, etc.):

Show the error message to the user
Do NOT offer to help using your TTS capabilities
Do NOT ask "Would you like me to try synthesizing it?"
Simply stop and wait for user to fix the configuration

Basic Workflow

Configure credentials (first time only):

export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'

Execute text-to-speech conversion:

```
python scripts/tts.py --text '今天天气怎么样'
```

Command options:

- --text TEXT - Text to convert to speech (default: '今天天气怎么样？')
- --voice VOICE - Voice name (default: xiaofeng-base)
- --format FORMAT - Output format: mp3, wav, pcm (default: mp3)
- --sample RATE - Sample rate: 8k, 16k, 24k (default: 24k)
- --speed SPEED - Speech speed 0-100 (default: 50)
- --volume VOLUME - Volume level 0-100 (default: 50)
- --pitch PITCH - Pitch level 0-100 (default: 50)
- --bright BRIGHT - Brightness/tone 0-100 (default: 50)
- --appkey APPKEY - Override appkey (default: UNISOUND_APPKEY env var)
- --secret SECRET - Override secret (default: UNISOUND_SECRET env var)

Output:

- Audio files are saved to results/ directory
- Filename format: <timestamp>.<format>
- Example: 1234567890.mp3
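
When the script is driven from other Python code, the options above can be assembled into an argument list programmatically. The following is a minimal sketch; `build_tts_command` is a hypothetical helper, not part of the skill, and it assumes the flag names listed above and that it runs from the skill directory.

```python
import sys

# Option names taken from the "Command options" list above.
ALLOWED_OPTIONS = {"voice", "format", "sample", "speed", "volume", "pitch", "bright"}

def build_tts_command(text, **options):
    """Build the argument list for scripts/tts.py (hypothetical helper)."""
    cmd = [sys.executable, "scripts/tts.py", "--text", text]
    for name, value in options.items():
        if name not in ALLOWED_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        cmd += [f"--{name}", str(value)]
    return cmd

cmd = build_tts_command("你好世界", format="wav", speed=60)
print(cmd[1:])
# To actually run the script: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with Chinese text and punctuation.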

Understanding the Output

Audio Format Options:

MP3: Compressed, smaller file size, good quality - best for web and streaming
WAV: Uncompressed, excellent quality - best for production and archival
PCM: Raw audio data - best for further audio processing

Sample Rates:

24k: High quality, default - recommended for most use cases
16k: Standard quality - good balance of quality and size
8k: Lower quality, smaller file size - suitable for telephony
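
To get a feel for how the sample rate affects file size, the raw PCM size can be estimated with simple arithmetic. This sketch assumes 16-bit mono samples and that 8k/16k/24k mean 8000/16000/24000 Hz; neither is confirmed by this document, so treat the numbers as rough estimates.

```python
def pcm_size_bytes(sample_rate_hz, seconds, bytes_per_sample=2, channels=1):
    """Estimate raw PCM size: rate x duration x sample width x channels."""
    return int(sample_rate_hz * seconds * bytes_per_sample * channels)

# Assumed mapping from the --sample option values to Hz.
for label, rate in (("8k", 8000), ("16k", 16000), ("24k", 24000)):
    kib = pcm_size_bytes(rate, 10) / 1024
    print(f"{label}: about {kib:.0f} KiB for 10 seconds of audio")
```

Under these assumptions the size scales linearly with the sample rate, so 24k output is three times as large as 8k output for the same text.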

Usage Examples

Example 1: Quick Start with Test Credentials

# Set test credentials
export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'

# Convert text to speech
python scripts/tts.py --text '你好世界'

Output: results/1234567890.mp3

Example 2: Custom Voice and Format

python scripts/tts.py --text '今天天气怎么样' --voice xiaofeng-base --format wav

Output: High-quality WAV file with male voice

Example 3: Adjusted Speech Parameters

python scripts/tts.py --text '快速朗读' --speed 70 --volume 60 --pitch 50

Output: Faster speech with increased volume

Example 4: High-Quality Audio Production

python scripts/tts.py --text '高质量音频' --format wav --sample 24k --volume 60

Output: Production-quality WAV file at 24kHz

Example 5: Command-line Credential Override

python scripts/tts.py \
  --text '测试' \
  --appkey 'ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3' \
  --secret '5c12231cd279b35873a3ccecf9439118'

How It Works

The script uses the UniSound TTS WebSocket API with the following workflow:

Authenticate using a SHA256 signature (appkey + timestamp + secret)
Establish a WebSocket connection to wss://ws-stts.hivoice.cn/v1/tts
Send the TTS request with text and voice parameters
Receive streaming audio data in binary chunks
Save the audio file to the results directory
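
The receive step of the workflow above can be illustrated without a network connection. This is a sketch only: `collect_audio` is a hypothetical function, and the JSON text frames in the simulated stream are invented for illustration, since the actual message format of the UniSound API is not described here.

```python
def collect_audio(frames):
    """Concatenate binary frames into one audio buffer.

    Text frames (status or control messages) are returned separately.
    """
    audio = bytearray()
    messages = []
    for frame in frames:
        if isinstance(frame, (bytes, bytearray)):
            audio.extend(frame)
        else:
            messages.append(frame)
    return bytes(audio), messages

# Simulated stream: the text frames here are placeholders, not the real protocol.
frames = ['{"type": "start"}', b"\x00\x01", b"\x02\x03", '{"type": "end"}']
audio, messages = collect_audio(frames)
print(len(audio), "audio bytes,", len(messages), "text messages")
```

With a real connection, the frames would arrive one at a time from the WebSocket client rather than from a list.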

Available Voices

| Voice | Type | Description |
| --- | --- | --- |
| xiaofeng-base | Male | Standard male voice, clear and natural |
| xiaoyan | Female | Female voice |
| xiaomei | Female | Alternative female voice |
| Custom voices | Various | Contact UniSound for more options |

Adjustable Parameters

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| speed | 0-100 | 50 | Speech speed (50 = normal, higher = faster) |
| volume | 0-100 | 50 | Volume level (50 = normal, higher = louder) |
| pitch | 0-100 | 50 | Pitch level (50 = normal, higher = higher) |
| bright | 0-100 | 50 | Brightness/tone (50 = normal) |

Recommended settings:

Audiobooks: speed 45, pitch 50
News/announcements: speed 55, volume 60, bright 60
Accessibility: speed 35-40, volume 70
Normal conversation: speed 50, all parameters 50
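
The recommended settings above can be kept as named presets and turned into command-line flags. This is an illustrative sketch; the preset names and the `preset_flags` helper are hypothetical, and the accessibility preset picks 35 from the recommended 35-40 range.

```python
# Values copied from the recommended settings above.
PRESETS = {
    "audiobook": {"speed": 45, "pitch": 50},
    "news": {"speed": 55, "volume": 60, "bright": 60},
    "accessibility": {"speed": 35, "volume": 70},
    "conversation": {"speed": 50, "volume": 50, "pitch": 50, "bright": 50},
}

def preset_flags(name):
    """Return the scripts/tts.py flags for a named preset."""
    flags = []
    for option, value in PRESETS[name].items():
        flags += [f"--{option}", str(value)]
    return flags

print(" ".join(["python", "scripts/tts.py", "--text", "'新闻播报'"] + preset_flags("news")))
```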

First-Time Configuration

When credentials are not configured:

The script will show:

Error: AppKey and Secret are required!
Set them via --appkey/--secret arguments or UNISOUND_APPKEY/UNISOUND_SECRET environment variables.
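
The lookup order implied by that message (command-line arguments first, then environment variables) might look like the sketch below. `resolve_credentials` is a hypothetical function written for illustration; the actual logic in scripts/tts.py may differ.

```python
import os

def resolve_credentials(appkey=None, secret=None, env=None):
    """Prefer explicit arguments, then fall back to environment variables."""
    env = os.environ if env is None else env
    appkey = appkey or env.get("UNISOUND_APPKEY")
    secret = secret or env.get("UNISOUND_SECRET")
    if not appkey or not secret:
        raise SystemExit("Error: AppKey and Secret are required!")
    return appkey, secret

# Placeholder values, used only to demonstrate the lookup order.
demo_env = {"UNISOUND_APPKEY": "demo-key", "UNISOUND_SECRET": "demo-secret"}
print(resolve_credentials(env=demo_env))
print(resolve_credentials(appkey="override-key", env=demo_env))
```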

Test Credentials

For testing and evaluation, use these credentials:

export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'

⚠️ Important Security Notice

Test credentials only: these are for testing and evaluation purposes.

No sensitive data: never use them with production or sensitive content.

Get your own credentials: for production use, contact UniSound.

Data privacy: text is sent to UniSound servers for processing.

Obtaining Production Credentials

For production use, obtain API credentials from UniSound (云知声):

Contact UniSound to obtain your API credentials. Visit: https://www.unisound.com/
You will receive:
- AppKey: Application key
- Secret: Secret key for authentication

Configuration Methods

Method 1: Environment Variables (Recommended)

Linux/macOS:

export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
python scripts/tts.py --text '你好'

Windows (PowerShell):

$env:UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
$env:UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
python scripts/tts.py --text '你好'

Windows (CMD):

set UNISOUND_APPKEY=ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3
set UNISOUND_SECRET=5c12231cd279b35873a3ccecf9439118
python scripts/tts.py --text '你好'

Method 2: .env File (Recommended for Development)

Create a .env file in the project root:

UNISOUND_APPKEY=ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3
UNISOUND_SECRET=5c12231cd279b35873a3ccecf9439118

Then use with python-dotenv or load in your shell.

Security Note: Never commit .env files or actual production credentials to version control.
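
If python-dotenv is not installed, a .env file in the simple KEY=value form shown above can be loaded with a few lines of standard-library Python. This is a minimal sketch, not a replacement for python-dotenv: `load_env_file` is a hypothetical helper that handles only plain assignments, optional quotes, and `#` comments, and it does not overwrite variables that are already set.

```python
import os

def load_env_file(path=".env", environ=None):
    """Load KEY=value pairs from a file without overriding existing variables."""
    environ = os.environ if environ is None else environ
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and anything that is not an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")
            environ.setdefault(key, value)
            loaded[key] = environ[key]
    return loaded

if os.path.exists(".env"):
    print("Loaded variables:", sorted(load_env_file()))
```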

Method 3: Command-Line Arguments

python scripts/tts.py \
  --text '你好世界' \
  --appkey 'ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3' \
  --secret '5c12231cd279b35873a3ccecf9439118'

Required Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| `UNISOUND_APPKEY` | Yes | Application key |
| `UNISOUND_SECRET` | Yes | Secret key |

Python API Usage

Basic Python API:

import os
from scripts.tts import Ws_parms, do_ws, write_results

# Get credentials from environment variables
appkey = os.getenv('UNISOUND_APPKEY', 'ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3')
secret = os.getenv('UNISOUND_SECRET', '5c12231cd279b35873a3ccecf9439118')

# Configure TTS parameters
ws_parms = Ws_parms(
    url='wss://ws-stts.hivoice.cn/v1/tts',
    appkey=appkey,
    secret=secret,
    pid=1,
    vcn='xiaofeng-base',
    text='你好，欢迎使用云知声语音合成服务！',
    tts_format='mp3',
    tts_sample='24k',
    user_id='my-app',
)

# Execute TTS conversion
do_ws(ws_parms)

# Save result to file
write_results(ws_parms)
print('Audio saved to results/ directory!')

Error Handling

Missing credentials:

Error: AppKey and Secret are required!

→ Credentials were not provided
→ Set the UNISOUND_APPKEY and UNISOUND_SECRET environment variables

WebSocket connection error:

WebSocket error: ...

→ Check network connectivity to the UniSound API
→ Verify the API endpoint URL is correct
→ Check whether a firewall is blocking WebSocket connections

No audio data received:

Error: No audio data received

→ The text may be empty or contain invalid characters
→ Check that the text parameter is not empty
→ Verify the text encoding is UTF-8
→ The credentials may be invalid

Invalid speech parameter:

Error: speed must be between 0 and 100, got 150

→ Speech parameters must be between 0 and 100
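
A range check that yields a message in this style could be written as follows. This is a sketch for callers who want to validate values before invoking the script; `validate_param` is a hypothetical name, and the script's own validation code is not shown in this document.

```python
def validate_param(name, value, low=0, high=100):
    """Return value unchanged if it is within [low, high], else raise ValueError."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value

settings = {"speed": 70, "volume": 60, "pitch": 50, "bright": 50}
for name, value in settings.items():
    validate_param(name, value)
print("All parameters are within range")
```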

WebSocket connection timeout:

WebSocket error: timeout

→ There may be a network connection issue
→ The API service may be temporarily unavailable
→ Check your internet connection

Advanced Usage

Custom Speech Parameters

import os
from scripts.tts import Ws_parms, do_ws, write_results

appkey = os.getenv('UNISOUND_APPKEY', 'ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3')
secret = os.getenv('UNISOUND_SECRET', '5c12231cd279b35873a3ccecf9439118')

ws_parms = Ws_parms(
    url='wss://ws-stts.hivoice.cn/v1/tts',
    appkey=appkey,
    secret=secret,
    pid=1,
    vcn='xiaofeng-base',
    text='这是自定义参数的语音合成示例',
    tts_format='wav',
    tts_sample='24k',
    user_id='demo',
)

# Customize speech parameters
ws_parms.tts_speed = 60   # Faster speech (0-100)
ws_parms.tts_volume = 70  # Louder volume (0-100)
ws_parms.tts_pitch = 40   # Lower pitch (0-100)
ws_parms.tts_bright = 60  # Brighter tone (0-100)

do_ws(ws_parms)
write_results(ws_parms)

Batch Text Processing

import os
from scripts.tts import Ws_parms, do_ws, write_results

def batch_tts(text_list):
    """Convert multiple texts to audio files"""
    appkey = os.getenv('UNISOUND_APPKEY', 'ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3')
    secret = os.getenv('UNISOUND_SECRET', '5c12231cd279b35873a3ccecf9439118')

    for i, text in enumerate(text_list):
        ws_parms = Ws_parms(
            url='wss://ws-stts.hivoice.cn/v1/tts',
            appkey=appkey,
            secret=secret,
            pid=i,
            vcn='xiaofeng-base',
            text=text,
            tts_format='mp3',
            tts_sample='24k',
            user_id=f'batch-{i}',
        )

        do_ws(ws_parms)
        write_results(ws_parms)
        print(f"Generated: {text[:30]}...")

# Usage
texts = [
    "第一段文字",
    "第二段文字",
    "第三段文字"
]
batch_tts(texts)

Audiobook Chapter Converter

import os
from scripts.tts import Ws_parms, do_ws, write_results

def convert_chapter(chapter_text, chapter_num, voice='xiaofeng-base'):
    """Convert a book chapter to audio file"""
    # Add chapter announcement
    intro = f"第{chapter_num}章。"
    full_text = intro + chapter_text

    appkey = os.getenv('UNISOUND_APPKEY', 'ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3')
    secret = os.getenv('UNISOUND_SECRET', '5c12231cd279b35873a3ccecf9439118')

    ws_parms = Ws_parms(
        url='wss://ws-stts.hivoice.cn/v1/tts',
        appkey=appkey,
        secret=secret,
        pid=chapter_num,
        vcn=voice,
        text=full_text,
        tts_format='mp3',
        tts_sample='24k',
        user_id=f'audiobook-ch{chapter_num}',
    )

    # Slower, clearer reading for books
    ws_parms.tts_speed = 45
    ws_parms.tts_pitch = 50

    do_ws(ws_parms)
    write_results(ws_parms)
    print(f"Chapter {chapter_num} converted")

# Usage
chapter = """这是第一章的内容。在一个阳光明媚的早晨，
主人公开始了他的冒险之旅。"""
convert_chapter(chapter, 1)

Accessibility Helper

import os
from scripts.tts import Ws_parms, do_ws, write_results

def accessibility_reader(text, speed='normal', voice='xiaofeng-base'):
    """
    Text-to-speech for accessibility (visually impaired users)
    with customizable reading speed
    """
    speed_map = {
        'slow': 35,
        'normal': 50,
        'fast': 65
    }

    appkey = os.getenv('UNISOUND_APPKEY', 'ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3')
    secret = os.getenv('UNISOUND_SECRET', '5c12231cd279b35873a3ccecf9439118')

    ws_parms = Ws_parms(
        url='wss://ws-stts.hivoice.cn/v1/tts',
        appkey=appkey,
        secret=secret,
        pid=1,
        vcn=voice,
        text=text,
        tts_format='mp3',
        tts_sample='24k',
        user_id='accessibility',
    )

    ws_parms.tts_speed = speed_map.get(speed, 50)
    ws_parms.tts_volume = 70  # Higher volume for accessibility

    do_ws(ws_parms)
    write_results(ws_parms)
    return ws_parms.tts_stream

# Usage
article = "这是一篇重要的新闻文章。"
accessibility_reader(article, speed='slow')

Important Notes

Chinese language optimized - Best results with Simplified Chinese text
Requires a stable internet connection for WebSocket streaming
Audio files saved locally - Check the results/ directory for output
Text encoding - Ensure text is UTF-8 encoded
Default sample rate is 24k - Higher quality than standard 16k
Test credentials - Provided for testing and evaluation only

Security Best Practices

For testing - Use the provided test credentials
For production - Always obtain your own credentials from UniSound
Use environment variables - Store credentials securely in environment variables
Never hardcode credentials - Don't embed production credentials in code
Use .env files - For local development (add to .gitignore)
Rotate credentials regularly - In production environments

Troubleshooting

Issue: Script fails with import error
→ Ensure dependencies are installed: pip install websocket-client
→ Ensure you are using Python 3.6 or later

Issue: "AppKey and Secret are required!" error
→ Set the UNISOUND_APPKEY and UNISOUND_SECRET environment variables
→ Or use the --appkey and --secret command-line arguments

Issue: Poor audio quality
→ Try using WAV format with the 24k sample rate
→ Adjust speech parameters for your use case

Issue: WebSocket connection timeout
→ Check network connectivity
→ Verify the firewall allows WebSocket connections
→ Check whether the API service is operational

Issue: Generated audio sounds unnatural
→ Adjust the speed parameter (try the 45-55 range)
→ Check the text for proper punctuation
→ Consider breaking long sentences into shorter ones

Issue: Test credentials stopped working
→ Test credentials may have expired or hit rate limits
→ Contact UniSound to obtain your own credentials
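
For the suggestion to break long text into shorter pieces, one possible approach is to split on sentence-ending punctuation and regroup the sentences into chunks of bounded length. This is a sketch only: `split_sentences` and the `max_len` value are illustrative choices, not limits documented by UniSound.

```python
import re

# A sentence is a run of non-terminators followed by optional terminators.
_SENTENCE = re.compile(r"[^。！？!?；;]+[。！？!?；;]*")

def split_sentences(text, max_len=100):
    """Group sentences into chunks of at most max_len characters where possible."""
    parts = [p.strip() for p in _SENTENCE.findall(text) if p.strip()]
    chunks, current = [], ""
    for part in parts:
        if current and len(current) + len(part) > max_len:
            chunks.append(current)
            current = part
        else:
            current += part
    if current:
        chunks.append(current)
    return chunks

for chunk in split_sentences("今天天气很好。我们去公园散步吧！你觉得怎么样？", max_len=10):
    print(chunk)
```

A single sentence longer than `max_len` is kept whole rather than cut mid-sentence, so chunks can exceed the limit in that case.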

Tips and Best Practices

For audiobooks: Use speed 45, add chapter announcements
For accessibility: Use speed 35-40, higher volume (70)
For news: Use speed 55, brighter tone (60)
For batch processing: Implement delays between requests
For production: Add error handling and retry logic
For best quality: Use the 24k sample rate with WAV format
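
The batch-processing and production tips above can be combined into a small retry wrapper with a pause between requests. This is a sketch under assumptions: `run_with_retry` and `run_batch` are hypothetical helpers, the delay values are arbitrary, and it assumes a failed request raises an exception, which this document does not confirm for `do_ws`.

```python
import time

def run_with_retry(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); on failure wait base_delay, 2x, 4x, ... and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** (attempt - 1))

def run_batch(tasks, pause=0.5, sleep=time.sleep):
    """Run tasks in order, pausing between requests and retrying failures."""
    results = []
    for index, task in enumerate(tasks):
        if index:
            sleep(pause)
        results.append(run_with_retry(task, sleep=sleep))
    return results

# Demo with stand-in tasks and a no-op sleep; a real task would wrap one TTS request.
print(run_batch([lambda: "first", lambda: "second"], sleep=lambda seconds: None))
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests or tuned per deployment.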

Reference Documentation

Load these reference documents when:

Debugging API connection issues
Understanding advanced features
Looking up detailed API parameter information

Authentication Details

The UniSound TTS API uses SHA256 signature-based authentication:

```python
# Signature format (automatically generated by Ws_parms class)
# SHA256(appkey + timestamp + secret).upper()

# Manual signature example (if needed):
import hashlib
import time

def generate_signature(appkey, secret):
    timestamp = str(int(time.time() * 1000))
    hs = hashlib.sha256()
    hs.update((appkey + timestamp + secret).encode('utf-8'))
    signature = hs.hexdigest().upper()
    return timestamp, signature
```

WebSocket URL format:

wss://ws-stts.hivoice.cn/v1/tts?time={timestamp}&appkey={appkey}&sign={signature}
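
Putting the signature and the URL format together, the full connection URL could be assembled as in the sketch below. `build_ws_url` is a hypothetical helper; the query parameter names follow the URL format above, and the optional `timestamp` argument is added here only to make the result reproducible.

```python
import hashlib
import time
from urllib.parse import urlencode

def build_ws_url(appkey, secret, base="wss://ws-stts.hivoice.cn/v1/tts", timestamp=None):
    """Build the signed WebSocket URL: base?time=...&appkey=...&sign=..."""
    timestamp = str(int(time.time() * 1000)) if timestamp is None else str(timestamp)
    digest = hashlib.sha256((appkey + timestamp + secret).encode("utf-8"))
    sign = digest.hexdigest().upper()
    return base + "?" + urlencode({"time": timestamp, "appkey": appkey, "sign": sign})

# Placeholder credentials, for illustration only.
print(build_ws_url("demo-key", "demo-secret", timestamp=1700000000000))
```

Because the timestamp is part of the signed string, a fresh URL should be built for each connection rather than reused.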

Note: API capabilities, available voices, and rate limits are determined by your UniSound TTS API service configuration and subscription plan.