coze-voice-gen

v0.1.0

Text-to-Speech (TTS) and Speech-to-Text (ASR) using coze-coding-dev-sdk. Returns results directly to stdout.

2 versions · License: MIT-0

Install

openclaw skills install coze-voice-gen

Coze Voice Generation

Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) using coze-coding-dev-sdk.

Text-to-Speech (TTS)

Single Audio

npx ts-node {baseDir}/scripts/tts.ts --text "Hello, welcome to our service!"

With Different Voice

npx ts-node {baseDir}/scripts/tts.ts \
  --text "This is a male voice" \
  --speaker zh_male_m191_uranus_bigtts

Batch Generation

npx ts-node {baseDir}/scripts/tts.ts \
  --texts "Chapter 1: Introduction" "Chapter 2: Getting Started" "Chapter 3: Advanced Topics" \
  --speaker zh_female_xueayi_saturn_bigtts

With Custom Parameters

npx ts-node {baseDir}/scripts/tts.ts \
  --text "Fast and loud announcement!" \
  --speech-rate 30 \
  --loudness-rate 20 \
  --format mp3 \
  --sample-rate 48000
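
When the script is driven from another program rather than typed by hand, the flags shown above can be assembled as an argument array. The following is a minimal sketch, not part of the skill: `TtsRequest` and `buildTtsArgs` are hypothetical names, and only the flags documented on this page are emitted.

```typescript
// Hypothetical helper: builds the argument list for `npx ts-node .../tts.ts`.
// Only flags documented on this page are used; nothing here calls the SDK.
interface TtsRequest {
  texts: string[];
  speaker?: string;
  format?: string;
  sampleRate?: number;
  speechRate?: number;
  loudnessRate?: number;
}

function buildTtsArgs(scriptPath: string, req: TtsRequest): string[] {
  if (req.texts.length === 0) {
    throw new Error("at least one text is required");
  }
  const args: string[] = ["ts-node", scriptPath];

  // One text maps to --text, several map to --texts (batch generation).
  if (req.texts.length === 1) {
    args.push("--text", req.texts[0]);
  } else {
    args.push("--texts");
    for (const t of req.texts) {
      args.push(t);
    }
  }

  if (req.speaker !== undefined) args.push("--speaker", req.speaker);
  if (req.format !== undefined) args.push("--format", req.format);
  if (req.sampleRate !== undefined) args.push("--sample-rate", String(req.sampleRate));
  if (req.speechRate !== undefined) args.push("--speech-rate", String(req.speechRate));
  if (req.loudnessRate !== undefined) args.push("--loudness-rate", String(req.loudnessRate));
  return args;
}
```

The resulting array can be handed to a process launcher such as `spawn("npx", args)` from `node:child_process`. Passing arguments as an array avoids shell quoting problems with texts that contain spaces or quotes.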

TTS Options

Option                 Description
--text <text>          Single text to synthesize
--texts <texts...>     Multiple texts for batch generation
--speaker <id>         Voice ID (default: zh_female_xiaohe_uranus_bigtts)
--format <fmt>         Output format: mp3, pcm, ogg_opus (default: mp3)
--sample-rate <hz>     Sample rate in Hz: 8000-48000 (default: 24000)
--speech-rate <n>      Speech rate: -50 to 100 (default: 0)
--loudness-rate <n>    Loudness: -50 to 100 (default: 0)
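
The ranges in the table can be checked before the script is invoked, so that bad values fail early. This is a sketch under assumptions: `TtsOptions` and `validateTtsOptions` are hypothetical names, the limits are copied from the table above, and whether the service accepts every sample rate inside the range (rather than a fixed set) is not stated here.

```typescript
// Hypothetical pre-flight validation; limits are taken from the TTS options table.
interface TtsOptions {
  format?: string;
  sampleRate?: number;
  speechRate?: number;
  loudnessRate?: number;
}

const TTS_FORMATS: string[] = ["mp3", "pcm", "ogg_opus"];

// Returns a list of problems; an empty list means the options are within the documented ranges.
function validateTtsOptions(opts: TtsOptions): string[] {
  const problems: string[] = [];
  const outOfRange = (v: number | undefined, lo: number, hi: number): boolean =>
    v !== undefined && (!isFinite(v) || v < lo || v > hi);

  if (opts.format !== undefined && TTS_FORMATS.indexOf(opts.format) === -1) {
    problems.push("--format must be one of: " + TTS_FORMATS.join(", "));
  }
  if (outOfRange(opts.sampleRate, 8000, 48000)) {
    problems.push("--sample-rate must be between 8000 and 48000");
  }
  if (outOfRange(opts.speechRate, -50, 100)) {
    problems.push("--speech-rate must be between -50 and 100");
  }
  if (outOfRange(opts.loudnessRate, -50, 100)) {
    problems.push("--loudness-rate must be between -50 and 100");
  }
  return problems;
}
```

Omitted options are treated as valid because the script falls back to its defaults.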

TTS Output

The script outputs audio URLs directly to stdout:

[1/1] Hello, welcome to our service!
  https://example.com/generated-audio.mp3
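
A caller that captures stdout may want the URLs as structured data. The sketch below assumes the output always follows the shape of the sample above (a `[i/n] text` line followed by a URL line); `TtsResult` and `parseTtsOutput` are hypothetical names and the real output may contain other lines, which this parser ignores.

```typescript
// Hypothetical parser for the TTS stdout format shown in the sample output.
interface TtsResult {
  index: number;
  total: number;
  text: string;
  url: string;
}

function parseTtsOutput(stdout: string): TtsResult[] {
  const results: TtsResult[] = [];
  let pending: { index: number; total: number; text: string } | null = null;

  for (const rawLine of stdout.split(/\r?\n/)) {
    const line = rawLine.trim();

    // Header line, e.g. "[1/3] Chapter 1: Introduction"
    const header = /^\[(\d+)\/(\d+)\]\s*(.*)$/.exec(line);
    if (header) {
      pending = { index: Number(header[1]), total: Number(header[2]), text: header[3] };
      continue;
    }

    // The first URL line after a header is paired with that header.
    if (pending && /^https?:\/\//.test(line)) {
      results.push({ index: pending.index, total: pending.total, text: pending.text, url: line });
      pending = null;
    }
  }
  return results;
}
```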

Available Voices

General Purpose:

  • zh_female_xiaohe_uranus_bigtts - Xiaohe (default)
  • zh_female_vv_uranus_bigtts - Vivi (Chinese & English)
  • zh_male_m191_uranus_bigtts - Yunzhou (male)
  • zh_male_taocheng_uranus_bigtts - Xiaotian (male)

Audiobook:

  • zh_female_xueayi_saturn_bigtts - Children's audiobook

Video Dubbing:

  • zh_male_dayi_saturn_bigtts - Dayi (male)
  • zh_female_mizai_saturn_bigtts - Mizai (female)
  • zh_female_jitangnv_saturn_bigtts - Motivational female

Role Playing:

  • saturn_zh_female_keainvsheng_tob - Cute girl
  • saturn_zh_male_shuanglangshaonian_tob - Cheerful boy

Speech-to-Text (ASR)

From URL

npx ts-node {baseDir}/scripts/asr.ts --url "https://example.com/audio.mp3"

From Local File

npx ts-node {baseDir}/scripts/asr.ts --file ./recording.mp3

ASR Options

Option           Description
--url <url>      Audio file URL
--file <path>    Local audio file path

ASR Output

Transcription is printed directly to stdout:

============================================================
TRANSCRIPTION
============================================================
Hello, this is the transcribed text from the audio file...
============================================================

Duration: 1m 30s
Segments: 5
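
To consume the transcription programmatically, the printed block can be parsed back into fields. This is a sketch that assumes the layout of the sample above: the text sits between the second and third `=` rule lines, and the duration uses `h`/`m`/`s` units as in `1m 30s`. `AsrResult` and `parseAsrOutput` are hypothetical names, and other duration formats are not handled.

```typescript
// Hypothetical parser for the ASR stdout format shown in the sample output.
interface AsrResult {
  text: string;
  durationSeconds: number | null;
  segments: number | null;
}

function parseAsrOutput(stdout: string): AsrResult {
  const lines = stdout.split(/\r?\n/);

  // Collect the positions of the "====" rule lines.
  const ruleIdx: number[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (/^={3,}$/.test(lines[i].trim())) ruleIdx.push(i);
  }

  // The transcription is assumed to sit between the second and third rule.
  let text = "";
  if (ruleIdx.length >= 3) {
    text = lines.slice(ruleIdx[1] + 1, ruleIdx[2]).join("\n").trim();
  }

  let durationSeconds: number | null = null;
  let segments: number | null = null;
  for (const rawLine of lines) {
    const line = rawLine.trim();
    const d = /^Duration:\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*$/.exec(line);
    if (d && (d[1] || d[2] || d[3])) {
      durationSeconds = Number(d[1] || 0) * 3600 + Number(d[2] || 0) * 60 + Number(d[3] || 0);
    }
    const s = /^Segments:\s*(\d+)\s*$/.exec(line);
    if (s) segments = Number(s[1]);
  }
  return { text, durationSeconds, segments };
}
```

A transcription that itself contains a line made only of `=` characters would confuse this parser, so treat it as a convenience rather than a robust format.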

ASR Requirements

  • Duration: ≤ 2 hours
  • File size: ≤ 100MB
  • Formats: WAV, MP3, OGG OPUS, M4A
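
These limits can be checked locally before an upload is attempted. The sketch below is an illustration only: `checkAsrInput` is a hypothetical name, "100MB" is read conservatively as 100,000,000 bytes, and the mapping from the listed formats to file extensions (`.ogg` and `.opus` for OGG OPUS) is an assumption. The caller supplies the file size and, if known, the duration.

```typescript
// Hypothetical pre-flight check against the ASR requirements listed above.
const ASR_MAX_BYTES = 100 * 1000 * 1000; // "100MB", read as decimal megabytes (assumption)
const ASR_MAX_SECONDS = 2 * 60 * 60; // 2 hours
const ASR_EXTENSIONS: string[] = ["wav", "mp3", "ogg", "opus", "m4a"]; // assumed extension mapping

// Returns a list of problems; an empty list means the input passes the documented limits.
function checkAsrInput(fileName: string, sizeBytes: number, durationSeconds?: number): string[] {
  const problems: string[] = [];

  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (ASR_EXTENSIONS.indexOf(ext) === -1) {
    problems.push("unsupported format: expected WAV, MP3, OGG OPUS, or M4A");
  }
  if (sizeBytes > ASR_MAX_BYTES) {
    problems.push("file is larger than 100MB");
  }
  if (durationSeconds !== undefined && durationSeconds > ASR_MAX_SECONDS) {
    problems.push("audio is longer than 2 hours");
  }
  return problems;
}
```

The extension check is only a heuristic, since a file's name does not guarantee its actual encoding.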

Notes

  • Generated audio URLs expire after a limited time - use them directly and promptly rather than storing them for later
  • Speech rate: negative = slower, positive = faster
  • Loudness rate: negative = quieter, positive = louder

Version tags

latest: vk97adrc7bz1rm8j0g7a14hgckx847vzt

Runtime requirements

🎙️ Clawdis
Bins: npx