Yummy Gen Voice

Other

Use when the user wants to synthesise speech or text-to-speech (TTS) audio with Gemini through yummycli, including single-speaker narration, multi-speaker dialogue (up to 2 speakers), and listing available voices.

Install

openclaw skills install @yummysource/yummy-gen-voice

Synthesise Speech

Generate spoken audio with yummycli gemini speak using Google Gemini TTS.

When to Use

Load this skill when the user asks to synthesise speech, convert text to audio, generate a voiceover, create a narration, or produce a spoken dialogue — including single-speaker TTS and multi-speaker conversation.

Prerequisite: Apply the yummy-shared skill first.

This skill covers three modes with a single command:

Single-speaker narration (one voice, any language)
Multi-speaker dialogue (up to 2 speakers, each with their own voice)
Listing available prebuilt voices

Command Contract

Two equivalent entry points are available:

Entry point	When to use
`yummycli gemini speak`	Default — human-friendly, Gemini TTS presets applied
`yummycli audio speak --provider gemini`	Scripting / automation — explicit, provider-agnostic form

Both share the same flags and defaults. Prefer gemini speak unless the task explicitly requires the provider-agnostic form.

Basic usage:

yummycli gemini speak --text "<text>"

With an explicit voice and output path:

yummycli gemini speak \
  --text "<text>" \
  --voice Kore \
  --output narration.wav

Optional controls:

--output <file.wav>
--model <model>
--voice <voice-name>
--language <bcp47-code>

Default values when omitted: --model gemini-3.1-flash-tts-preview, --voice Aoede.

Speaker Routing Rules

The presence of --speaker flags determines the synthesis path automatically:

Input	Behaviour
No `--speaker`	Single-speaker synthesis. `--voice` selects the prebuilt voice.
1–2 `--speaker` flags	Multi-speaker dialogue. Each flag maps a speaker name to a voice. `--voice` must not be used together.

--voice and --speaker are mutually exclusive. Never pass both.

Model Selection

Default model: gemini-3.1-flash-tts-preview.

User says	Use
`3.1`, `3.1 flash`, or no preference	`gemini-3.1-flash-tts-preview` (default)
`2.5 flash` or `flash 2.5`	`gemini-2.5-flash-preview-tts`
`2.5 pro` or `pro 2.5`	`gemini-2.5-pro-preview-tts`

Do not switch models from vague quality words alone.

Available Voices

30 prebuilt voices are available. Run yummycli gemini voices to list them all.

Voice	Style
Aoede	Breezy
Kore	Firm
Charon	Informative
Puck	Upbeat
Fenrir	Excitable
Zephyr	Bright
Leda	Youthful
Orus	Firm
Callirrhoe	Easy-going
Autonoe	Bright
Enceladus	Breathy
Iapetus	Clear
Umbriel	Easy-going
Algieba	Smooth
Despina	Smooth
Erinome	Clear
Algenib	Gravelly
Rasalghul	Informative
Achird	Friendly
Zubenelgenubi	Casual
Vindemiatrix	Gentle
Sadachbia	Lively
Sadaltager	Knowledgeable
Sulafat	Warm
Schedar	Even
Gacrux	Mature
Pulcherrima	Forward
Laomedeia	Upbeat
Achernar	Soft
Alnilam	Firm

When the user does not specify a voice, use the default (Aoede). Only apply a different voice when the user explicitly names one or describes a style that clearly maps to a specific voice.

Language

Language is auto-detected from the input text when --language is omitted.
Pass --language only when the user explicitly specifies a language or when the text could be ambiguous (e.g. romanised transliteration).
Use BCP-47 codes: en-US, zh-CN, ja-JP, ko-KR, fr-FR, etc.

Intent to Parameters

Voice guidance:

For neutral or general-purpose narration, use the default Aoede (Breezy).
For formal or instructional content, consider Charon (Informative) or Kore (Firm).
For energetic or promotional content, consider Puck (Upbeat) or Fenrir (Excitable).
For warm conversational content, consider Sulafat (Warm) or Achird (Friendly).
Only switch from the default when the user's intent clearly maps to a specific style.

Output path guidance:

If --output is omitted, yummycli generates a timestamped .wav filename in the current working directory. Do not invent your own filename unless the user provides one.
The output path must end in .wav. Reject or correct any other extension.

Multi-speaker prompt format:

Each speaker's lines must be tagged with their name in square brackets: [Alice]: Hello! [Bob]: Hi there!
Speaker names in --speaker flags must exactly match the names used in --text.

Output Contract

Speak commands return JSON on stdout. Read the response and use the output field as the generated file path.

Single-speaker example:

{
  "provider": "gemini",
  "output": "tts_20260420_142301_047.wav",
  "model": "gemini-3.1-flash-tts-preview",
  "voice": "Aoede",
  "elapsed_seconds": 3
}

Multi-speaker example:

{
  "provider": "gemini",
  "output": "dialogue_20260420_143010_112.wav",
  "model": "gemini-3.1-flash-tts-preview",
  "speakers": [
    {"name": "Alice", "voice": "Aoede"},
    {"name": "Bob", "voice": "Kore"}
  ],
  "elapsed_seconds": 4
}

Execution Rules

Check yummycli auth status --provider gemini before running if credentials may not be configured.
Never pass --voice and --speaker together.
Never pass more than 2 --speaker flags — the API rejects it.
Speaker names in --speaker flags must match the names used in the --text prompt exactly.
If the command returns a validation error, fix the arguments before retrying. Do not retry with the same invalid arguments.
Report the final output path back to the user after a successful run.

Examples

Single-speaker narration:

yummycli gemini speak \
  --text "A golden retriever is the best dog in the world."

Narration with an explicit voice:

yummycli gemini speak \
  --text "Welcome to the future of AI-powered audio." \
  --voice Puck \
  --output welcome.wav

Chinese narration (auto-detected language):

yummycli gemini speak \
  --text "你好，欢迎使用 Gemini 语音合成服务。" \
  --voice Aoede \
  --output greeting.wav

Multi-speaker dialogue:

yummycli gemini speak \
  --text "[Alice]: 你好！今天天气真好。 [Bob]: 是啊，我们去散步吧！" \
  --speaker Alice:Aoede \
  --speaker Bob:Kore \
  --output dialogue.wav

List all available voices:

yummycli gemini voices