Install
openclaw skills install audio-cog
AI audio generation and text-to-speech powered by CellCog: voiceover, narration, voice cloning, avatar voices, sound effects, music, podcasts, and dialogue. Three voice providers (OpenAI, ElevenLabs, MiniMax). Professional audio production from text prompts.
Create professional audio with AI: voiceovers, music, sound effects, and personalized avatar voices.
CellCog provides three voice providers, each with different strengths. Choose based on your needs:
| Scenario | Provider | Why |
|---|---|---|
| Standard narration/voiceover | OpenAI | Best voice style control, consistent quality |
| Emotional/dramatic delivery | ElevenLabs | Richest emotional range, supports emotion tags |
| Cloned voice (avatar) | MiniMax | Only provider with voice cloning support |
| Character voice with specific accent | ElevenLabs | 100+ diverse pre-made voices |
| Fine pitch/speed/volume control | MiniMax | Granular voice settings |
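The table above can be encoded as a small lookup so that an agent picks a provider consistently. This is a minimal sketch: the scenario keys, the function names, and the prompt wording are hypothetical and not part of the CellCog SDK; only the provider names come from the table.

```python
# Scenario keys are hypothetical labels chosen for this sketch;
# the provider names come from the provider table above.
PROVIDER_BY_SCENARIO = {
    "standard_narration": "OpenAI",
    "emotional_delivery": "ElevenLabs",
    "cloned_voice": "MiniMax",
    "accented_character": "ElevenLabs",
    "fine_voice_control": "MiniMax",
}


def pick_provider(scenario: str) -> str:
    """Return the provider the table recommends for a scenario label."""
    try:
        return PROVIDER_BY_SCENARIO[scenario]
    except KeyError:
        known = ", ".join(sorted(PROVIDER_BY_SCENARIO))
        raise ValueError(
            f"Unknown scenario {scenario!r}; expected one of: {known}"
        ) from None


def speech_prompt(scenario: str, text: str) -> str:
    """Build a prompt that names the provider (the wording is an assumption)."""
    return f"Generate speech using {pick_provider(scenario)}: '{text}'"
```

The resulting string would be passed as the prompt of a CellCog task; naming the provider in the prompt keeps the choice explicit.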
For your first CellCog task in a session, read the cellcog skill for the full SDK reference — file handling, chat modes, timeouts, and more.
OpenClaw (fire-and-forget):
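from cellcog import CellCogClient
client = CellCogClient(agent_provider="openclaw")
# Returns immediately; the result is delivered to the notified session.
result = client.create_chat(
    prompt="[your task prompt]",
    notify_session_key="agent:main:main",
    task_label="my-task",
    chat_mode="agent",
)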
All agents except OpenClaw (blocks until done):
from cellcog import CellCogClient
client = CellCogClient(agent_provider="openclaw|cursor|claude-code|codex|...")
result = client.create_chat(
    prompt="[your task prompt]",
    task_label="my-task",
    chat_mode="agent",
)
print(result["message"])
Best for standard narration, voiceovers, and single-speaker content with precise delivery control.
Key strength: Natural-language style instructions — describe the accent, tone, pacing, and emotion you want.
8 built-in voices:
| Voice | Gender | Characteristics |
|---|---|---|
| cedar | Male | Warm, resonant, authoritative, trustworthy |
| marin | Female | Bright, articulate, emotionally agile, professional |
| ballad | Male | Smooth, melodic, musical quality |
| coral | Female | Vibrant, lively, dynamic, spirited |
| echo | Male | Calm, measured, thoughtful, deliberate |
| sage | Female | Wise, contemplative, reflective |
| shimmer | Female | Soft, gentle, soothing, approachable |
| verse | Male | Poetic, rhythmic, artistic, expressive |
Best quality: cedar (male), marin (female).
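As an illustration of pairing a built-in voice with a natural-language style instruction, the sketch below validates the voice name against the table above and assembles a prompt. The function name and the exact prompt wording are assumptions, not a documented CellCog format.

```python
# Voice names and genders are taken from the table above.
OPENAI_VOICES = {
    "cedar": "Male",
    "marin": "Female",
    "ballad": "Male",
    "coral": "Female",
    "echo": "Male",
    "sage": "Female",
    "shimmer": "Female",
    "verse": "Male",
}


def narration_prompt(text: str, voice: str = "cedar", style: str = "") -> str:
    """Build a narration prompt (hypothetical wording) for a built-in voice."""
    if voice not in OPENAI_VOICES:
        known = ", ".join(sorted(OPENAI_VOICES))
        raise ValueError(f"Unknown voice {voice!r}; expected one of: {known}")
    prompt = f"Generate speech using OpenAI with the {voice} voice."
    if style:
        # Free-form description of accent, tone, pacing, and emotion.
        prompt += f" Style: {style}."
    return f"{prompt} Text: '{text}'"
```

Defaulting to cedar follows the "best quality" note; pass marin for the female equivalent.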
Style customization examples:
Best for emotional delivery, dramatic content, character voices, and audiobook narration.
Key strength: Emotion tags embedded directly in text — [laughs], [sighs], [whispers], [excited], [sarcastic]. Plus 100+ diverse pre-made voices.
Emotion tags (use sparingly — 1-2 per paragraph):
| Tag | Effect |
|---|---|
| [laughs] | Natural laughter |
| [chuckles] | Soft/brief laughter |
| [sighs] | Sighing |
| [gasps] | Surprise/shock |
| [whispers] | Whispering delivery |
| [pause] | Natural pause/beat |
| [sad], [happy], [excited], [angry], [sarcastic] | Emotional delivery |
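The "use sparingly" guideline can be checked mechanically before a script is submitted. The sketch below counts the tags from the table in each paragraph and reports paragraphs that exceed two. The function name and the blank-line paragraph convention are assumptions made for this example.

```python
import re

# Tags listed in the table above.
KNOWN_TAGS = {
    "laughs", "chuckles", "sighs", "gasps", "whispers", "pause",
    "sad", "happy", "excited", "angry", "sarcastic",
}
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def overtagged_paragraphs(text: str, max_tags: int = 2) -> list[tuple[int, int]]:
    """Return (paragraph_index, tag_count) for paragraphs with too many tags.

    Paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    flagged = []
    for index, paragraph in enumerate(paragraphs):
        count = sum(1 for tag in TAG_PATTERN.findall(paragraph) if tag in KNOWN_TAGS)
        if count > max_tags:
            flagged.append((index, count))
    return flagged
```

An empty result means every paragraph stays within the limit; bracketed words that are not in the table are ignored.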
Example prompt:
"Generate speech using ElevenLabs with a warm British male voice: 'And then, just when everyone thought it was over... [pause] [whispers] it wasn't.'"
Best for cloned voices (avatars) and fine-grained voice control.
Key strength: MiniMax Speech 2.8 HD — studio-grade audio quality. Supports avatar cloned voice IDs for personalized content, plus 17+ standard pre-made voices with granular speed, pitch, and volume control.
Standard voices include: Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Wise_Woman, Friendly_Person, Young_Knight, Elegant_Man, and more.
Voice settings: emotion (happy/sad/angry/neutral/etc.), speed (0.5–2.0), volume (0–10), pitch (-12 to 12).
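The ranges above lend themselves to a small validation step before a request is made. The following is a sketch only: the class name, field defaults, and the `describe` wording are assumptions, and the emotion value is not validated because the list of emotions above is open-ended.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the MiniMax voice settings listed above."""

    speed: float = 1.0      # allowed range: 0.5 to 2.0 (default is an assumption)
    volume: float = 1.0     # allowed range: 0 to 10 (default is an assumption)
    pitch: int = 0          # allowed range: -12 to 12 (default is an assumption)
    emotion: str = "neutral"

    def __post_init__(self) -> None:
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed must be in [0.5, 2.0], got {self.speed}")
        if not 0 <= self.volume <= 10:
            raise ValueError(f"volume must be in [0, 10], got {self.volume}")
        if not -12 <= self.pitch <= 12:
            raise ValueError(f"pitch must be in [-12, 12], got {self.pitch}")

    def describe(self) -> str:
        """Render the settings as a phrase that can be appended to a prompt."""
        return (
            f"emotion {self.emotion}, speed {self.speed:g}, "
            f"volume {self.volume:g}, pitch {self.pitch}"
        )
```

Validating locally surfaces an out-of-range value immediately instead of after a generation task has been submitted.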
Users can create avatars on CellCog with their own cloned voice. When an avatar has a cloned voice, CellCog uses the MiniMax provider to generate speech that sounds like that person.
How it works:
Example prompt:
"Generate a voiceover using my avatar Luna's voice: 'Welcome to our quarterly update. I'm excited to share some incredible results with you today.'"
This is powerful for creating consistent, personalized content — marketing videos, podcast intros, course narration — all in the user's own voice.
CellCog generates standalone sound effects from text descriptions. Royalty-free, 0.1 to 30 seconds.
Example prompts:
Tips for better SFX:
Create original music from text descriptions. 3 seconds to 10 minutes. Royalty-free.
Capabilities:
Example prompts:
For precise section-by-section control (exact timing per section), describe your composition plan in detail — CellCog handles the structure.
All generated music is royalty-free — use commercially without attribution or licensing fees.
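For section-by-section control, a composition plan can be assembled programmatically and rendered into the prompt text. The sketch below checks the total length against the stated limits (3 seconds to 10 minutes) and lists each section with its timing. The `Section` type and the prompt layout are hypothetical, since this document does not prescribe a format.

```python
from dataclasses import dataclass

MIN_MUSIC_SECONDS = 3        # lower limit stated above
MAX_MUSIC_SECONDS = 10 * 60  # upper limit stated above (10 minutes)


@dataclass(frozen=True)
class Section:
    name: str
    seconds: float
    description: str


def composition_prompt(sections: list[Section]) -> str:
    """Render a timed composition plan as prompt text (layout is an assumption)."""
    total = sum(section.seconds for section in sections)
    if not MIN_MUSIC_SECONDS <= total <= MAX_MUSIC_SECONDS:
        raise ValueError(
            f"total length {total:g}s is outside "
            f"{MIN_MUSIC_SECONDS}s to {MAX_MUSIC_SECONDS}s"
        )
    lines = [f"Create a {total:g}-second track with these sections:"]
    start = 0.0
    for section in sections:
        end = start + section.seconds
        lines.append(
            f"- {section.name} ({start:g}s to {end:g}s): {section.description}"
        )
        start = end
    return "\n".join(lines)
```

Computing start and end times from the section lengths keeps the plan internally consistent when a section is resized.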
All three voice providers support 40+ languages. Provide speech text in the target language:
English, Spanish, French, German, Italian, Portuguese, Chinese (Mandarin/Cantonese), Japanese, Korean, Hindi, Arabic, Russian, Polish, Dutch, Turkish, and many more.
Use chat_mode="agent" for all audio tasks. Audio generation runs efficiently in agent mode, so agent team mode is not needed.
Run /cellcog-setup (or /cellcog:cellcog-setup depending on your tool) to install and authenticate.
OpenClaw users: Run clawhub install cellcog instead.
Manual setup: pip install -U cellcog and set CELLCOG_API_KEY. See the cellcog skill for SDK reference.
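After a manual setup, a quick check that the API key is visible to the process can save a failed first call. This sketch only inspects the CELLCOG_API_KEY environment variable named above; the helper name is hypothetical and it does not contact CellCog or verify that the key is valid.

```python
import os
from collections.abc import Mapping
from typing import Optional


def cellcog_api_key_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if CELLCOG_API_KEY is set to a non-blank value.

    Pass a mapping to check something other than the real environment.
    """
    source = os.environ if env is None else env
    return bool(source.get("CELLCOG_API_KEY", "").strip())


if __name__ == "__main__":
    if cellcog_api_key_configured():
        print("CELLCOG_API_KEY is set.")
    else:
        print("CELLCOG_API_KEY is missing; set it before creating a client.")
```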