audio-cog

AI audio generation powered by CellCog. Three voice providers (OpenAI, ElevenLabs, MiniMax), avatar cloned voices, sound effects, music generation up to 10 m...

MIT-0 · Free to use, modify, and redistribute. No attribution required.

by CellCog @nitishgargiitd

Security Scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

✓

Purpose & Capability

Name, description, and runtime instructions align: this is an instruction-only audio generation skill that delegates SDK/API work to a separate 'cellcog' skill. The claimed features (multiple voice providers, SFX, music, avatar cloning) are consistent with the text.

ℹ

Instruction Scope

SKILL.md stays on-topic (generate speech, music, SFX, cloned avatars). It tells the user to install and read the 'cellcog' skill for SDK setup. It references using ffmpeg to extend audio loops (an external binary not declared) and mentions avatar uploads — which implies file handling/uploads but does not detail how or to which endpoints files are sent.

✓

Install Mechanism

No install spec and no code files (instruction-only) — minimal disk footprint. It asks users to install the 'cellcog' skill via clawhub, which is reasonable, but the upstream 'cellcog' skill will determine what actually gets installed or run.

Credentials

This skill lists multiple providers (OpenAI, ElevenLabs, MiniMax) and avatar cloning but declares no required env vars or primary credential. That omission could be benign if the separate 'cellcog' skill centralizes credentials, but it's an unresolved gap: you should verify where API keys are provided, how they're stored, and which service accounts are used. Also check consent/process for uploading voice samples for cloning.

✓

Persistence & Privilege

No elevated privileges requested: always:false, user-invocable, and no install artifact present. The skill does not request persistent system-wide changes in its own instructions.

What to consider before installing

Before installing:
1) Review the 'cellcog' skill SKILL.md and source so you understand where API keys are supplied and how they're stored (this skill defers SDK setup to that skill).
2) Confirm whether OpenAI/ElevenLabs/MiniMax keys are required and which account CellCog will use; do not provide unrelated API keys to this skill without knowing why.
3) If you plan to use avatar/cloned voices, verify the consent and retention policy for uploaded voice samples (who can access them and how long they are kept).
4) Expect to need external tools (ffmpeg) for some workflows; ensure you have them available and trust any commands you run.
5) If you need stronger assurance, ask the publisher for the 'cellcog' skill source or a clear explanation of credential handling and the network endpoints audio is sent to.
Install only after those questions are answered.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.4

latest: vk979xvwn871qdtwb0k0eqmt6bd83784t

License terms: https://spdx.org/licenses/MIT-0.html

Runtime requirements

🎵 Clawdis

SKILL.md

Audio Cog - AI Audio Generation Powered by CellCog

Create professional audio with AI — voiceovers, music, sound effects, and personalized avatar voices.

CellCog provides three voice providers, each with different strengths. Choose based on your needs:

Scenario	Provider	Why
Standard narration/voiceover	OpenAI	Best voice style control, consistent quality
Emotional/dramatic delivery	ElevenLabs	Richest emotional range, supports emotion tags
Cloned voice (avatar)	MiniMax	Only provider with voice cloning support
Character voice with specific accent	ElevenLabs	100+ diverse pre-made voices
Fine pitch/speed/volume control	MiniMax	Granular voice settings
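
The scenario table above can be read as a simple lookup. The sketch below is only an illustration of that mapping: the key names and the `choose_provider` helper are hypothetical and are not part of the CellCog SDK. It falls back to OpenAI because this skill describes OpenAI as the default provider.

```python
# Hypothetical lookup mirroring the scenario table; keys are illustrative labels.
PROVIDER_BY_NEED = {
    "narration": "OpenAI",            # standard narration/voiceover
    "emotional": "ElevenLabs",        # emotional/dramatic delivery
    "cloned_voice": "MiniMax",        # avatar / cloned voice
    "accent_character": "ElevenLabs", # character voice with a specific accent
    "fine_control": "MiniMax",        # fine pitch/speed/volume control
}


def choose_provider(need: str) -> str:
    """Return the provider the table suggests, defaulting to OpenAI."""
    return PROVIDER_BY_NEED.get(need, "OpenAI")


print(choose_provider("cloned_voice"))
```

In practice the provider is named in the natural-language prompt, so a helper like this would only be useful for choosing which provider name to put into that prompt.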

Prerequisites

This skill requires the cellcog skill for SDK setup and API calls.

clawhub install cellcog

Read the cellcog skill first for SDK setup. This skill shows you what's possible.

Voice Providers

OpenAI (Default)

Best for standard narration, voiceovers, and single-speaker content with precise delivery control.

Key strength: Natural-language style instructions — describe the accent, tone, pacing, and emotion you want.

8 built-in voices:

Voice	Gender	Characteristics
cedar	Male	Warm, resonant, authoritative, trustworthy
marin	Female	Bright, articulate, emotionally agile, professional
ballad	Male	Smooth, melodic, musical quality
coral	Female	Vibrant, lively, dynamic, spirited
echo	Male	Calm, measured, thoughtful, deliberate
sage	Female	Wise, contemplative, reflective
shimmer	Female	Soft, gentle, soothing, approachable
verse	Male	Poetic, rhythmic, artistic, expressive

Best quality: cedar (male), marin (female).

Style customization examples:

"Warm conversational tone, medium pace, slight enthusiasm when mentioning features. American accent."
"Deep, hushed, enigmatic, with a slow deliberate cadence — true crime narrator style."
"Heavy French accent, sophisticated yet friendly, moderate pacing with deliberate pauses."

ElevenLabs

Best for emotional delivery, dramatic content, character voices, and audiobook narration.

Key strength: Emotion tags embedded directly in text — [laughs], [sighs], [whispers], [excited], [sarcastic]. Plus 100+ diverse pre-made voices.

Emotion tags (use sparingly — 1-2 per paragraph):

Tag	Effect
`[laughs]`	Natural laughter
`[chuckles]`	Soft/brief laughter
`[sighs]`	Sighing
`[gasps]`	Surprise/shock
`[whispers]`	Whispering delivery
`[pause]`	Natural pause/beat
`[sad]`, `[happy]`, `[excited]`, `[angry]`, `[sarcastic]`	Emotional delivery
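
One way to apply the "1-2 per paragraph" guidance before sending a script is to count the tags per paragraph. This is a minimal sketch: the `overtagged_paragraphs` helper is hypothetical, it assumes paragraphs are separated by blank lines, and it only recognizes the tags listed in the table above.

```python
from __future__ import annotations

import re

# Tags taken from the table above; other bracketed words are ignored.
EMOTION_TAGS = {
    "laughs", "chuckles", "sighs", "gasps", "whispers", "pause",
    "sad", "happy", "excited", "angry", "sarcastic",
}
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def overtagged_paragraphs(script: str, max_tags: int = 2) -> list[int]:
    """Return 0-based indexes of paragraphs with more than max_tags emotion tags."""
    paragraphs = [p for p in re.split(r"\n\s*\n", script) if p.strip()]
    flagged = []
    for index, paragraph in enumerate(paragraphs):
        tags = [t for t in TAG_PATTERN.findall(paragraph) if t in EMOTION_TAGS]
        if len(tags) > max_tags:
            flagged.append(index)
    return flagged


script = "Hello [laughs] there.\n\n[sighs] One [gasps] two [whispers] three."
print(overtagged_paragraphs(script))
```

Here the second paragraph carries three tags, so it would be flagged for trimming.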

Example prompt:

"Generate speech using ElevenLabs with a warm British male voice: 'And then, just when everyone thought it was over... [pause] [whispers] it wasn't.'"

MiniMax

Best for cloned voices (avatars) and fine-grained voice control.

Key strength: MiniMax Speech 2.8 HD — studio-grade audio quality. Supports avatar cloned voice IDs for personalized content, plus 17+ standard pre-made voices with granular speed, pitch, and volume control.

Standard voices include: Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Wise_Woman, Friendly_Person, Young_Knight, Elegant_Man, and more.

Voice settings: emotion (happy/sad/angry/neutral/etc.), speed (0.5–2.0), volume (0–10), pitch (-12 to 12).
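
If you assemble voice settings programmatically, it can help to check them against the ranges above before writing them into a request. The sketch below is an assumption-laden illustration: the `VoiceSettings` class, its field names, and its default values are hypothetical and are not the CellCog or MiniMax API. Emotion is left unvalidated because the list above is open-ended.

```python
from dataclasses import dataclass

# Ranges quoted in the text above.
SPEED_RANGE = (0.5, 2.0)
VOLUME_RANGE = (0, 10)
PITCH_RANGE = (-12, 12)


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container; defaults are illustrative assumptions."""
    speed: float = 1.0
    volume: float = 1.0
    pitch: int = 0
    emotion: str = "neutral"

    def __post_init__(self) -> None:
        limits = {"speed": SPEED_RANGE, "volume": VOLUME_RANGE, "pitch": PITCH_RANGE}
        for name, (low, high) in limits.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")


settings = VoiceSettings(speed=1.2, pitch=-3, emotion="happy")
print(settings)
```

Constructing `VoiceSettings(speed=3.0)` raises `ValueError`, which surfaces an out-of-range value before any audio is requested.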

Avatar / Cloned Voice

Users can create avatars on CellCog with their own cloned voice. When an avatar has a cloned voice, CellCog uses the MiniMax provider to generate speech that sounds like that person.

How it works:

The user creates an avatar on cellcog.ai and uploads voice samples
CellCog clones their voice using MiniMax Speech 2.8 HD
Any audio request referencing that avatar uses their cloned voice

Example prompt:

"Generate a voiceover using my avatar Luna's voice: 'Welcome to our quarterly update. I'm excited to share some incredible results with you today.'"

This is powerful for creating consistent, personalized content — marketing videos, podcast intros, course narration — all in the user's own voice.

Sound Effects (SFX)

CellCog generates standalone sound effects from text descriptions. Effects are royalty-free and can run from 0.1 to 30 seconds.

Example prompts:

"Generate a sound effect of heavy rain hitting a metal roof with occasional thunder, 10 seconds"
"Create a crispy footsteps-on-fresh-snow sound effect, 5 seconds"
"Generate an echoing door slam in a large empty warehouse"

Tips for better SFX:

Be specific about textures and environment
Specify duration when exact length matters
For ambient audio longer than 30 seconds, generate a short loopable segment and extend with ffmpeg
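
The ffmpeg tip above can be sketched as a small command builder. This is one possible approach, not a command taken from the skill itself: it uses ffmpeg's `-stream_loop -1` input option to repeat the source indefinitely and `-t` to cap the output length. The file names and the `ffmpeg_loop_command` helper are hypothetical.

```python
from __future__ import annotations

import shlex


def ffmpeg_loop_command(source: str, output: str, seconds: float) -> list[str]:
    """Build an ffmpeg command that repeats `source` until `seconds` is reached."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return [
        "ffmpeg",
        "-stream_loop", "-1",  # input option: loop the input indefinitely
        "-i", source,
        "-t", str(seconds),    # output option: stop after this many seconds
        "-c", "copy",          # copy the stream without re-encoding
        output,
    ]


command = ffmpeg_loop_command("rain_loop.wav", "rain_2min.wav", 120)
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` once ffmpeg is installed. The result only sounds seamless if the generated segment was requested as loopable, and with stream copy the final length may be approximate.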

Music Generation

Create original, royalty-free music from text descriptions, with track lengths from 3 seconds to 10 minutes.
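
A quick pre-flight check can catch durations outside the documented limits (0.1 to 30 seconds for sound effects, 3 seconds to 10 minutes for music) before a prompt is sent. The `check_duration` helper and the `"sfx"`/`"music"` labels below are hypothetical names used only for this sketch.

```python
# Documented limits in seconds; 10 minutes is 600 seconds.
DURATION_LIMITS_SECONDS = {
    "sfx": (0.1, 30.0),
    "music": (3.0, 600.0),
}


def check_duration(kind: str, seconds: float) -> bool:
    """Return True if `seconds` is within the documented range for `kind`."""
    low, high = DURATION_LIMITS_SECONDS[kind]
    return low <= seconds <= high


print(check_duration("music", 120))
print(check_duration("sfx", 45))
```

A sound effect request longer than 30 seconds fails this check, which is the case where generating a short loopable segment and extending it is the suggested route.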

Capabilities:

Any genre or genre fusion
Instrumental and vocal tracks (specify if you want vocals)
Complex arrangements, mood transitions, and energy dynamics
Describe what you want — the model handles music theory

Example prompts:

"Create 2 minutes of calm lo-fi hip-hop background music with soft piano and mellow beats, 75 BPM"
"Generate a 15-second upbeat tech podcast intro jingle"
"Create 90 seconds of cinematic orchestral music — start soft and inspiring, build to a confident crescendo"
"Generate a 3-minute pop song about summer adventures with female vocals"

For precise section-by-section control (exact timing per section), describe your composition plan in detail — CellCog handles the structure.

All generated music is royalty-free — use commercially without attribution or licensing fees.

Multi-Language Support

All three voice providers support 40+ languages. Provide speech text in the target language:

English, Spanish, French, German, Italian, Portuguese, Chinese (Mandarin/Cantonese), Japanese, Korean, Hindi, Arabic, Russian, Polish, Dutch, Turkish, and many more.

Chat Mode

Use chat_mode="agent" for all audio tasks. Audio generation runs efficiently in agent mode, so agent team mode is not needed.

Tips for Better Audio

Choose the right provider: OpenAI for standard narration, ElevenLabs for emotional/dramatic, MiniMax for cloned voices
Provide the complete script: Write out exactly what should be spoken — don't say "something about our product"
Include style instructions: "Confident but warm", "slow and deliberate", "with slight excitement"
For music: Specify duration, mood, genre, and tempo (BPM if you know it)
Pronunciation guidance: For names or technical terms, add hints: "CellCog (pronounced SELL-kog)"
For ElevenLabs emotion tags: Use sparingly — 1-2 per paragraph. Tags affect all subsequent text until a new tag.
