Install

```shell
openclaw skills install ai-music-runcomfy
```

Generate AI music on RunComfy via the `runcomfy` CLI, a smart router across the music-model catalog. It routes to ElevenLabs AI Music Generation (premium 44.1 kHz stereo vocal tracks, 5 s–5 min, $0.0083/s) and ACE Step / ACE Step 1.5 (StepFun-AI open weights, tag-driven composition, multilingual lyrics, $0.0002–0.0003/s, ~27× cheaper), plus ACE Step audio-inpaint (regenerate a time range inside an existing track) and ACE Step audio-outpaint (extend a track before or after). The skill picks the right model for the user's actual intent (premium vocal hook, cheap background-music library, multilingual pop song, repairing a bad chorus, lengthening a 30 s draft into a 2 min cut) and ships each model's documented prompting patterns plus the minimal `runcomfy run` invocation. Triggers on "generate music", "make a song", "AI music", "background music", "instrumental track", "soundtrack", "jingle", "theme music", "royalty-free music", "compose", "music with lyrics", "extend music", "fix this song", "inpaint music", or any explicit ask to generate or edit music.
runcomfy.com · Audio models · CLI docs
```shell
# 1. Install (pick one; see the runcomfy-cli skill for details)
npm i -g @runcomfy/cli            # global install
npx -y @runcomfy/cli --version    # zero-install

# 2. Sign in
runcomfy login                    # or in CI: export RUNCOMFY_TOKEN=<token>

# 3. Generate music
runcomfy run <vendor>/<model>/<endpoint> \
  --input '{"prompt": "...", ...}' \
  --output-dir ./out
```
CLI deep dive: runcomfy-cli skill.
ACE Step 1.5 – acestep-ai/ace-step-1.5/text-to-audio
Latest ACE Step generation. 50+ language vocal support, refined structured-lyric handling, $0.0003/s. Open-weights (Apache 2.0). Pick for: multilingual launches, vocal songs in non-English, hero-quality ACE output. Avoid for: maximally polished commercial vocal hooks (try ElevenLabs Music) or cost-sensitive batches (try base ACE Step).
ElevenLabs AI Music Generation – elevenlabs/elevenlabs/music-generation
Premium 44.1 kHz stereo, 5 s–5 min, section-level control (Intro/Verse/Chorus/Bridge), multilingual vocals, commercial-friendly. $0.0083/s (~27× ACE Step). Pick for: hero brand campaigns, polished vocal hooks, premium commercial cuts, ad music. Avoid for: high-volume drafts and background-music libraries, where cost dominates.
ACE Step (base) – acestep-ai/ace-step/text-to-audio (default for cost-sensitive work)
Original ACE Step. Tag-driven composition, optional lyrics, 5–240 s stereo. $0.0002/s, the cheapest CLI-reachable music model on RunComfy. Pick for: background-music libraries, jingles, game loops, drafts, cost-sensitive iteration. Avoid for: premium vocal hooks; use ElevenLabs Music or ACE Step 1.5.
ACE Step audio-inpaint – acestep-ai/ace-step/audio-inpaint
Regenerate a time range (start_time / end_time, anchorable to track start or end) inside an existing track. Pick for: fix a bad chorus, swap the bridge, replace a 20 s section without re-rendering. Avoid for: edits not bounded by time (use the source-model text-to-music instead).
ACE Step audio-outpaint – acestep-ai/ace-step/audio-outpaint
Extend an existing track bidirectionally: add an intro before, an outro after, or both (`extend_before_duration` / `extend_after_duration`). Pick for: lengthening a 30 s hook into a 2 min cut, adding a fade-out, building a longer arrangement around an existing hook. Avoid for: extending past 4 min total; chain calls instead.
The agent reads these tables, classifies user intent (premium vs cost-sensitive · multilingual · vocal vs instrumental · generate vs edit), and picks the matching subsection below.
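As a concrete illustration, that intent-to-model mapping could be sketched as a small shell helper. This is hypothetical: the actual skill reasons over the tables above rather than running code, and the intent labels are illustrative.

```shell
# Map a classified intent to a RunComfy model id (sketch only).
# $1: generate | inpaint | outpaint
# $2 (generate only): premium | multilingual | anything else = cost-sensitive
pick_model() {
  case "$1" in
    inpaint)  echo "acestep-ai/ace-step/audio-inpaint" ;;
    outpaint) echo "acestep-ai/ace-step/audio-outpaint" ;;
    generate)
      case "${2:-}" in
        premium)      echo "elevenlabs/elevenlabs/music-generation" ;;
        multilingual) echo "acestep-ai/ace-step-1.5/text-to-audio" ;;
        *)            echo "acestep-ai/ace-step/text-to-audio" ;;  # cheapest default
      esac ;;
  esac
}

pick_model generate premium   # elevenlabs/elevenlabs/music-generation
pick_model inpaint            # acestep-ai/ace-step/audio-inpaint
```

Edits always go to the ACE Step editing endpoints; only the generate route splits on budget and language.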
Model: elevenlabs/elevenlabs/music-generation
Full schema + tips: see the dedicated elevenlabs-music-generation skill.
```shell
runcomfy run elevenlabs/elevenlabs/music-generation \
  --input '{
    "prompt": "Upbeat indie-pop anthem, bright electric guitars, driving drums, 120 BPM, female lead vocal. [Intro 8 bars] instrumental build. [Verse] Chalk on the palms, laces double-knotted. [Chorus] We rise, we strike, we never fade out. [Outro] full band, fade.",
    "music_length_ms": 60000
  }' \
  --output-dir ./out
```
ElevenLabs Music reads one prompt carrying both the style brief and the lyrics with section markers. Set `force_instrumental: true` for no vocals. At $0.0083/s, draft short and finalize long.
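For an instrumental-only cut on the same endpoint, the prompt carries only the style brief and `force_instrumental` suppresses vocals. A sketch, with an illustrative prompt; the field names come from the notes above:

```shell
# Build the request body once, sanity-check it is valid JSON, then submit.
INPUT='{
  "prompt": "Warm lo-fi hip hop, dusty drums, mellow Rhodes, vinyl crackle, 80 BPM",
  "music_length_ms": 30000,
  "force_instrumental": true
}'
echo "$INPUT" | python3 -m json.tool > /dev/null && echo "input OK"

# runcomfy run elevenlabs/elevenlabs/music-generation \
#   --input "$INPUT" --output-dir ./out
```

Keeping the body in a variable makes it easy to validate before spending $0.0083/s on a malformed request.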
Model: acestep-ai/ace-step/text-to-audio (base) or acestep-ai/ace-step-1.5/text-to-audio (1.5)
Full schema + tips: see the dedicated ace-step skill.
```shell
runcomfy run acestep-ai/ace-step-1.5/text-to-audio \
  --input '{
    "tags": "indie pop, anthemic, electric guitar, driving drums, female vocal, 120 BPM",
    "lyrics": "[Verse]\nChalk on the palms\nMorning on the ridge\n[Chorus]\nWe rise, we strike, we never fade out",
    "duration": 60
  }' \
  --output-dir ./out
```
ACE Step splits style into `tags` and vocal content into `lyrics` (with [Verse]/[Chorus]/[Bridge] markers, or [inst] for instrumental). The 1.5 variant adds 50+ language vocal support.
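Multi-line lyrics must reach the API as JSON-escaped `\n` sequences, as in the example above. One way to avoid hand-escaping is to write the lyrics with real newlines and let a JSON encoder do the escaping (a sketch; tags and lyrics are illustrative):

```shell
# Lyrics with real newlines; a JSON encoder handles the \n escaping.
LYRICS='[Verse]
Chalk on the palms
Morning on the ridge
[Chorus]
We rise, we strike, we never fade out'

INPUT=$(python3 -c '
import json, sys
print(json.dumps({
    "tags": "indie pop, anthemic, female vocal, 120 BPM",
    "lyrics": sys.argv[1],
    "duration": 60,
}))' "$LYRICS")

echo "$INPUT"
# runcomfy run acestep-ai/ace-step-1.5/text-to-audio \
#   --input "$INPUT" --output-dir ./out
```

This keeps the lyrics readable in the script while guaranteeing the `--input` body is valid JSON.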
```shell
runcomfy run acestep-ai/ace-step/audio-inpaint \
  --input '{
    "audio": "https://your-cdn.example/song.mp3",
    "tags": "indie pop, breakdown, piano only, soft, no drums",
    "start_time": 20,
    "end_time": 40,
    "lyrics": "[inst]"
  }' \
  --output-dir ./out
```
`start_time_relative_to` and `end_time_relative_to` default to `start`; set them to `end` to anchor against the track's end (e.g. rewrite the last 15 s without computing exact timestamps). Full schema: ace-step skill.
```shell
runcomfy run acestep-ai/ace-step/audio-outpaint \
  --input '{
    "audio": "https://your-cdn.example/hook-30s.mp3",
    "tags": "indie pop, build-up before chorus, fade outro",
    "extend_before_duration": 30,
    "extend_after_duration": 60,
    "lyrics": "[inst]"
  }' \
  --output-dir ./out
```
Bidirectional in one call: set both `extend_before_duration` and `extend_after_duration` to add intro and outro at once. The cap is 4 min total.
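Because total length is capped at 4 min (240 s), an agent planning an extension can clamp the request to the remaining headroom and chain a second outpaint call on the output for the rest. A hypothetical helper; the cap value comes from the note above:

```shell
# clamp_extension TRACK_LEN_S WANTED_S -> seconds addable in this call.
# Anything beyond the return value must go into a chained second call.
clamp_extension() {
  track=$1; wanted=$2; cap=240
  room=$((cap - track))
  if [ "$room" -lt 0 ]; then room=0; fi
  if [ "$wanted" -le "$room" ]; then echo "$wanted"; else echo "$room"; fi
}

clamp_extension 30 90    # 90  (30 s track -> 120 s total, fits)
clamp_extension 200 90   # 40  (only 40 s of headroom left)
```

For a 200 s track that needs 90 s more, the first call adds 40 s and a second call on the resulting file adds the remaining 50 s.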
Routing examples:
- Multilingual song batch: ACE Step 1.5 with `lyrics` per language, or Route 1 (ElevenLabs Music) if premium quality matters more than cost.
- Video soundtrack: ElevenLabs Music with `music_length_ms` matched to the video length.
- Lengthen an existing track: audio-outpaint with the source `audio`, e.g. add a 30 s intro + 60 s outro in one call.
- Repair a section: audio-inpaint with `start_time` / `end_time` around the bad chorus and tags matching the original song style.

Exit codes:
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
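These codes make unattended retries straightforward: retry only on 75 (timeout / 429) and fail fast on everything else. A sketch; the retry count and backoff values are illustrative:

```shell
# Retry a command up to 3 times, but only when it exits 75.
run_with_retry() {
  attempt=1
  while :; do
    code=0
    "$@" || code=$?
    if [ "$code" -ne 75 ]; then return "$code"; fi   # success or non-retryable
    if [ "$attempt" -ge 3 ]; then return "$code"; fi # retries exhausted
    sleep "$attempt"                                 # linear backoff: 1 s, 2 s
    attempt=$((attempt + 1))
  done
}

# usage:
# run_with_retry runcomfy run acestep-ai/ace-step/text-to-audio \
#   --input '{"tags": "lo-fi hip hop", "duration": 30}' --output-dir ./out
```

Codes 64, 65, 69, and 77 indicate problems a retry cannot fix (bad arguments, bad input, upstream failure, auth), so the wrapper surfaces them immediately.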
The skill classifies the user request into one of the four routes (generate via ElevenLabs or ACE Step, or edit via audio-inpaint or audio-outpaint, with a premium vs cost-sensitive split on the generate side) and invokes `runcomfy run <model_id>` with the matching JSON body. The CLI POSTs to the RunComfy Model API, polls request status, and downloads the generated audio file into `--output-dir`. Ctrl-C cancels the remote request before exit.
- Install: `npm i -g @runcomfy/cli` or `npx -y @runcomfy/cli`. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf; if the operator wants the curl-pipe path documented at docs.runcomfy.com/cli/install, they should review the script first.
- Auth: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600. Set the `RUNCOMFY_TOKEN` env var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in.
- Prompt content: passed via `--input`. The CLI does not shell-expand prompt content; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content.
- Untrusted media: `audio` URLs for inpaint / outpaint are untrusted; embedded steganographic instructions or unusual embedded metadata can influence generation.
- Network: model-api.runcomfy.net and *.runcomfy.net / *.runcomfy.com. No telemetry, no callbacks.
- Commands: `runcomfy <subcommand>`; install lines are one-time operator setup.