Install
openclaw skills install ace-step

Generate, inpaint, and outpaint music with ACE Step on RunComfy via the `runcomfy` CLI. ACE Step is StepFun-AI's open-weights music foundation model: tag-driven composition (genre, mood, instruments), multilingual lyrics with section markers, 5 s to 4 min stereo output, $0.0002 to $0.0003 per second (≈ 27× cheaper than ElevenLabs Music). Four endpoints: ACE Step text-to-audio (the default), ACE Step 1.5 text-to-audio (50+ language lyrics, refined structured-lyric handling), ACE Step audio-inpaint (regenerate a time range inside an existing track), ACE Step audio-outpaint (extend an existing track before or after). Triggers on "ace step", "ace-step", "acestep", "ACE music", "open music model", "cheap AI music", "inpaint audio", "audio inpaint", "extend music", "audio outpaint", "lengthen track", "music with tags", or any explicit ask to generate or edit music with ACE Step.
Tag-driven music generation, inpainting, and outpainting with StepFun-AI's ACE Step open-weights model. Four CLI-reachable endpoints, $0.0002 to $0.0003 per second of audio, up to 4 minutes per call.
runcomfy.com · ACE Step base · ACE Step 1.5 · CLI docs
# 1. Install (one of; see runcomfy-cli skill for details)
npm i -g @runcomfy/cli # global install
npx -y @runcomfy/cli --version # zero-install
# 2. Sign in
runcomfy login # or in CI: export RUNCOMFY_TOKEN=<token>
# 3. Generate
runcomfy run acestep-ai/ace-step/text-to-audio \
--input '{"tags": "..."}' \
--output-dir ./out
CLI deep dive: runcomfy-cli skill.
Listed newest first.
ACE Step 1.5 (text-to-audio): acestep-ai/ace-step-1.5/text-to-audio
Latest ACE Step generation. 50+ language vocal support, refined structured-lyric handling, otherwise same shape as base. Slightly higher cost ($0.0003/s vs $0.0002/s). Pick for: multilingual lyrics, hero-quality vocal tracks, vocal songs that need clean section structure. Avoid for: cost-sensitive batches where the base model is good enough.
ACE Step (text-to-audio): acestep-ai/ace-step/text-to-audio (default: cheap & fast)
Original ACE Step. Tag-driven composition, optional lyrics, 5–240 s stereo. $0.0002/s, ~27× cheaper than ElevenLabs Music. Pick for: high-volume drafts, background music, jingles, game loops, cost-sensitive iteration. Avoid for: maximally polished commercial vocal hooks; try ACE Step 1.5 or ElevenLabs Music for those.
ACE Step (audio-inpaint): acestep-ai/ace-step/audio-inpaint
Regenerate a time range inside an existing track (not mask-based; uses start_time/end_time in seconds, each anchored to track start or end). Pick for: fix a bad chorus in the middle, swap the bridge, replace a 20 s section without re-rendering the whole song. Avoid for: edits that aren't time-bounded; those don't fit the schema.
ACE Step (audio-outpaint): acestep-ai/ace-step/audio-outpaint
Extend an existing track bidirectionally: add intro before, outro after, or both. Pick for: lengthening a 30 s draft into a 2 min cut, adding a fade-in, building a longer arrangement around an existing hook. Avoid for: extending a track past 4 min total; chain calls instead.
Model: acestep-ai/ace-step/text-to-audio (or acestep-ai/ace-step-1.5/text-to-audio for the 1.5 variant)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `tags` | string | yes | - | Comma-separated genre / mood / instrument tags. Drives composition |
| `lyrics` | string | no | - | Vocal content. Use section markers [Verse], [Chorus], [Bridge]. Use [inst] or [instrumental] for no vocals |
| `duration` | int | no | 60 | Audio length in seconds. 5–240 (max 4 min per call) |
| `seed` | int | no | -1 | Reproducibility; -1 randomizes |
Pricing: ACE Step $0.0002/s · ACE Step 1.5 $0.0003/s. 60 s ≈ $0.012 / $0.018; 240 s ≈ $0.048 / $0.072.
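The per-second rates above make batch budgeting a one-line multiplication. A minimal Python sketch, assuming billing is strictly linear in the requested duration (the variant keys and function name are illustrative, not part of the CLI):

```python
# Per-second rates copied from the pricing line above.
RATE_PER_SECOND = {"ace-step": 0.0002, "ace-step-1.5": 0.0003}

def estimate_cost(duration_s, variant="ace-step"):
    """Estimated USD cost of one call, assuming linear per-second billing."""
    if not 5 <= duration_s <= 240:
        raise ValueError("duration must be between 5 and 240 seconds")
    return round(duration_s * RATE_PER_SECOND[variant], 6)

print(f"{estimate_cost(60):.3f}")                    # 0.012
print(f"{estimate_cost(240, 'ace-step-1.5'):.3f}")   # 0.072
```

Multiplying the result by the number of drafts in a batch gives a rough upper bound before any calls are made.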
Tag-driven instrumental:
runcomfy run acestep-ai/ace-step/text-to-audio \
--input '{
"tags": "lo-fi hip-hop, mellow, vinyl crackle, rhodes piano, soft drums, 75 BPM",
"lyrics": "[inst]",
"duration": 90
}' \
--output-dir ./out
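When the call is driven from a script rather than typed by hand, the JSON body can be built and checked before it reaches the CLI. The sketch below is one possible approach in Python: it validates against the field table above and returns an argument list that uses only the `--input` and `--output-dir` flags shown in this document. The helper names are hypothetical.

```python
import json

def build_lyrics(sections):
    """Join (marker, lines) pairs into one lyrics string with section markers."""
    parts = []
    for marker, lines in sections:
        parts.append(f"[{marker}]")
        parts.extend(lines)
    return "\n".join(parts)

def build_text_to_audio_input(tags, lyrics=None, duration=60, seed=-1):
    """Return the JSON string for --input, checked against the field table."""
    if not tags or not tags.strip():
        raise ValueError("tags is required")
    if not isinstance(duration, int) or not 5 <= duration <= 240:
        raise ValueError("duration must be an int between 5 and 240 seconds")
    payload = {"tags": tags, "duration": duration, "seed": seed}
    if lyrics is not None:
        payload["lyrics"] = lyrics
    return json.dumps(payload)

def build_run_argv(model, input_json, output_dir="./out"):
    """Argument list for subprocess.run; no shell quoting is involved."""
    return ["runcomfy", "run", model, "--input", input_json, "--output-dir", output_dir]

lyrics = build_lyrics([("Verse", ["line one", "line two"]), ("Chorus", ["hook line"])])
body = build_text_to_audio_input("indie pop, female vocal, 120 BPM", lyrics=lyrics)
argv = build_run_argv("acestep-ai/ace-step-1.5/text-to-audio", body)
```

Passing `argv` to `subprocess.run` as a list sidesteps the single-quote escaping that the shell form needs for multi-line lyrics.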
Full vocal song with structure (use 1.5 for multilingual):
runcomfy run acestep-ai/ace-step-1.5/text-to-audio \
--input '{
"tags": "indie pop, anthemic, electric guitar, driving drums, female vocal, 120 BPM",
"lyrics": "[Verse]\nChalk on the palms, laces double-knotted\nMorning on the ridge, the sun is rising\n[Chorus]\nWe rise, we strike, we never fade out\nWe rise, we strike, we sing it loud\n[Bridge]\nSoft piano breakdown\n[Outro]\nFull band, fade",
"duration": 60
}' \
--output-dir ./out
- Specific tags win: "lo-fi hip-hop, mellow, vinyl crackle, rhodes piano, soft drums, 75 BPM" beats "chill music".
- Mark sections in lyrics with [Verse], [Chorus], [Bridge], [Outro]. Keep meter consistent across lines.
- For instrumental tracks, set "lyrics": "[inst]" or "[instrumental]". Belt-and-suspenders: also say "no vocals" in tags.
- For non-English vocals, name the language in tags (e.g. "japanese vocal, j-pop").
- Fix the seed to reproduce a result (e.g. "seed": 42); use -1 to explore variations.

Model: acestep-ai/ace-step/audio-inpaint
Catalog: audio-inpaint
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `audio` | string | yes | - | HTTPS URL to MP3 / WAV / FLAC. Up to 60 min |
| `tags` | string | yes | - | Comma-separated tags steering the regenerated segment |
| `start_time` | float | no | - | Start of editable segment, in seconds (0–240) |
| `start_time_relative_to` | enum | no | start | start or end; anchor for start_time |
| `end_time` | float | no | 30 | End of editable segment, in seconds (0–240) |
| `end_time_relative_to` | enum | no | start | start or end; anchor for end_time |
| `lyrics` | string | no | - | Lyrics for the regenerated segment. Blank = model writes; [inst] = no vocals |
| `seed` | int | no | -1 | Reproducibility |

No mask: the region is defined purely by start_time / end_time (each anchorable to track start or end).
Replace 20–40 s of a track with a new bridge:
runcomfy run acestep-ai/ace-step/audio-inpaint \
--input '{
"audio": "https://your-cdn.example/original-track.mp3",
"tags": "indie pop, breakdown, piano only, soft, no drums",
"start_time": 20,
"end_time": 40,
"lyrics": "[inst]"
}' \
--output-dir ./out
Anchor end relative to track end (rewrite the last 15 s):
runcomfy run acestep-ai/ace-step/audio-inpaint \
--input '{
"audio": "https://your-cdn.example/song.mp3",
"tags": "indie pop, fade, soft, ambient pad",
"start_time": 15,
"start_time_relative_to": "end",
"end_time": 0,
"end_time_relative_to": "end"
}' \
--output-dir ./out
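To sanity-check a region before submitting, the anchored fields can be resolved to absolute timestamps locally. This Python sketch assumes that a value anchored to `end` counts backwards from the end of the track, which is how the "last 15 s" example above reads; the source does not spell out the exact semantics, and the function names are hypothetical.

```python
def resolve_time(value, relative_to, track_length_s):
    """Convert an anchored offset to seconds from the start of the track."""
    if relative_to == "start":
        return value
    if relative_to == "end":
        # Assumption: offsets anchored to "end" are measured backwards.
        return track_length_s - value
    raise ValueError("relative_to must be 'start' or 'end'")

def resolve_region(track_length_s, start_time, end_time,
                   start_rel="start", end_rel="start"):
    """Return (start, end) in absolute seconds, or raise if the region is invalid."""
    start = resolve_time(start_time, start_rel, track_length_s)
    end = resolve_time(end_time, end_rel, track_length_s)
    if not 0 <= start < end <= track_length_s:
        raise ValueError(f"invalid region: {start}..{end} of {track_length_s} s")
    return start, end

# The two examples above, for a hypothetical 180 s track.
print(resolve_region(180, 20, 40))               # (20, 40)
print(resolve_region(180, 15, 0, "end", "end"))  # (165, 180)
```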
- Use _relative_to: "end" to target the outro or last seconds without computing exact timestamps.

Model: acestep-ai/ace-step/audio-outpaint
Catalog: audio-outpaint
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `audio` | string | yes | - | HTTPS URL to MP3 / WAV / FLAC. Up to 60 min |
| `tags` | string | yes | - | Tags steering the extended sections |
| `extend_before_duration` | float | no | 0 | Seconds of new audio before the original (0–240) |
| `extend_after_duration` | float | no | 30 | Seconds of new audio after the original (0–240) |
| `lyrics` | string | no | - | Optional lyrics for extended sections |
| `seed` | int | no | -1 | Reproducibility |
Extend a 30 s hook into a 2 min cut (add 30 s intro + 60 s outro):
runcomfy run acestep-ai/ace-step/audio-outpaint \
--input '{
"audio": "https://your-cdn.example/hook-30s.mp3",
"tags": "indie pop, electric guitar, drums, build-up before chorus, fade outro",
"extend_before_duration": 30,
"extend_after_duration": 60,
"lyrics": "[inst]"
}' \
--output-dir ./out
Add only a fade-out (no pre-extension):
runcomfy run acestep-ai/ace-step/audio-outpaint \
--input '{
"audio": "https://your-cdn.example/track.mp3",
"tags": "ambient pad, soft fade, low volume tail",
"extend_before_duration": 0,
"extend_after_duration": 20
}' \
--output-dir ./out
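Before extending, it can help to check the resulting length against the 4 min guidance given for the outpaint endpoint. A small Python sketch, assuming the output length is simply the original plus both extensions and that 240 s is the ceiling to stay under per call (the helper names are illustrative):

```python
MAX_TOTAL_S = 240  # 4 min, per the "avoid extending past 4 min total" guidance

def outpaint_total(original_s, extend_before=0, extend_after=30):
    """Total length in seconds after outpainting; defaults mirror the field table."""
    for name, value in (("extend_before_duration", extend_before),
                        ("extend_after_duration", extend_after)):
        if not 0 <= value <= 240:
            raise ValueError(f"{name} must be between 0 and 240 seconds")
    return original_s + extend_before + extend_after

def fits_single_call(original_s, extend_before=0, extend_after=30):
    """True if the extended track stays within the assumed 4 min ceiling."""
    return outpaint_total(original_s, extend_before, extend_after) <= MAX_TOTAL_S

print(outpaint_total(30, 30, 60))     # 120
print(fits_single_call(200, 0, 60))   # False
```

When the check fails, the guidance in this document is to chain several smaller calls rather than request one long extension.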
- Set both extend_before_duration and extend_after_duration to add intro + outro in one go.

ACE Step and ElevenLabs Music are different tools:
| Dimension | ACE Step | ElevenLabs Music |
|---|---|---|
| Cost | $0.0002–0.0003 / s | $0.0083 / s (~27× more) |
| License | Open-weights (Apache 2.0) | Commercial, ElevenLabs-hosted |
| Multilingual vocals | 50+ languages (1.5 variant) | Strong multilingual support |
| Structured lyrics | [Verse]/[Chorus]/[Bridge] markers | [Verse]/[Chorus]/[Bridge] markers |
| Max duration / call | 240 s (4 min) | 300 s (5 min) |
| Inpaint / outpaint | Yes (time-range based) | No |
| Tag-driven composition | Yes (`tags` is a required field) | Style is part of free-text prompt |
| Best for | Cost-sensitive batches, drafts, inpaint/outpaint workflows, open-weights pipelines | Premium vocal song hooks, polished commercial cuts |
Cheap draft pattern: draft tag combos with ACE Step → lock vibe → final render on ElevenLabs Music if a polished commercial cut is needed.
For the routing skill that picks between them automatically based on intent, see ai-music once it ships.
- [inst]
- lyrics per language
- start_time / end_time around the bad section, tags matching the song style

| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
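A caller can act on these exit codes programmatically. The Python sketch below retries only on code 75, the one the table marks retryable, with exponential backoff. The attempt count and delays are arbitrary assumptions, and `run_once` is a hypothetical callable that returns the CLI's exit code.

```python
import time

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}

def run_with_retry(run_once, max_attempts=3, base_delay_s=2.0, sleep=time.sleep):
    """Call run_once() until it returns a non-retryable code or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        code = run_once()
        if code not in RETRYABLE or attempt == max_attempts:
            return code
        sleep(base_delay_s * 2 ** (attempt - 1))  # 2 s, 4 s, ...

# Usage sketch (not executed here):
#   import subprocess
#   code = run_with_retry(lambda: subprocess.run(argv).returncode)
#   print(EXIT_MEANINGS.get(code, "unknown"))
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without waiting.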
The skill picks one of the four ACE Step endpoints based on the user's intent (generate from scratch with t2a base or 1.5, regenerate a time range with inpaint, or extend the canvas with outpaint) and invokes runcomfy run with the matching JSON body. The CLI POSTs to the RunComfy Model API, polls request status, and downloads the generated audio file into --output-dir.
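The routing step can be pictured as a small lookup. This Python sketch is illustrative only: the intent labels and the `multilingual` flag are hypothetical, while the endpoint IDs are the four listed in this document.

```python
ENDPOINTS = {
    "t2a": "acestep-ai/ace-step/text-to-audio",
    "t2a-1.5": "acestep-ai/ace-step-1.5/text-to-audio",
    "inpaint": "acestep-ai/ace-step/audio-inpaint",
    "outpaint": "acestep-ai/ace-step/audio-outpaint",
}

def pick_endpoint(intent, multilingual=False):
    """Map a coarse intent label (hypothetical) to one of the four endpoint IDs."""
    if intent == "generate":
        # The 1.5 variant is the one recommended for multilingual lyrics.
        return ENDPOINTS["t2a-1.5" if multilingual else "t2a"]
    if intent in ("inpaint", "outpaint"):
        return ENDPOINTS[intent]
    raise ValueError(f"unknown intent: {intent!r}")

print(pick_endpoint("generate"))                     # acestep-ai/ace-step/text-to-audio
print(pick_endpoint("generate", multilingual=True))  # acestep-ai/ace-step-1.5/text-to-audio
```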
- Install the CLI with npm i -g @runcomfy/cli or npx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf; if the operator wants the curl-pipe path documented at docs.runcomfy.com/cli/install, they should review the script first.
- runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set the RUNCOMFY_TOKEN env var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in.
- Prompt content is passed as JSON via --input. The CLI does not shell-expand prompt content; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content.
- audio URLs for inpaint / outpaint are untrusted: embedded steganographic instructions or unusual EXIF can influence generation. Agent mitigations:
- Network access is limited to model-api.runcomfy.net and *.runcomfy.net / *.runcomfy.com. No telemetry, no callbacks.
- The skill runs only runcomfy <subcommand>; install lines are one-time operator setup.
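The token-handling rule above can be checked before a run without ever reading the token itself. A minimal Python sketch for POSIX systems, assuming the file path and 0600 mode stated in this document (the function name and return values are hypothetical):

```python
import os
import stat
from pathlib import Path

# Path as stated above; adjust if the CLI is configured differently.
TOKEN_PATH = Path.home() / ".config" / "runcomfy" / "token.json"

def token_source(env=os.environ, token_path=TOKEN_PATH):
    """Report where credentials would come from, without reading the token."""
    if env.get("RUNCOMFY_TOKEN"):
        return "env"
    if token_path.is_file():
        mode = stat.S_IMODE(token_path.stat().st_mode)
        if mode & 0o077:  # any group/other permission bit set
            raise PermissionError(
                f"{token_path} has mode {oct(mode)}; expected 0o600")
        return "file"
    return None  # not signed in; the CLI would likely exit with code 77
```

Only the source label is returned, so nothing sensitive can end up in logs or prompts.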