Install
openclaw skills install genor-comfy-gate
Comprehensive multi-modal gateway for ComfyUI enabling audio generation with ACE-Step 1.5 and photorealistic image creation via SDXL workflows.
THE authoritative reference for ALL ComfyUI operations through our gateway. Multi-modal: audio, images, video (future). Read this before any generation. Updated as we learn.
| Type | Status | Workflow | Model |
|---|---|---|---|
| 🎵 Audio | ✅ Active | acestep-rapcore | ACE-Step 1.5 SFT merge |
| 🎬 Video | 🔜 Planned | — | — |
The gateway is modality-agnostic — it submits any workflow JSON to ComfyUI, polls, waits, downloads, and saves. Adding a new modality means adding a workflow file + WORKFLOW_INFO entry. The type field determines output dir (audio/ or images/).
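The submit, poll, wait cycle described above can be sketched in Python. This is a minimal illustration, not the gateway's actual code: `wait_for_result`, `fetch_history`, and `is_done` are hypothetical names, the shape of the history response is left to the caller, the 2-second interval mirrors the polling note later in this document, and the 600-second timeout is an assumed value.

```python
import time
from typing import Any, Callable, Optional


def wait_for_result(
    fetch_history: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval: float = 2.0,
    timeout: float = 600.0,  # assumed value, not taken from the gateway
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Poll until is_done(result) is true or the timeout elapses.

    fetch_history is assumed to wrap GET /history/:prompt_id. The response
    shape is not specified here, so the caller supplies is_done.
    Returns the finished result, or None on timeout.
    """
    waited = 0.0
    while waited <= timeout:
        result = fetch_history()
        if is_done(result):
            return result
        sleep(interval)
        waited += interval
    return None
```

Passing `sleep` as a parameter keeps the loop testable without real delays.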
| Property | Value |
|---|---|
| Endpoint | http://127.0.0.1:8188 |
| Auth | x-api-key: gcg-4d... header (localhost exempt) |
| Managed by | pm2 (genor-comfy-gate) |
| Location | ./ (installed dir) |
| Config | env / COMFY_SERVERS var |
Configure your ComfyUI backends via the COMFY_SERVERS environment variable:
[
{"url": "http://127.0.0.1:8188", "id": "local", "priority": true, "weight": 1}
]
Default: single local server at http://127.0.0.1:8188
| ID | URL | Priority |
|---|---|---|
| local | http://127.0.0.1:8188 | ★ (default) |
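As a rough illustration of the configuration above, the following Python sketch parses COMFY_SERVERS and falls back to the single local default. The names `load_servers` and `pick_server` are hypothetical, and the selection rule (first entry with `priority` set, otherwise the first entry) is an assumption; the gateway's own pickServer() logic may differ, for example by using `weight`.

```python
import json
import os
from typing import List, Mapping, Optional

# Default from this document: a single local server.
DEFAULT_SERVERS = [
    {"url": "http://127.0.0.1:8188", "id": "local", "priority": True, "weight": 1}
]


def load_servers(env: Optional[Mapping[str, str]] = None) -> List[dict]:
    """Parse COMFY_SERVERS (a JSON array); fall back to the local default."""
    env = os.environ if env is None else env
    raw = env.get("COMFY_SERVERS", "").strip()
    if not raw:
        return list(DEFAULT_SERVERS)
    servers = json.loads(raw)
    if not isinstance(servers, list) or not servers:
        raise ValueError("COMFY_SERVERS must be a non-empty JSON array")
    return servers


def pick_server(servers: List[dict]) -> dict:
    """Assumed rule: first priority server, else the first entry."""
    for server in servers:
        if server.get("priority"):
            return server
    return servers[0]
```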
Server selection is handled by pickServer(); media is fetched from the selected server's URL (server.url).
acestep-aio: ACE-Step 1.5 Audio Generation
Model: aceStep15Music_sft17BAIO.safetensors (ACE-Step 1.5 SFT merge)
Workflow Pipeline:
CheckpointLoader(160) → AnySwitch(model/clip/vae) → TextEncode(94) → KSampler(35 steps, dpmpp_3m_sde, beta, cfg=1) → VAEDecodeTiled → SaveAudioMP3(104)
Lyrics: String(252) → TextEncode.lyrics
Duration: mxSlider(274) → TextEncode + EmptyLatent
Negative: ConditioningZeroOut(47) → zeroes the positive conditioning
| Node | Class | Role | Injections |
|---|---|---|---|
| 94 | TextEncodeAceStepAudio1.5 | Main text encoder | prompt → tags, lyrics ← 252, bpm, keyscale, duration ← 274, language |
| 252 | String | Lyrics feed into node 94 | lyrics → String |
| 3 | KSampler | Denoising (35 steps, dpmpp_3m_sde, beta, cfg=1) | seed ← 307 |
| 98 | EmptyAceStep1.5LatentAudio | Creates latent audio space | seconds ← 274 |
| 104 | SaveAudioMP3 | Output V0 MP3 | — |
| 128 | VAEDecodeAudioTiled | VAE decode (tile=512, overlap=64) | — |
| 160 | CheckpointLoaderSimple | Loads model | — |
| 274 | mxSlider | Song duration (seconds) | duration → Xi and Xf |
| 307 | Seed (rgthree) | Global seed | seed → seed |
| 257 | Text Concatenate | Builds output filename | artist+title+path |
| 47 | ConditioningZeroOut | Negative prompt (zeroed) | — |
| 78 | ModelSamplingAuraFlow | Shift=13 | Bypassed by default — use model_sampling: true to enable |
| Node | Content |
|---|---|
| 317 | Genre description table (38 genres with tags) |
| 318 | Keyscale/BPM reference table (38 genres × scale + key + BPM) |
| 320 | Structure example (metalcore duet with timeline) |
| 321 | Preset example (detailed scene-by-scene prompt) |
| 319 | LLM input example (NSFW lyrics prompt format) |
| 400 | Disconnected tags node (original rapcore tags, kept for reference) |
{
"workflow": "acestep-rapcore",
"prompt": "comma-separated tags (under 512 chars)",
"lyrics": "structured lyrics with [section] tags",
"duration": 180,
"bpm": 150,
"keyscale": "E minor",
"language": "en",
"seed": -1
}
All parameters EXCEPT prompt and lyrics are optional. Omitted parameters keep their workflow defaults.
model_sampling (optional, boolean): Enables ModelSamplingAuraFlow (shift=13) for acestep-aio. Bypassed by default because results are inconsistent: it is about as likely to hurt quality as to help. Set model_sampling: true only if you want to experiment with it.
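Because omitted parameters keep their workflow defaults, a client only needs to send the fields it actually wants to override. The sketch below shows one way to build such a body in Python; `build_audio_request` is a hypothetical helper, not part of the gateway.

```python
from typing import Any, Dict, Optional


def build_audio_request(
    prompt: str,
    lyrics: str,
    workflow: str = "acestep-rapcore",
    duration: Optional[int] = None,
    bpm: Optional[int] = None,
    keyscale: Optional[str] = None,
    language: Optional[str] = None,
    seed: Optional[int] = None,
    model_sampling: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build a request body, leaving out every optional field that was not set."""
    body: Dict[str, Any] = {"workflow": workflow, "prompt": prompt, "lyrics": lyrics}
    optional = {
        "duration": duration,
        "bpm": bpm,
        "keyscale": keyscale,
        "language": language,
        "seed": seed,
        "model_sampling": model_sampling,
    }
    # Only explicitly set values are sent; the rest fall back to workflow defaults.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

For example, `build_audio_request("rock, warm guitar", "[Verse]\nhello", bpm=150)` produces a body with only `workflow`, `prompt`, `lyrics`, and `bpm`.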
Every caption should cover as many descriptive dimensions as possible (the examples below combine genre, instruments, vocal type, and mood), in 5-8 comma-separated tags:
pop, piano+strings+guitar, female warm vocal, melancholic intimate, bedroom pop
rock, metal, heavy distorted guitar, powerful drums, melodic vocals, aggressive, epic, dramatic, guitar solo
heavy distorted guitar, fast thrash drums, pounding bass, aggressive, dark
rapcore metal fusion, nu-metal, punchy bass, warm distorted guitar, crisp drums, melodic chorus, heavy grooves, atmospheric, polished production, angsty female vocal, emotional
raw, gritty, distorted (without balancing warmth) → metallic scraping, flat bass
heavy bass → boomy/muddy; prefer punchy bass, deep sub-bass, defined bass
aggressive on instruments → harsh overtones; use on emotion/vocal instead
| Word | Effect |
|---|---|
| warm | Analog-style saturation, smooth high end |
| crisp | Clean transients, defined attacks |
| punchy | Tight, compressed low-mids, good for bass/kick |
| bright | Boosted highs, airy presence |
| lush | Wide stereo, rich harmonics, reverb-heavy |
| dry | Close-mic sound, minimal reverb |
| airy | Spacious high end, breathy |
| polished | Studio-quality, balanced EQ |
| raw | USE WITH CAUTION: unprocessed, potentially harsh |
| gritty | USE WITH CAUTION: distortion artifacts |
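The caption guidance above lends itself to a quick pre-flight check. The following Python sketch is one possible way to flag risky captions before sending them; `check_caption` is a hypothetical helper, and its rules only encode what this document states (the 512-character limit from the request example, the two caution words, and the note on heavy bass).

```python
from typing import List

CAUTION_WORDS = {"raw", "gritty"}  # marked USE WITH CAUTION in the texture table
MAX_CAPTION_CHARS = 512            # request example: "under 512 chars"


def check_caption(caption: str) -> List[str]:
    """Return warnings for a comma-separated caption (empty list if none)."""
    tags = [tag.strip() for tag in caption.split(",") if tag.strip()]
    warnings = []
    if len(caption) >= MAX_CAPTION_CHARS:
        warnings.append(
            f"caption is {len(caption)} chars; keep it under {MAX_CAPTION_CHARS}"
        )
    for tag in tags:
        risky = sorted(set(tag.lower().split()) & CAUTION_WORDS)
        if risky:
            warnings.append(f"'{tag}' uses caution word(s): {', '.join(risky)}")
        if tag.lower() == "heavy bass":
            warnings.append("'heavy bass' tends to sound boomy/muddy; prefer punchy bass")
    return warnings
```

A caption such as `"pop, warm piano, punchy bass, polished"` returns no warnings, while `"metal, raw guitar, heavy bass"` returns two.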
ACE-Step REQUIRES section markers to align music with lyrics:
[Intro], [Verse], [Pre-Chorus], [Chorus], [Bridge], [Build], [Drop],
[Breakdown], [Guitar Solo], [Piano Interlude], [Outro]
[whispered], [raspy vocal], [powerful belting], [spoken word],
[falsetto], [harmonies], [clean vocal]
[high energy], [low energy], [building energy], [euphoric],
[melancholic], [dreamy], [aggressive]
(bass rumbles in), (drums fade to silence)
Before you send any text to ACE-Step, you must answer each of these questions yourself, and you must not send anything until every answer is "YES":
Only when the answer to every question is YES may you submit for generation.
Intro → [low energy] — sparse, building
Verse 1 → [low energy] — verse, storytelling, restrained
Pre-Chorus → [building energy] — tension rising
Chorus → [high energy] — maximum impact, full instrumentation
Verse 2 → [low energy] — second verse, slightly more energy
Pre-Chorus → [building energy]
Chorus → [high energy] — second chorus often bigger (harmonies)
Bridge → [low energy] — stripped back, different perspective
Breakdown → [high energy] — instrumental intensity (optional)
Final Chorus→ [high energy] — biggest version
Outro → [low energy] — fade out
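To show how the section and energy tags fit together, here is a small Python sketch that assembles a lyrics string from a structure plan. `build_lyrics` is a hypothetical helper, and putting the energy tag on its own line directly under the section marker is an assumption; this document does not specify the exact placement.

```python
from typing import List, Tuple

# (section marker, energy tag, lyric lines); tag names come from the lists above
Section = Tuple[str, str, List[str]]


def build_lyrics(sections: List[Section]) -> str:
    """Join sections into one lyrics string with bracketed markers."""
    blocks = []
    for marker, energy, lines in sections:
        header = [f"[{marker}]", f"[{energy}]"]
        blocks.append("\n".join(header + lines))
    # Blank line between sections keeps the structure readable.
    return "\n\n".join(blocks)


plan = [
    ("Verse", "low energy", ["line one"]),
    ("Chorus", "high energy", ["hook"]),
]
lyrics = build_lyrics(plan)
```

The resulting string can be passed as the `lyrics` field of a generation request.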
Electronic
four-on-the-floor, bright synths, uplifting, dance-driven, glossy production, rhythmic, energetic
mechanical, hypnotic rhythms, minimalistic, pulsing bass, industrial textures, dark, repetitive
euphoric, soaring leads, emotional pads, rolling basslines, uplifting, spacious, melodic, anthemic
rapid breakbeats, deep sub-bass, high-energy, sharp percussion, rolling rhythms, crisp, driving
heavy bass drops, wobbling synths, aggressive textures, syncopated rhythms, dark, cinematic, gritty
shimmering chords, side-chained synths, emotional, bright leads, bouncy rhythms, glossy, melodic
booming 808s, sharp hi-hats, atmospheric pads, swaggering, dark, punchy, spacious
Rock/Metal
crunchy guitars, steady drums, warm analog tone, energetic, melodic, vintage, riff-driven
heavy riffs, powerful drums, gritty vocals, aggressive, energetic, distorted, bold, driving
distorted guitars, fast drums, dark atmosphere, aggressive, heavy, intense, powerful, tight
complex structures, technical riffs, atmospheric layers, dramatic, epic, polished, dynamic
Urban
dusty drums, soulful samples, rhythmic, warm textures, punchy kicks, nostalgic, organic
mellow beats, vinyl crackle, soft keys, relaxed, dreamy, warm, minimal, hazy
sliding 808s, haunting melodies, gritty textures, cold atmosphere, syncopated, tense, urban
Pop
catchy hooks, bright synths, polished production, upbeat, melodic, modern, radio-ready, clean
retro synths, bright pads, melodic, nostalgic, electronic, polished, dreamy, airy
glossy production, bright synths, genre-blending, catchy hooks, polished, theatrical, vibrant
Soft/Ambient
soft pads, atmospheric textures, spacious, minimal, calm, evolving, dreamy, subtle, meditative
sweeping strings, dramatic percussion, epic, emotional, grand, polished, powerful
| Genre | Scale | Key Range | BPM Range |
|---|---|---|---|
| EDM/House | Minor, Dorian | D#m–Am | 120–128 |
| Techno | Phrygian, Minor | Fm–A#m | 125–135 |
| Trance | Major, Mixolydian | A–D | 130–142 |
| Drum & Bass | Minor, Dorian | Em–Gm | 170–178 |
| Dubstep | Minor, Phrygian | Fm–G#m | 138–150 |
| Future Bass | Major, Minor | C–F | 140–160 |
| Trap | Harmonic Minor | Fm–Am | 130–150 |
| Hip-Hop | Minor, Dorian | Dm–Gm | 85–95 |
| Lo-Fi | Dorian, Lydian | Cm–Fm | 60–85 |
| Pop | Major, Mixolydian | C–G | 90–130 |
| Classic Rock | Minor Pentatonic | Em–Am | 100–140 |
| Hard Rock | Minor, Phrygian | Em–Gm | 120–160 |
| Metal | Phrygian, Harmonic Minor | Dm–F#m | 140–200 |
| Prog Metal | Dorian, Melodic Minor | C#m–F#m | 120–180 |
| Blues | Blues Scale, Minor Pentatonic | Em–Am | 70–120 |
| Funk | Mixolydian, Dorian | E–A | 100–120 |
| Disco | Mixolydian, Major | F–Bb | 110–130 |
| R&B | Dorian, Minor | Dm–Gm | 60–100 |
| Ambient | Lydian, Dorian | C–F | 60–90 |
| Cinematic | Minor, Harmonic Minor | Cm–Fm | 60–120 |
| Reggae | Major, Mixolydian | A–D | 70–90 |
| K-Pop | Major, Minor | C–F# | 100–140 |
| Anime OST | Lydian, Major | C–E | 80–160 |
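The BPM ranges in the table can be turned into a simple lookup for sanity-checking a request's `bpm` value. The sketch below copies only a few rows for brevity; `bpm_in_range` and `default_bpm` are hypothetical helpers, and using the midpoint of the range as a default is an assumption rather than anything the workflow prescribes.

```python
from typing import Dict, Tuple

# A few rows copied from the table above: genre -> (BPM low, BPM high)
BPM_RANGES: Dict[str, Tuple[int, int]] = {
    "Techno": (125, 135),
    "Drum & Bass": (170, 178),
    "Hip-Hop": (85, 95),
    "Metal": (140, 200),
}


def bpm_in_range(genre: str, bpm: int) -> bool:
    """True if bpm falls inside the genre's listed range (inclusive)."""
    low, high = BPM_RANGES[genre]
    return low <= bpm <= high


def default_bpm(genre: str) -> int:
    """Assumed default: the midpoint of the genre's BPM range."""
    low, high = BPM_RANGES[genre]
    return (low + high) // 2
```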
The workflow includes an example of how to structure a caption WITH a song structure plan:
metalcore, symphonic elements, theatrical, duet, heavy distorted guitar,
bright piano, studio-polished, dramatic, melodic, epic, intense.
Structure:
- Intro: brief intro dramatically builds to first verse
- Verse 1: atmospheric piano, sets scene, raspy male vocal only
- Verse 2: guitar power chords, groovy, young female vocal only
- Chorus: anthemic, layered, male+female duet harmonies
- Bridge: atmospheric, dreamy, calm, female vocal only
- Build-up: builds to epic instrumental solo
- Instrumental: fast guitar solo, lead licks, virtuoso shred
- End: powerful ending
This can go in the caption to give the model a temporal roadmap.
For maximum control, describe each section's instrumentation and mood in prose:
Intro: A metalcore-tinged, symphonic swell opens the track, with bright piano glimmering
over theatrical strings. Tension rises—studio-polished, dramatic—until it snaps into verse.
Verse 1: Drops to atmospheric piano, soft but charged. Raspy male vocal, intimate, whispered.
No guitars—just piano, subtle pads, suspended breath.
Verse 2: Guitar power chords crash in, groovy pulse. Young female vocal, bright and soaring.
Symphonic elements widen the space, cinematic lift.
Chorus: Erupts into anthemic, epic chorus. Male+female duet harmonies. Distorted guitars,
sweeping strings, pounding drums—polished, intense.
Bridge: Everything falls away. Dreamy, atmospheric, weightless. Soft pads, distant piano,
female vocal airy and ethereal. Suspended.
Build-up: Rhythmic pulses return. Low strings, tom rolls, rising synths. Guitars re-enter
in bursts. Energy coils toward instrumental break.
Instrumental: Fast guitar solo, virtuoso shred, rapid licks, melodic flourishes.
Symphonic backing, metalcore precision drums. Flashy, intense, climactic.
| Method | Path | Description |
|---|---|---|
| GET | / | Health check + server statuses |
| GET | /workflows | List available workflows with types |
| POST | /generate-and-wait | PRIMARY — submit, wait, download, save. Use this for all generation. |
| POST | /prompt | Submit workflow, return prompt_id |
| GET | /history/:prompt_id | Get single prompt result |
| GET | /history | Aggregated history from all servers |
| GET | /queue | Aggregated queue (running + pending) |
| GET | /view | Proxy media file download |
| GET | /system_stats | First alive server system info |
| GET | /object_info | Proxy to ComfyUI object_info |
| GET | /extensions | Proxy to ComfyUI extensions |
| Method | Path | Description |
|---|---|---|
| GET | /generate | Get generation options form |
| POST | /generate | Submit image generation |
| POST | /upload/image | Upload image to ComfyUI input dir |
| Method | Path | Description |
|---|---|---|
| GET | /media-list | List generated files (name, size, date, preview URLs) |
| POST | /media-link-once | Create one-time access token for a file |
| GET | /media-once/:token | Access file via one-time token (no API key needed) |
| Method | Path | Description |
|---|---|---|
| POST | /workflow/:name/prompt | Quick prompt submit for named workflow (auto-injects) |
POST /generate-and-wait: Full Reference
curl -s -X POST http://127.0.0.1:8188/generate-and-wait \
-H "Content-Type: application/json" \
-d '{
"workflow": "acestep-rapcore",
"prompt": "...",
"lyrics": "...",
"duration": 200,
"bpm": 150,
"keyscale": "E minor",
"language": "en",
"seed": -1
}'
Audio params: prompt (required), lyrics, duration, bpm, keyscale, language, seed
Image params: prompt (required), aspect_ratio, seed, steps, cfg
Common: workflow (default: acestep-rapcore), client_id
Success response:
{
"status": "ok",
"file": "/var/data/comfy-media/audio/example-output.mp3",
"filename": "example-output.mp3",
"type": "audio",
"server": "sec",
"workflow": "acestep-rapcore",
"file_size": 5882890
}
Output saved with metadata sidecar (.json) in ~/media/comfy/<audio|images>/.
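For callers that prefer Python over curl, the same request can be sketched with the standard library. `make_request` and `generate_and_wait` are hypothetical helper names; the 900-second timeout is an assumed value, and treating any `status` other than "ok" as a failure is also an assumption, since only the success response is documented here.

```python
import json
import urllib.request

GATEWAY_URL = "http://127.0.0.1:8188"  # endpoint listed earlier in this document


def make_request(payload: dict, base_url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build the POST /generate-and-wait request without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate-and-wait",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_and_wait(payload: dict, timeout: float = 900.0) -> dict:
    """Send the request, block until the gateway answers, return parsed JSON."""
    with urllib.request.urlopen(make_request(payload), timeout=timeout) as resp:
        result = json.load(resp)
    # Assumption: failures are reported with a status other than "ok".
    if result.get("status") != "ok":
        raise RuntimeError(f"generation failed: {result}")
    return result
```

On success, `result["file"]` holds the saved path shown in the response example above.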
pm2 restart genor-comfy-gate
pm2 logs genor-comfy-gate --lines 20
curl -s http://127.0.0.1:8188/ | python3 -m json.tool
curl -s http://127.0.0.1:8188/queue | python3 -m json.tool
~/media/comfy/audio/ — generated MP3 files + .json sidecars
~/media/comfy/images/ — generated PNG files + .json sidecars
Polls /history/:prompt_id every 2s until complete/fail/timeout
When we discover new caption patterns, texture word effects, or workflow tricks:
Record them in CHANGELOG.md (next to this skill)
CheckpointLoader(43) → LoRA stack(47,80) → Resolution(17) → KSampler(7, 12 steps LCM) →
UltimateSDUpscale(88, 2x, 4x-UltraSharp) →
FaceDetailer NIP(97) → FaceDetailer V(98) → FaceDetailer P(101) →
FaceDetailer face(104, 1024px, 6 steps) → FaceDetailer hands(105, 2048px, 6 steps) →
SeedVR2VideoUpscaler(114, 2048px final) → CRT Post-Process(115) → SaveImage(200)
| LoRA | Strength | Purpose |
|---|---|---|
| AddMicroDetails v6 | 0.2 | Skin texture, fine details |
| PersonEnhanceV2 ILL | 0.1 | Better anatomy/face |
| TrendCraft Style Detailer v2.4I | 0.1 | Overall polish/detail |
| LoRA | Strength | Purpose |
|---|---|---|
| DTLVVTT DMD2 V5-LITE | 1.0 | DMD2 distillation (faster/better LCM) |
Sequential detailers with YOLO detectors:
FaceDetailer NIP (nipples_yolov8s-seg.pt): nipple detection, 1024px, denoise 0.4
FaceDetailer V (nsfw-seg-vagina-x.pt): vagina detection, 1024px, denoise 0.4
FaceDetailer P (nsfw-seg-penis-x.pt): penis detection, 1024px, denoise 0.4
FaceDetailer face (Anzhc Face seg 768MS v2 y8n.pt): face detection, 1024px, 6 steps, denoise 0.4
FaceDetailer hands (PitHandDetailer-v2-Test-v9c.pt): hand detection, 2048px, 6 steps, denoise 0.5
SeedVR2 upscaler model: seedvr2_ema_7b_sharp-Q4_K_M.gguf (quantized 7B)
SeedVR2 VAE: ema_vae_fp16.safetensors
CRITICAL: LUSTIFY is Illustrious-based, so use Danbooru-format tags, NOT natural language descriptions.
masterpiece, best quality, amazing quality, very aesthetic, absurdres
1girl, solo, cute, petite, pale skin, medium breasts
gym uniform, white shirt, sports shorts, sneakers, ponytail
jumping, dynamic pose, looking at viewer
gym background, afternoon light, dutch angle, from below
blurry, worst quality, bad quality, error, melted body, bad anatomy, bad hands, disfigured
masterpiece, best quality are weighted
dutch angle, from below, from above, close-up
sunlight, god rays, afternoon light, backlight
1girl, solo
{
"workflow": "acestep-aio",
"prompt": "masterpiece, best quality, 1girl, cute, ...",
"aspect_ratio": "7:9 (Portrait)",
"seed": -1
}
Valid aspect ratios:
1:1 (Square)
4:5 (Portrait)
7:9 (Portrait) ← default, best for single character
3:2 (Landscape)
16:9 (Landscape)
9:16 (Portrait)
Additional optional params: megapixels (default 1.5), steps, cfg, denoise, sampler_name, scheduler
Add the workflow file: workflows/<name>.json
Add a WORKFLOW_INFO entry:
'<name>': { file: '<name>.json', type: 'audio'|'image'|'video', ext: 'mp3'|'png'|'mp4',
promptNode: '94', promptField: 'tags', lyricsNode: '252', lyricsField: 'String',
outputNode: '104' }
Restart the gateway: pm2 restart genor-comfy-gate
The gateway auto-handles: prompt injection, duration, BPM/keyscale (audio), aspect_ratio (image), seed, polling, download from correct server, save to media dir, metadata sidecar.
Quality tags (masterpiece, best quality) must come FIRST because they're weighted
getOutputInfo() function returned undefined filenames despite reading them from history correctly. Fixed by inlining output scanning in the handler.
raw, gritty, heavy drops cause metallic scraping and flat bass. Use warm, crisp, punchy, polished for clean instruments.