Image Edit – Pro Pack on RunComfy
v0.1.1
Image edit on RunComfy. This skill is the canonical image edit entry point for the RunComfy Model API: give it a source image and an edit instruction, and it returns the edited image. Image edit on RunComfy means transforming an existing still (swap a background, remove an object, rewrite a headline, mask-fill a region) without re-shooting.
What "image edit" means here
Image edit is the task of taking a source image and producing a transformed image that preserves identity, framing, or layout where you want, while changing what you specify. Image edit is distinct from text-to-image (no input image) and from image-to-video (the output is a clip). Common image edit operations include:
- Background image edit – swap the background of a portrait, product, or scene while preserving the foreground identity.
- Object-removal image edit – remove cables, watermarks, and distracting elements, leaving the rest of the image untouched.
- Object-addition image edit – add a new element (umbrella, sign, accessory) to an existing image.
- Text-rewrite image edit – replace an in-image headline, label, or signage, including multilingual text.
- Mask-driven image edit – fill or replace a specific masked region with strength control.
- Multi-ref composition image edit – combine the subject from one image with the scene/lighting from another.
- Batch image edit – apply the same edit instruction across 1–20 inputs (SKU galleries, A/B variants).
This skill picks the right image edit endpoint for the user's intent and calls runcomfy run <model>/edit with the matching schema.
When to use image edit on RunComfy
Pick image edit on RunComfy whenever:
- You have an existing image and want to change something about it – image edit is the right task.
- You want identity-stable editing – the subject, brand, or product from the input must survive into the edited image.
- You're producing batch edits at scale – SKU galleries, multi-language variants, A/B testing.
- You need mask-precise edits – region replacement, watermark removal, region fill.
- The user said "image edit", "edit image", "image-to-image", "swap the background", "remove the watermark", "rewrite the headline", or showed an image and asked to transform it – route here.
Image edit routes
| User intent | Image edit model | Why |
|---|---|---|
| Default image edit – single or batch (up to 20), background swap, object remove/add | google/nano-banana-2/edit | Most flexible image edit; identity preservation; batch up to 20 |
| Multilingual in-image text rewrite, layout-precise edits | openai/gpt-image-2/edit | Strongest in-image typography; multi-ref composition (up to 10) |
| Single-shot precise local edit ("she's holding an orange umbrella") | blackforestlabs/flux-1-kontext/pro/edit | Single-instruction, single-ref, high-fidelity |
| Mask-driven edits (object removal, region fill, region replace) | tongyi-mai/z-image/turbo/inpainting | Mask-based editing with strength control |
The agent reads this table, classifies the user's image edit intent, and picks the matching endpoint.
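As a sketch, the routing step can be expressed as a small lookup. The helper name and the coarse intent labels below are hypothetical; the actual skill classifies intent from free-form user text:

```shell
# Hypothetical routing helper: map a coarse intent label to the model id
# from the table above. Labels (mask, text, multiref, local) are illustrative.
route_image_edit() {
  case "$1" in
    mask)          echo "tongyi-mai/z-image/turbo/inpainting" ;;     # mask-driven
    text|multiref) echo "openai/gpt-image-2/edit" ;;                 # typography / multi-ref
    local)         echo "blackforestlabs/flux-1-kontext/pro/edit" ;; # single-shot local edit
    *)             echo "google/nano-banana-2/edit" ;;               # default route
  esac
}

route_image_edit mask    # prints tongyi-mai/z-image/turbo/inpainting
```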
Prerequisites
- RunComfy CLI – npm i -g @runcomfy/cli
- RunComfy account – runcomfy login
- CI / containers – set RUNCOMFY_TOKEN=<token>
Default image edit – Nano Banana Edit
The default image edit endpoint. Use for any general image edit task: background swap, object removal, object addition, batch image edit. Up to 20 inputs per image edit call, up to 4K resolution.
Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | – | Image edit instruction. Lead with preservation, then state the change. |
| image_urls | array | yes | – | 1–20 source images. HTTPS URLs. |
| number_of_images | int | no | 1 | 1–4 outputs per call. |
| aspect_ratio | enum | no | auto | auto follows the input; lock it for batch consistency. |
| resolution | enum | no | 1K | 0.5K / 1K / 2K / 4K output. |
| output_format | enum | no | png | png / jpeg / webp. |
| seed | int | no | – | Reproducibility for variants. |
| enable_web_search | bool | no | false | Web-grounded edits (extra latency). |
Invoke
Background-swap image edit:
runcomfy run google/nano-banana-2/edit \
--input '{
"prompt": "Keep the subject identity, pose, and clothing unchanged. Convert the background into a rainy neon cyberpunk street.",
"image_urls": ["https://.../portrait.jpg"]
}' \
--output-dir <absolute/path>
Batch image edit (lock aspect + resolution):
runcomfy run google/nano-banana-2/edit \
--input '{
"prompt": "Replace the watermark in the bottom-right with the text \"AURA\" in clean white sans-serif. Keep everything else exactly as in the input.",
"image_urls": ["https://.../sku-1.jpg", "https://.../sku-2.jpg", "https://.../sku-3.jpg"],
"aspect_ratio": "1:1",
"resolution": "1K"
}' \
--output-dir <absolute/path>
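When scripting batch calls, the image_urls array can be assembled programmatically before being spliced into --input. A minimal sketch; the helper name is hypothetical and it assumes URLs need no JSON escaping:

```shell
# Hypothetical helper: join URL arguments into a JSON string array suitable
# for the image_urls field. Assumes the URLs contain no double quotes.
build_image_urls() {
  local out="[" first=1 u
  for u in "$@"; do
    if [ "$first" -eq 1 ]; then first=0; else out="$out, "; fi
    out="$out\"$u\""
  done
  echo "$out]"
}

build_image_urls https://example.com/sku-1.jpg https://example.com/sku-2.jpg
# prints ["https://example.com/sku-1.jpg", "https://example.com/sku-2.jpg"]
```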
Multilingual image edit – GPT Image 2 Edit
Use when the image edit involves rewriting in-image text (especially non-Latin scripts) or composing from multiple references with layout precision.
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | – | Image edit instruction; lead with preservation. |
| images | string[] | yes | – | Up to 10 reference images. First is primary. |
| size | enum | no | auto | auto, 1024_1024, 1024_1536, 1536_1024. |
Multilingual text-rewrite image edit:
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Keep the photograph, layout, and brand mark exactly as in the input. Replace only the in-image headline. The new headline reads \"今日のおすすめ\" in bold Japanese kana, same position and font weight.",
"images": ["https://.../poster-en.jpg"]
}' \
--output-dir <absolute/path>
Multi-ref composition image edit:
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Compose subject from image 1 into the room from image 2. Match the lighting and color palette of image 2. Keep image 1 subject identity unchanged.",
"images": ["https://.../subject.jpg", "https://.../room.jpg"]
}' \
--output-dir <absolute/path>
Single-shot precise image edit – Flux Kontext Pro
Use when the edit is a single declarative instruction on a single reference image – the most surgical option.
| Field | Type | Required | Notes |
|---|---|---|---|
| prompt | string | yes | One declarative edit instruction. |
| image | string | yes | Single source image. |
| aspect_ratio | enum | no | Pick from supported W:H values. |
| seed | int | no | Reproducibility. |
runcomfy run blackforestlabs/flux-1-kontext/pro/edit \
--input '{
"prompt": "Keep the person'\''s face, pose, and clothing unchanged. Add an orange umbrella in her left hand and a slight smile.",
"image": "https://.../portrait.jpg"
}' \
--output-dir <absolute/path>
Mask-driven image edit – Z-Image Turbo Inpaint
Use when the edit is constrained to a specific masked region – object removal, region fill, region replacement. Mask-driven editing gives the cleanest results when you can supply a precise mask.
| Field | Type | Required | Notes |
|---|---|---|---|
| prompt | string | yes | What to fill / replace; preservation constraints for the unmasked surround. |
| image | string | yes | Source image. |
| mask_image | string | yes | Grayscale mask URL (white = inpaint, black = preserve). |
| strength | float | no | 0.3–0.6 for retouching, 0.7–1.0 for full replacement. |
| control_scale | float | no | 0.6–0.9 typical. |
| aspect_ratio | enum | no | W:H output ratio. |
| seed | int | no | Reproducibility. |
Object-removal image edit:
runcomfy run tongyi-mai/z-image/turbo/inpainting \
--input '{
"prompt": "Remove the overhead cables; preserve rooflines and the sky gradient; fill with clean sky.",
"image": "https://.../street.jpg",
"mask_image": "https://.../cables-mask.png",
"strength": 0.5,
"control_scale": 0.8
}' \
--output-dir <absolute/path>
Region-replacement image edit:
runcomfy run tongyi-mai/z-image/turbo/inpainting \
--input '{
"prompt": "Replace busy backdrop with smooth light gray studio paper; mask background only.",
"image": "https://.../product.jpg",
"mask_image": "https://.../bg-mask.png",
"strength": 0.9
}' \
--output-dir <absolute/path>
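The strength guidance from the schema above can be captured in a tiny helper when scripting these calls; the function and mode names here are hypothetical:

```shell
# Hypothetical helper: pick a strength value from the edit mode, following
# the 0.3-0.6 retouching / 0.7-1.0 full-replacement guidance above.
pick_strength() {
  case "$1" in
    retouch) echo "0.5" ;;  # light retouching, keeps most source texture
    replace) echo "0.9" ;;  # full region replacement
    *)       echo "0.7" ;;  # middle ground when the mode is unclear
  esac
}
```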
Prompting image edit – what works
Image edit prompts behave differently from text-to-image prompts. The source image already fixes most of the look – your prompt should drive the change, not redescribe the source.
- Lead with preservation goals. "Keep [identity / pose / framing / brand] unchanged", then state the change. Tell the model what NOT to change.
- One edit direction per call. Compound edits drift. Pick one bucket (background OR object OR text OR layout) per call.
- Spatial scope language. "background only", "the left object", "upper-right quadrant" – models honor concrete locations.
- Quote in-image text exactly. For text rewrites, put the literal characters in quotes. Name the script for non-Latin text: "Japanese kana", "Cyrillic", "Arabic".
- Number multi-refs. "Subject from image 1, lighting from image 2" – models route cues correctly when refs are numbered.
- Mask-edge softness. For mask-driven edits, a 1–3px blur on the mask edge blends more cleanly than a sharp binary mask.
- Iterate small. Split a compound edit into multiple shorter passes; consistency is better across passes than within a single overstuffed prompt.
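The preservation-first pattern is easy to script when generating prompts in bulk. A sketch with a hypothetical helper:

```shell
# Hypothetical prompt builder: preservation clause first, then one change.
# $1 = what to keep, $2 = the single edit instruction.
build_edit_prompt() {
  echo "Keep $1 unchanged. $2"
}

build_edit_prompt "the subject identity, pose, and clothing" \
  "Convert the background into a rainy neon cyberpunk street."
# prints Keep the subject identity, pose, and clothing unchanged. Convert the background into a rainy neon cyberpunk street.
```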
Image edit FAQ
What's the max batch size for image edit? 20 inputs per call on the default endpoint (Nano Banana Edit). The other routes are single-input.
What image formats does image edit accept? JPEG, PNG, WebP. Source URLs must be publicly fetchable HTTPS.
Does image edit preserve subject identity? Yes – all four routes are designed for identity preservation. Always state the goal: "keep face identity unchanged".
Can image edit rewrite text in non-Latin scripts? Yes – route to GPT Image 2 Edit. It handles Japanese kana, Cyrillic, Arabic, Hangul, Chinese, etc.
What's the highest resolution available? 4K on Nano Banana Edit. The other routes cap at their respective sizes.
Image edit vs text-to-image on RunComfy? Image edit transforms an existing image; text-to-image starts from a prompt only. Use image edit when you have a source; use text-to-image for novel content.
Can I do mask-free region edits? Yes – most routes work without an explicit mask. Use spatial language ("upper-right corner", "the background only"). For surgical region edits, provide a mask via the Z-Image inpaint route.
Can I run multiple image edits in one call? Within Nano Banana Edit's batch (1–20 inputs with the same instruction), yes. For different instructions, chain calls.
Limitations
- Each route inherits its model's limits. Nano Banana Edit: 1–20 inputs, 1–4 outputs. GPT Image 2 Edit: up to 10 refs, 4 fixed sizes. Flux Kontext Pro: single ref. Z-Image Inpaint: mask required.
- No multi-route blending. This skill picks one image edit model per call.
- Brand-specific overrides – if the user named a specific model, route to the corresponding brand skill (gpt-image-edit, flux-kontext, nano-banana-edit) instead of forcing it through this router.
Exit codes
| code | meaning |
|---|---|
| 0 | image edit succeeded |
| 64 | bad CLI args |
| 65 | bad input JSON for image edit / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
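Since code 75 marks retryable failures (timeout / 429), a backoff wrapper can be layered around any invocation. A sketch; the wrapper name is hypothetical and the backoff base is made configurable for illustration:

```shell
# Hypothetical retry wrapper: re-run the given command while it exits 75
# (retryable: timeout / 429), with simple exponential backoff.
run_with_retry() {
  local tries=0 max=3 rc
  while :; do
    "$@"; rc=$?
    [ "$rc" -ne 75 ] && return "$rc"        # success or non-retryable failure
    tries=$((tries + 1))
    [ "$tries" -ge "$max" ] && return "$rc" # give up after max attempts
    sleep $(( (2 ** tries) * ${RETRY_BASE:-1} ))  # 2s, 4s, ... by default
  done
}
```

Usage would be run_with_retry runcomfy run google/nano-banana-2/edit --input '...' --output-dir out/.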
How it works
The skill picks one of four image edit endpoints (Nano Banana Edit / GPT Image 2 Edit / Flux Kontext Pro / Z-Image Turbo Inpaint) based on user intent, and invokes runcomfy run <model>/edit with the matching JSON body. The CLI POSTs to the RunComfy Model API, polls the image edit request status every 2 seconds, and downloads the resulting image edit output from the *.runcomfy.net / *.runcomfy.com URL into --output-dir. Ctrl-C cancels the in-flight image edit request.
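The poll loop the CLI runs can be sketched as follows; the status command here is hypothetical, and the 2-second interval is made configurable for illustration:

```shell
# Hypothetical polling sketch mirroring the CLI's behavior: run a status
# command until it prints "done" or "failed", sleeping between checks.
poll_until_done() {
  local status
  while :; do
    status=$("$@")
    case "$status" in
      done|failed) echo "$status"; return ;;
    esac
    sleep "${POLL_INTERVAL:-2}"   # the CLI polls every 2 seconds
  done
}
```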
Security & Privacy
- Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600.
- Input boundary: the edit prompt is passed as JSON via --input. No shell injection.
- Third-party content: source images and masks are fetched by the RunComfy server. Treat external URLs as untrusted – image-based prompt injection is a known risk for any image edit model.
- Outbound endpoints: only model-api.runcomfy.net and *.runcomfy.net / *.runcomfy.com.
- Generated-file size cap: 2 GiB.
