Install

```shell
openclaw skills install ima-all-ai
```

The most comprehensive AI content creation platform, with unified access to all leading models across images (SeeDream 4.5, Midjourney, Nano Banana 2, Nano Banana Pro), videos (Wan 2.6, Kling O1, Ima Sevio 1.0/1.0-Fast aka IMA Video Pro/Pro Fast, Google Veo 3.1, Sora 2 Pro), music (Suno sonic v5, DouBao), and speech/TTS (text-to-speech). Provides intelligent model selection and cross-media workflow orchestration with knowledge base support, and optionally integrates ima-knowledge-ai for workflows and best practices. Use for any AI content creation task: images, videos, music, TTS (speech synthesis), multi-media projects, character consistency, product demos, social campaigns, and complete creative workflows. A better alternative to juggling multiple standalone skills (ai-image-generation + ai-video-gen + suno-music + ima-tts-ai) or separate APIs (DALL-E + Runway + Suno).

CRITICAL: When calling the script, you MUST use the exact model_id (second column), NOT the friendly model name. Do NOT infer model_id from the friendly name (e.g., ❌ nano-banana-pro is WRONG; ✅ gemini-3-pro-image is CORRECT).
Quick Reference Table:
| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|---|---|---|
| Nano Banana2 | gemini-3.1-flash-image | ❌ NOT nano-banana-2, 预算选择 4-13 pts |
| Nano Banana Pro | gemini-3-pro-image | ❌ NOT nano-banana-pro, 高质量 10-18 pts |
| SeeDream 4.5 | doubao-seedream-4.5 | ✅ Recommended default, 5 pts |
| Midjourney | midjourney | ✅ Same as friendly name, 8-10 pts |

| 友好名称 (Friendly Name) | model_id (t2v) | model_id (i2v) | 说明 (Notes) |
|---|---|---|---|
| Wan 2.6 | wan2.6-t2v | wan2.6-i2v | ⚠️ Note -t2v/-i2v suffix |
| IMA Video Pro (Sevio 1.0) | ima-pro | ima-pro | ✅ IMA native quality model |
| IMA Video Pro Fast (Sevio 1.0-Fast) | ima-pro-fast | ima-pro-fast | ✅ IMA native low-latency model |
| Kling O1 | kling-video-o1 | kling-video-o1 | ⚠️ Note video- prefix |
| Kling 2.6 | kling-v2-6 | kling-v2-6 | ⚠️ Note v prefix |
| Hailuo 2.3 | MiniMax-Hailuo-2.3 | MiniMax-Hailuo-2.3 | ⚠️ Note MiniMax- prefix |
| Hailuo 2.0 | MiniMax-Hailuo-02 | MiniMax-Hailuo-02 | ⚠️ Note 02 not 2.0 |
| Google Veo 3.1 | veo-3.1-generate-preview | veo-3.1-generate-preview | ⚠️ Note -generate-preview suffix |
| Sora 2 Pro | sora-2-pro | sora-2-pro | ✅ Straightforward |
| Pixverse | pixverse | pixverse | ✅ Same as friendly name |

| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|---|---|---|
| Suno (sonic v4) | sonic | ⚠️ Simplified to sonic |
| DouBao BGM | GenBGM | ❌ NOT doubao-bgm |
| DouBao Song | GenSong | ❌ NOT doubao-song |

| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|---|---|---|
| seed-tts-2.0 | seed-tts-2.0 | ✅ Same as friendly name (default) |
How to get the correct model_id:
Run the script with --list-models --task-type <type> to query available models.

Runtime truth source: GET /open/v1/product/list (or --list-models).
Any table in this document is guidance; actual availability depends on current product list.
Example:
```shell
# ❌ WRONG: Inferring from friendly name
--model-id nano-banana-pro

# ✅ CORRECT: Using exact model_id from table
--model-id gemini-3-pro-image
```
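The friendly-name pitfall above can be guarded against in code. The sketch below is illustrative, not part of the bundled script; the mapping mirrors the Quick Reference Table, and unknown names fail loudly instead of being guessed:

```python
# Lookup table mirroring the Quick Reference Table above.
# Values are the exact model_id strings the script requires.
IMAGE_MODEL_IDS = {
    "nano banana2": "gemini-3.1-flash-image",
    "nano banana pro": "gemini-3-pro-image",
    "seedream 4.5": "doubao-seedream-4.5",
    "midjourney": "midjourney",
}

def resolve_image_model_id(friendly_name: str) -> str:
    """Return the exact model_id; raise instead of inferring from the name."""
    key = friendly_name.strip().lower()
    if key not in IMAGE_MODEL_IDS:
        raise ValueError(
            f"Unknown model {friendly_name!r}; run --list-models instead of inferring"
        )
    return IMAGE_MODEL_IDS[key]
```

This makes the "do NOT infer" rule mechanical: resolve_image_model_id("nano-banana-pro") raises rather than silently passing a wrong id.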
This skill is fully runnable as a standalone package.
If ima-knowledge-ai is installed, the agent may read its references for workflow decomposition and consistency guidance.
Recommended optional reads:
Check for workflow complexity — Read ima-knowledge-ai/references/workflow-design.md if:
Check for visual consistency needs — Read ima-knowledge-ai/references/visual-consistency.md if:
Check video modes — Read ima-knowledge-ai/references/video-modes.md if:
Check model selection — Read ima-knowledge-ai/references/model-selection.md if:
Why this matters:
Example multi-media workflow:
User: "帮我做个产品宣传MV,有背景音乐,主角是旺财小狗"
❌ Wrong:
1. Generate dog image (random look)
2. Generate video (different dog)
3. Generate music (unrelated)
✅ Right:
1. Read workflow-design.md + visual-consistency.md
2. Generate Master Reference: 旺财小狗图片
3. Generate video shots using image_to_video with 旺财 as first frame
4. Get video duration (e.g., 15s)
5. Generate BGM with matching duration and mood
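The ✅ steps above can be sketched as orchestration code. The generator callables here are hypothetical stand-ins for invocations of scripts/ima_create.py, and the return-dict fields ("url", "duration") are assumptions for illustration:

```python
def product_mv_workflow(generate_image, generate_video, generate_music):
    """Consistency-first sequencing of the numbered steps above."""
    # Step 2: master reference image of the recurring subject
    ref = generate_image(prompt="旺财小狗, full body, neutral background")
    # Step 3: every shot is image_to_video seeded with the same reference
    shots = [
        generate_video(task_type="image_to_video",
                       first_frame=ref["url"], prompt=p)
        for p in ["dog running in park", "dog close-up, wagging tail"]
    ]
    # Steps 4-5: BGM duration matches the total video length
    total = sum(s["duration"] for s in shots)
    bgm = generate_music(prompt=f"upbeat BGM, about {total} seconds")
    return {"reference": ref, "shots": shots, "bgm": bgm}
```

The key design point is that the reference image is generated once and threaded into every downstream step, which is what keeps the subject consistent.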
How to check:
# Step 0: Determine media type first (image / video / music / speech)
# From user request: "画"/"生成图"/"image" → image; "视频"/"video" → video; "音乐"/"歌"/"music"/"BGM" → music; "语音"/"朗读"/"TTS"/"speech" → speech
# Then choose task_type and model from the corresponding section (image: text_to_image/image_to_image; video: text_to_video/...; music: text_to_music; speech: text_to_speech)
# Step 1: Read knowledge base based on task type
if multi_media_workflow:
read("~/.openclaw/skills/ima-knowledge-ai/references/workflow-design.md")
if "same subject" or "series" or "character":
read("~/.openclaw/skills/ima-knowledge-ai/references/visual-consistency.md")
if video_generation:
read("~/.openclaw/skills/ima-knowledge-ai/references/video-modes.md")
# Step 2: Execute with proper sequencing and reference images
# (see workflow-design.md for specific patterns)
For simple single-media requests, you can proceed directly. For complex multi-media workflows there are no exceptions: always read the knowledge base first.
Purpose: So that any agent parses user intent consistently, first determine the media type from the user's request, then choose task_type and model.
| User intent / keywords | Media type | task_type examples |
|---|---|---|
| 画 / 生成图 / 图片 / image / 画一张 / 图生图 | image | text_to_image, image_to_image |
| 视频 / 生成视频 / video / 图生视频 / 文生视频 | video | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
| 音乐 / 歌 / BGM / 背景音乐 / music / 作曲 | music | text_to_music |
| 语音 / 朗读 / TTS / 语音合成 / 配音 / speech / read aloud / text-to-speech | speech | text_to_speech |
If the request mixes media (e.g. "宣传片+配乐"), treat as multi-media workflow: read workflow-design.md, then plan image → video → music steps and use the correct task_type for each step.
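A minimal keyword router following the intent table above might look like this. It is a sketch only; real requests may need fuller intent parsing, and more than one hit means the multi-media workflow path:

```python
# Keyword lists follow the intent table above (not exhaustive).
MEDIA_KEYWORDS = {
    "image": ["画", "生成图", "图片", "image", "图生图"],
    "video": ["视频", "video", "图生视频", "文生视频"],
    "music": ["音乐", "歌", "BGM", "背景音乐", "music", "作曲"],
    "speech": ["语音", "朗读", "TTS", "语音合成", "配音", "speech", "text-to-speech"],
}

def detect_media_types(request: str) -> list[str]:
    """Return every media type whose keywords appear in the request."""
    text = request.lower()
    hits = [media for media, kws in MEDIA_KEYWORDS.items()
            if any(kw.lower() in text for kw in kws)]
    return hits  # len(hits) > 1 → treat as a multi-media workflow
```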
Image: For model name → model_id and size/aspect_ratio parsing, follow the same rules as in ima-image-ai skill (User Input Parsing section).
Video: For task_type (t2v / i2v / first_last / reference), model alias → model_id, and duration/resolution/aspect_ratio, follow ima-video-ai skill (User Input Parsing section).
Sevio alias normalization in ima-all-ai:
- Ima Sevio 1.0 → ima-pro
- Ima Sevio 1.0-Fast / Ima Sevio 1.0 Fast → ima-pro-fast
Routing rule: determine the video task_type first, then confirm the model with --list-models.

Music: Suno (sonic) vs DouBao BGM/Song — infer from "BGM"/"背景音乐" (background music) → BGM; "带歌词"/"人声" (with lyrics / vocals) → Suno or Song. Use model_id sonic, GenBGM, GenSong per the "Recommended Defaults" and "Music Generation" tables below.
Speech (TTS): Get model_id from GET /open/v1/product/list?category=text_to_speech or run script with --task-type text_to_speech --list-models. Map user intent to parameters using product form_config:
| User intent / phrasing | Parameter (if in form_config) | Notes |
|---|---|---|
| 女声 / 女声朗读 / female voice | voice_id / voice_type | Use value from form_config options |
| 男声 / 男声朗读 / male voice | voice_id / voice_type | Use value from form_config options |
| 语速快/慢 / speed up/slow | speed | e.g. 0.8–1.2 |
| 音调 / pitch | pitch | If supported |
| 大声/小声 / volume | volume | If supported |
If the user does not specify, use form_config defaults. Pass extra params via --extra-params '{"speed":1.0}'. Only send parameters present in the product’s credit_rules/attributes or form_config (script reflection strips others on retry).
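The "only send parameters present in form_config" rule can be enforced before calling the script. This is a sketch; the field names in the example are illustrative, and the real form_config shape should be confirmed against the product list response:

```python
import json

def build_extra_params(requested: dict, form_config_fields: set[str]) -> str:
    """Keep only parameters the product declares, then render for --extra-params."""
    allowed = {k: v for k, v in requested.items() if k in form_config_fields}
    return json.dumps(allowed, ensure_ascii=False)
```

For example, a request for speed plus an unsupported "emotion" field against a product whose form_config declares only speed and voice_id would be trimmed to the speed parameter alone, avoiding the strip-and-retry round trip.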
For transparency: This skill uses a bundled Python script (scripts/ima_create.py) to call the IMA Open API. The script:
- uses --user-id only locally, as a key for storing your model preferences

What gets sent to IMA servers:
What's stored locally:
- ~/.openclaw/memory/ima_prefs.json - your model preferences (< 1 KB)
- ~/.openclaw/logs/ima_skills/ - generation logs (auto-deleted after 7 days)

| Domain | Owner | Purpose | Data Sent | Privacy |
|---|---|---|---|---|
api.imastudio.com | IMA Studio | Main API (product list, task creation, task polling) | Prompts, model IDs, generation params, your API key | Standard HTTPS, data processed for AI generation |
imapi.liveme.com | IMA Studio | Image/Video upload service (presigned URL generation) | Your API key, file metadata (MIME type, extension) | Standard HTTPS, used for image/video tasks only |
*.aliyuncs.com, *.esxscloud.com | Alibaba Cloud (OSS) | Image/video storage (file upload, CDN delivery) | Raw image/video bytes (via presigned URL, NO API key) | IMA-managed OSS buckets, presigned URLs expire after 7 days |
Key Points:
- Music tasks (text_to_music) and TTS tasks (text_to_speech) only use api.imastudio.com.
- Image/video tasks also call imapi.liveme.com to obtain presigned URLs for uploading input images.
- All API-key traffic goes only to api.imastudio.com and imapi.liveme.com (both owned by IMA Studio).
- You can verify traffic yourself: tcpdump -i any -n 'host api.imastudio.com or host imapi.liveme.com'.
- See "🌐 Network Endpoints Used" and "⚠️ Credential Security Notice" in this document for full disclosure.

Your API key is sent to both IMA-owned domains:
- Authorization: Bearer ima_xxx... → api.imastudio.com (main API)
- appUid=ima_xxx... → imapi.liveme.com (upload service)

Security best practices:
- Monitor https://imastudio.com/dashboard for unauthorized activity.
- Check ~/.openclaw/logs/ima_skills/ for unexpected API calls.

Why two domains? IMA Studio uses a microservices architecture:
- api.imastudio.com: core AI generation API
- imapi.liveme.com: specialized image/video upload service (shared infrastructure)

Both domains are operated by IMA Studio. The same API key grants access to both services.
Note for users: You can review the script source at
scripts/ima_create.pyanytime.
The agent uses this script to simplify API calls. Music tasks use onlyapi.imastudio.com, while image/video tasks also callimapi.liveme.comfor file uploads (see "Network Endpoints" above).
Use the bundled script internally for all task types — it ensures correct parameter construction:
```shell
# ─── Image Generation ──────────────────────────────────────────────────────────

# Basic text-to-image (default model)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_image \
  --model-id doubao-seedream-4.5 --prompt "a cute puppy on grass, photorealistic" \
  --user-id {user_id} --output-json

# Text-to-image with size override (Nano Banana2)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_image \
  --model-id gemini-3.1-flash-image --prompt "city skyline at sunset, 4K" \
  --size 2k --user-id {user_id} --output-json

# Image-to-image with input URL
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type image_to_image \
  --model-id doubao-seedream-4.5 --prompt "turn into oil painting style" \
  --input-images https://example.com/photo.jpg --user-id {user_id} --output-json

# ─── Video Generation ──────────────────────────────────────────────────────────

# Basic text-to-video (default model, 5s 720P)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_video \
  --model-id wan2.6-t2v --prompt "a puppy dancing happily, cinematic" \
  --user-id {user_id} --output-json

# Text-to-video with extra params (10s 1080P)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_video \
  --model-id wan2.6-t2v --prompt "dramatic ocean waves, sunset" \
  --extra-params '{"duration":10,"resolution":"1080P","aspect_ratio":"16:9"}' \
  --user-id {user_id} --output-json

# Image-to-video (animate static image)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type image_to_video \
  --model-id wan2.6-i2v --prompt "camera slowly zooms in, gentle movement" \
  --input-images https://example.com/photo.jpg --user-id {user_id} --output-json

# First-last frame video (two images)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type first_last_frame_to_video \
  --model-id kling-video-o1 --prompt "smooth transition between frames" \
  --input-images https://example.com/frame1.jpg https://example.com/frame2.jpg \
  --user-id {user_id} --output-json

# ─── Music Generation ──────────────────────────────────────────────────────────

# Basic text-to-music (Suno default)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_music \
  --model-id sonic --prompt "upbeat electronic music, 120 BPM, no vocals" \
  --user-id {user_id} --output-json

# Music with custom lyrics (Suno custom mode)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_music \
  --model-id sonic --prompt "pop ballad, emotional" \
  --extra-params '{"custom_mode":true,"lyrics":"Your custom lyrics here...","vocal_gender":"female"}' \
  --user-id {user_id} --output-json

# Background music (DouBao BGM)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_music \
  --model-id GenBGM --prompt "relaxing ambient music for meditation" \
  --user-id {user_id} --output-json

# ─── Text-to-Speech (TTS) ─────────────────────────────────────────────────────

# List TTS models first to get model_id, then generate speech
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_speech --list-models

# TTS: use model_id from list above (prompt = text to speak)
python3 {baseDir}/scripts/ima_create.py \
  --api-key $IMA_API_KEY --task-type text_to_speech \
  --model-id <model_id from list> --prompt "Text to be spoken here." \
  --user-id {user_id} --output-json
```
The script outputs JSON with url, model_name, credit — use these values in the UX protocol messages below. The script internals (product list query, parameter construction, polling) are invisible to users.
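Consuming that JSON can be sketched as follows. The url, model_name, and credit fields come from the description above; the function name is illustrative, and the exact schema should be confirmed against scripts/ima_create.py:

```python
import json

def parse_script_output(stdout: str) -> tuple[str, str, int]:
    """Pull the three fields the UX protocol messages need from --output-json."""
    data = json.loads(stdout)
    return data["url"], data["model_name"], data["credit"]
```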
Call IMA Open API to create AI-generated content. All endpoints require an ima_* API key. The core flow is: query products → create task → poll until done.
This skill is community-maintained and open for inspection.
Full transparency:
- You can review scripts/ima_create.py and ima_logger.py anytime
- Music tasks use api.imastudio.com only; image/video tasks also use imapi.liveme.com (see "Network Endpoints" section)
- Local data lives in ~/.openclaw/memory/ima_prefs.json and log files

Configuration allowed:
- Set your key: export IMA_API_KEY=ima_your_key_here
- Or add IMA_API_KEY to the agent's environment configuration
- Logs can be redirected to /dev/null if you prefer not to keep them

Data control:
- View preferences: cat ~/.openclaw/memory/ima_prefs.json
- Reset preferences: rm ~/.openclaw/memory/ima_prefs.json (resets to defaults)
- Delete logs: rm -rf ~/.openclaw/logs/ima_skills/ (auto-cleanup after 7 days anyway)

If you need to modify this skill for your use case:
Note: Modified skills may break API compatibility or introduce security issues. Official support only covers the unmodified version.
Actions that could compromise security:
Why this matters:
What this skill does with your data:
| Data Type | Sent to IMA? | Stored Locally? | User Control |
|---|---|---|---|
| Prompts (image/video/music) | ✅ Yes (required for generation) | ❌ No | None (required) |
| API key | ✅ Yes (authentication header) | ❌ No | Set via env var |
| user_id (optional CLI arg) | ❌ Never (local preference key only) | ✅ Yes (as prefs file key) | Change --user-id value |
| Model preferences | ❌ No | ✅ Yes (~/.openclaw) | Delete anytime |
| Generation logs | ❌ No | ✅ Yes (~/.openclaw) | Auto-cleanup 7 days |
Privacy recommendations:
- --user-id is never sent to IMA servers; it's only used locally as a key for storing preferences in ~/.openclaw/memory/ima_prefs.json
- Review scripts/ima_create.py to verify network calls (search for the create_task function)

Get your IMA API key: Visit https://imastudio.com to register and get started.
Version control:
File checksums (optional):
# Verify skill integrity
sha256sum SKILL.md scripts/ima_create.py
If users report issues, verify file integrity first.
User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.
Preferences file: ~/.openclaw/memory/ima_prefs.json

Single file, shared across all IMA skills:
```json
{
  "user_{user_id}": {
    "text_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
    "image_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
    "text_to_speech": { "model_id": "<from product list>", "model_name": "...", "credit": 2, "last_used": "..." }
  }
}
```
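Reading and writing that shared file can be sketched as below. The helper names and the PREFS_PATH constant are illustrative, not part of the bundled script; the JSON layout follows the example above:

```python
import json
import os

PREFS_PATH = os.path.expanduser("~/.openclaw/memory/ima_prefs.json")

def load_prefs(path: str = PREFS_PATH) -> dict:
    """Return the shared preference dict, or {} if the file does not exist yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_pref(prefs: dict, user_id: str, task_type: str, entry: dict) -> dict:
    """Record an explicit user preference under user_{user_id} → task_type."""
    prefs.setdefault(f"user_{user_id}", {})[task_type] = entry
    return prefs
```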
Step 1: Get knowledge-ai recommendation (if installed)

```python
knowledge_recommended_model = read_ima_knowledge_ai()  # e.g., "SeeDream 4.5"
```

Step 2: Check user preference

```python
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type)  # e.g., {"model_id": "midjourney", ...}
```

Step 3: Decide which model to use

```python
if user_pref:  # user preference exists
    use_model = user_pref["model_id"]  # Highest priority
else:
    use_model = knowledge_recommended_model or fallback_default
```

Step 4: Check for mismatch (for later hint)

```python
if user_pref and knowledge_recommended_model != user_pref["model_id"]:
    mismatch = True  # Will add hint in success message
```
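Steps 1-4 above can be consolidated into one function. This is a sketch: knowledge_model may be None when ima-knowledge-ai is not installed, and fallback_default comes from the "Recommended Defaults" table:

```python
def select_model(user_pref, knowledge_model, fallback_default):
    """Return (chosen_model_id, mismatch_flag) per the priority rules above."""
    if user_pref:  # explicit user preference wins
        chosen = user_pref["model_id"]
        mismatch = knowledge_model is not None and knowledge_model != chosen
    else:
        chosen = knowledge_model or fallback_default
        mismatch = False  # nothing to hint about when following the recommendation
    return chosen, mismatch
```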
✅ Save preference when user explicitly specifies a model:
| User says | Action |
|---|---|
| 用XXX / 换成XXX / 改用XXX | Switch to model XXX + save as preference |
| 以后都用XXX / 默认用XXX / always use XXX | Save + confirm: ✅ 已记住!以后图片生成默认用 [XXX] |
| 我喜欢XXX / 我更喜欢XXX | Save as preference |
❌ Do NOT save when:
🗑️ Clear preference when user wants automatic selection:
| User says | Action |
|---|---|
| 用最好的 / 用最合适的 / best / recommended | Clear pref + use knowledge-ai recommendation |
| 推荐一个 / 你选一个 / 自动选择 | Clear pref + use knowledge-ai recommendation |
| 用默认的 / 用新的 | Clear pref + use knowledge-ai recommendation |
| 试试别的 / 换个试试 (without specific model) | Clear pref + use knowledge-ai recommendation |
| 重新推荐 | Clear pref + use knowledge-ai recommendation |
Implementation:
```python
del prefs[f"user_{user_id}"][task_type]
save_prefs(prefs)
```
Selection flow:
Important notes:
The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.
When using user preference for image generation, show a line like:
🎨 根据你的使用习惯,将用 [Model Name] 帮你生成…
• 模型:[Model Name](你的常用模型)
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
When user switches to a different model than their saved preference:
💡 你之前喜欢用 [Old Model],这次换成了 [New Model]。
要把 [New Model] 设为以后的默认吗?
回复「是」保存 / 回复「否」仅本次使用
These are fallback defaults — only used when no user preference exists.
Always default to the newest and most popular model. Do NOT default to the cheapest.
| Task Type | Default Model | model_id | version_id | Cost | Why |
|---|---|---|---|---|---|
| text_to_image | SeeDream 4.5 | doubao-seedream-4.5 | doubao-seedream-4-5-251128 | 5 pts | Latest doubao flagship, photorealistic 4K |
| text_to_image (budget) | Nano Banana2 | gemini-3.1-flash-image | gemini-3.1-flash-image | 4 pts | Fastest and cheapest option |
| text_to_image (premium) | Nano Banana Pro | gemini-3-pro-image | gemini-3-pro-image-preview | 10/10/18 pts | Premium quality, 1K/2K/4K options |
| text_to_image (artistic) | Midjourney 🎨 | midjourney | v6 | 8/10 pts | Artist-level aesthetics, creative styles |
| image_to_image | SeeDream 4.5 | doubao-seedream-4.5 | doubao-seedream-4-5-251128 | 5 pts | Latest, best i2i quality |
| image_to_image (budget) | Nano Banana2 | gemini-3.1-flash-image | gemini-3.1-flash-image | 4 pts | Cheapest option |
| image_to_image (premium) | Nano Banana Pro | gemini-3-pro-image | gemini-3-pro-image-preview | 10 pts | Premium quality |
| image_to_image (artistic) | Midjourney 🎨 | midjourney | v6 | 8/10 pts | Artist-level aesthetics, style transfer |
| text_to_video | Wan 2.6 | wan2.6-t2v | wan2.6-t2v | 25 pts | 🔥 Most popular t2v, balanced cost |
| text_to_video (premium) | Hailuo 2.3 | MiniMax-Hailuo-2.3 | MiniMax-Hailuo-2.3 | 38 pts | Higher quality |
| text_to_video (budget) | Vidu Q2 | viduq2 | viduq2 | 5 pts | Lowest cost t2v |
| image_to_video | Wan 2.6 | wan2.6-i2v | wan2.6-i2v | 25 pts | 🔥 Most popular i2v, 1080P |
| image_to_video (premium) | Kling 2.6 | kling-v2-6 | kling-v2-6 | 40-160 pts | Premium Kling i2v |
| first_last_frame_to_video | Kling O1 | kling-video-o1 | kling-video-o1 | 48 pts | Newest Kling reasoning model |
| reference_image_to_video | Kling O1 | kling-video-o1 | kling-video-o1 | 48 pts | Best reference fidelity |
| text_to_music | Suno (sonic-v4) | sonic | sonic | 25 pts | Latest Suno engine, best quality |
| text_to_speech | (query product list) | — | — | — | Run --task-type text_to_speech --list-models; use first or user-preferred model_id |
Premium options:
Quick selection guide (production as of 2026-02-27, sorted by popularity):
Selection guide by use case:
Image Generation:
Video Generation:
Music Generation:
Speech (TTS) Generation:
Speech/TTS uses a single task type: text_to_speech. Always query GET /open/v1/product/list?category=text_to_speech (or --list-models) to get the current model_id and credit. There is no fixed default; use the first available model or the user's preference. Voice/speed/format parameters: see "Model and parameter parsing" (TTS table) and "Speech (TTS) — text_to_speech" in this document.

⚠️ Technical Note for Suno:
The `model_version` inside `parameters.parameters` (e.g., `"sonic-v5"`) is different from the outer `model_version` field (which is `"sonic"`). Always set both correctly when creating Suno tasks.
⚠️ Production Image Models (4 available):
- SeeDream 4.5 (doubao-seedream-4.5) — 5 pts, default
- Midjourney (midjourney) — 8/10 pts for 480p/720p, artistic styles
- Nano Banana2 (gemini-3.1-flash-image) — 4/6/10/13 pts for 512px/1K/2K/4K
- Nano Banana Pro (gemini-3-pro-image) — 10/10/18 pts for 1K/2K/4K

All other image models mentioned in older documentation are no longer available in production.
🌟 Parameter Support Notes (All Task Types):
🆕 MAJOR UPDATE: Nano Banana series now has NATIVE aspect_ratio support!
- Nano Banana2 supports aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY
- Nano Banana Pro supports aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY

aspect_ratio support details:
- Nano Banana2: native aspect_ratio (size via attribute_ids, 4-13 pts)
- Nano Banana Pro: native aspect_ratio (size via attribute_ids, 10-18 pts)
- Midjourney: no custom aspect_ratio (resolution via attribute_id, 8/10 pts)

When user requests unsupported combinations for images:
❌ Midjourney 暂不支持自定义 aspect_ratio(仅支持 1024x1024 方形)
✅ 推荐方案:
1. SeeDream 4.5(支持虚拟参数 aspect_ratio)
• 支持比例:1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9
• 成本:5 积分(性价比最佳)
2. Nano Banana Pro/2(原生支持 aspect_ratio)
• 支持比例:1:1, 16:9, 9:16, 4:3, 3:4
• 成本:4-18 积分(按尺寸)
需要我帮你用 SeeDream 4.5 生成吗?
Auto-Inference Logic for Pixverse V5.5/V5/V4:
- The script checks the model field in form_config from the Product List API
- Pixverse requires a model parameter (e.g., "v5.5", "v5", "v4")
- The script infers the version from model_name and injects it
- model_name: "Pixverse V5.5" → auto-inject model: "v5.5"
- model_name: "Pixverse V4" → auto-inject model: "v4"
- Products that already declare model in form_config need no auto-inference

Error Prevention:
- Without the injected model parameter, Pixverse tasks fail with err_code=400017 err_msg=Invalid value for model

Suno sonic-v5 (Full-Featured):
DouBao BGM/Song (Simplified):
🎵 Suno Prompt Writing Guide (for gpt_description_prompt):
When using Suno, structure your prompt with these elements:
Genre/Style:
"lo-fi hip hop", "orchestral cinematic", "upbeat pop", "dark ambient", "indie folk", "electronic dance"

Tempo/BPM:
"80 BPM", "fast tempo", "slow ballad", "moderate pace 110 BPM"

Vocals Control:
- "no vocals" → set make_instrumental=true
- "female vocals" → set vocal_gender="female"
- "male vocals" → set vocal_gender="male"
- mixed vocals → set vocal_gender="mixed"

Mood/Emotion:
"happy and energetic", "melancholic", "tense and dramatic", "peaceful and calming"

Negative Tags (exclude styles):
Use negative_tags: "heavy metal, distortion, screaming" to exclude unwanted elements

Duration Hint:
"60 seconds", "30 second loop", "2 minute track"

Example Suno prompts:
"upbeat lo-fi hip hop, 90 BPM, no vocals, relaxed and chill"
→ Set: make_instrumental=true
"emotional pop ballad, slow tempo, female vocals, melancholic"
→ Set: vocal_gender="female"
"orchestral cinematic trailer music, epic and dramatic, 120 BPM, no vocals"
→ Set: make_instrumental=true, tags="orchestral,cinematic,epic"
"acoustic indie folk, gentle guitar, male vocals, warm and nostalgic"
→ Set: vocal_gender="male", tags="acoustic,indie,folk"
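The prompt-to-parameter mapping in the examples above can be sketched as a heuristic. The field names (make_instrumental, vocal_gender) follow this section; the function itself is illustrative and covers only the vocal cues shown:

```python
def suno_params_from_prompt(prompt: str) -> dict:
    """Map the vocal-control phrases from the guide above to Suno task params."""
    p = prompt.lower()
    params = {}
    if "no vocals" in p or "instrumental" in p:
        params["make_instrumental"] = True
    elif "female vocals" in p:
        params["vocal_gender"] = "female"
    elif "male vocals" in p:
        params["vocal_gender"] = "male"
    return params
```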
⚠️ Technical Note for Suno:
The `model_version` inside `parameters.parameters` (e.g., `"sonic-v5"`) is different from the outer `model_version` field (which is `"sonic"`). Always set both correctly.
User asks for custom aspect ratio image (e.g. "7:3 landscape")
→ ❌ Image models don't support custom ratios
→ ✅ Solution: "图片模型不支持自定义比例。建议用视频模型(Wan 2.6 t2v)生成16:9视频,然后截取首帧作为图片。"
User asks for 8K image
→ ❌ No model supports 8K
→ ✅ Solution: "当前最高支持4K分辨率(Nano Banana Pro,18积分)。要使用吗?"
User asks for video with audio
→ Check model: Veo 3.1 / Kling O1 / Hailuo have generate_audio
→ ✅ Solution: "Veo 3.1 和 Kling O1 支持音频生成(需在参数中设置 generate_audio=True)。要用哪个?"
User asks for long music (e.g. "5 minute track")
→ ❌ Duration not user-controllable
→ ✅ Solution: "Suno 生成约2分钟音乐。需要更长时长可以生成多段后拼接。"
User asks for 30s video
→ Check model: Most models max 15s
→ ✅ Solution: "当前最长15秒。可选模型:Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)。"
When user requests unsupported combinations:
Note: Image-specific unsupported combinations (Midjourney + aspect_ratio, 8K, non-standard ratios) are documented in the "Image Models" section above.
User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.
~/.openclaw/memory/ima_prefs.json

```json
{
  "user_{user_id}": {
    "text_to_video": { "model_id": "wan2.6-t2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." },
    "image_to_video": { "model_id": "wan2.6-i2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." },
    "first_last_frame_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." },
    "reference_image_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." }
  }
}
```
Step 1: Get knowledge-ai recommendation (if installed)

```python
knowledge_recommended_model = read_ima_knowledge_ai()  # e.g., "Wan 2.6"
```

Step 2: Check user preference

```python
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type)  # e.g., {"model_id": "kling-video-o1", ...}
```

Step 3: Decide which model to use

```python
if user_pref:  # user preference exists
    use_model = user_pref["model_id"]  # Highest priority
else:
    use_model = knowledge_recommended_model or fallback_default
```

Step 4: Check for mismatch (for later hint)

```python
if user_pref and knowledge_recommended_model != user_pref["model_id"]:
    mismatch = True  # Will add hint in success message
```
✅ Save preference when user explicitly specifies a model:
| User says | Action |
|---|---|
| 用XXX / 换成XXX / 改用XXX | Switch to model XXX + save as preference |
| 以后都用XXX / 默认用XXX / always use XXX | Save + confirm: ✅ 已记住!以后视频生成默认用 [XXX] |
| 我喜欢XXX / 我更喜欢XXX | Save as preference |
❌ Do NOT save when:
🗑️ Clear preference when user wants automatic selection:
| User says | Action |
|---|---|
| 用最好的 / 用最合适的 / best / recommended | Clear pref + use knowledge-ai recommendation |
| 推荐一个 / 你选一个 / 自动选择 | Clear pref + use knowledge-ai recommendation |
| 用默认的 / 用新的 | Clear pref + use knowledge-ai recommendation |
| 试试别的 / 换个试试 (without specific model) | Clear pref + use knowledge-ai recommendation |
| 重新推荐 | Clear pref + use knowledge-ai recommendation |
Implementation:
```python
del prefs[f"user_{user_id}"][task_type]
save_prefs(prefs)
```
Selection flow:
Important notes:
The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.
v2.0 Updates (aligned with ima-image-ai v1.3):
- Added Step 0 for correct message ordering (fixes group chat bug)
- Added Step 5 for explicit task completion
- Enhanced Midjourney support with proper timing estimates
- Now 6 steps total (0-5): Acknowledgment → Pre-Gen → Progress → Success/Failure → Done
This skill runs inside IM platforms (Feishu, Discord via OpenClaw).
Generation takes anywhere from about 10 seconds (music) up to 6 minutes (video). Never let users wait in silence.
Always follow all 6 steps below, every single time.
Default to plain-language updates in normal user flows. If users ask for technical details, provide them transparently (script name, endpoints, and key parameters).
In standard progress messages, prioritize: model name, estimated/actual time, credits consumed, result URL, and natural-language status updates.
| Task Type | Model | Estimated Time | Poll Every | Send Progress Every |
|---|---|---|---|---|
| text_to_image | SeeDream 4.5 | 25~60s | 5s | 20s |
| Nano Banana2 💚 | 20~40s | 5s | 15s | |
| Nano Banana Pro | 60~120s | 5s | 30s | |
| Midjourney 🎨 | 40~90s | 8s | 25s | |
| image_to_image | SeeDream 4.5 | 25~60s | 5s | 20s |
| Nano Banana2 💚 | 20~40s | 5s | 15s | |
| Nano Banana Pro | 60~120s | 5s | 30s | |
| Midjourney 🎨 | 40~90s | 8s | 25s | |
| text_to_video | Wan 2.6, Hailuo 2.0/2.3, Vidu Q2, Pixverse | 60~120s | 8s | 30s |
| SeeDance 1.5 Pro, Kling 2.6, Veo 3.1 | 90~180s | 8s | 40s | |
| Kling O1, Sora 2 Pro | 180~360s | 8s | 60s | |
| image_to_video | Same ranges as text_to_video | — | 8s | 40s |
| first_last_frame / reference | Kling O1, Veo 3.1 | 180~360s | 8s | 60s |
| text_to_music | DouBao BGM / Song | 10~25s | 5s | 10s |
| Suno (sonic-v5) | 20~45s | 5s | 15s | |
| text_to_speech | (varies by model) | 5~30s | 3s | 10s |
estimated_max_seconds = upper bound of the range (e.g. 60 for SeeDream 4.5, 40 for Nano Banana2, 120 for Nano Banana Pro, 90 for Midjourney, 180 for Kling 2.6, 360 for Kling O1).
⚠️ CRITICAL: This step is essential for correct message ordering in IM platforms (Feishu, Discord).
Before doing anything else, reply to the user with a friendly acknowledgment message using your normal reply (not message tool). This reply will automatically appear FIRST in the conversation.
Example acknowledgment messages:
For images:
好的!来帮你画一只萌萌的猫咪 🐱
收到!马上为你生成一张 16:9 的风景照 🏔️
OK! Starting image generation with SeeDream 4.5 🎨
For videos:
好的!来帮你生成一段视频 🎬
收到!开始用 Wan 2.6 生成视频 🎥
For music:
好的!来帮你创作一首音乐 🎵
Rules:
- Use your normal reply, not the message tool

Why this matters:
- message tool pushes appear in chronological order AFTER your initial reply

After Step 0 reply, use the message tool to push a notification immediately:
[Emoji] 开始生成 [内容类型],请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
Emoji by content type:
- Image: 🎨
- Video: 🎬 (with note: 视频生成需要较长时间,我会定时汇报进度)
- Music: 🎵

Cost transparency (new requirement):
Adapt language to match the user (Chinese / English). For video, always add a note that it takes longer. For expensive models, always mention cheaper alternatives unless user explicitly requested premium.
Poll the task detail API every [Poll Every] seconds per the table.
Send a progress update every [Send Progress Every] seconds.
⏳ 正在生成中… [P]%
已等待 [elapsed]s,预计最长 [max]s
Progress formula:
P = min(95, floor(elapsed_seconds / estimated_max_seconds * 100))
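The formula above translates directly into code; this sketch also reflects the 95% freeze rule that holds until the API reports success:

```python
import math

def progress_percent(elapsed_seconds: float, estimated_max_seconds: float) -> int:
    """P = min(95, floor(elapsed / estimated_max * 100)); capped until success."""
    return min(95, math.floor(elapsed_seconds / estimated_max_seconds * 100))
```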
- Jump to 100% only when the task status = success
- When elapsed > estimated_max: freeze at 95% and append 「快了,稍等一下…」

When task status = success:
3.1 Send video player first (IM platforms like Feishu will render inline player):
```python
# Get result URL from script output or task detail API
result = get_task_result(task_id)
video_url = result["medias"][0]["url"]

# Build caption
caption = f"""✅ 视频生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
[视频描述]"""

# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
    caption += f"""
💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""

# Send video with caption (use message tool if available)
message(
    action="send",
    media=video_url,  # ⚠️ Use HTTPS URL directly, NOT local file path
    caption=caption
)
```
Important:
3.2 Then send link as text (for copying/sharing):
```python
# Send link message immediately after video
message(action="send", text=f"🔗 视频链接(可复制分享):\n{video_url}")
```
⚠️ Critical for video:
- The video cover image is available at result["medias"][0]["cover"]

```python
# Build caption
caption = f"""✅ 图片生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
🔗 原始链接:{image_url}"""

# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
    caption += f"""
💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""

# Send image with caption
message(
    action="send",
    media=image_url,
    caption=caption
)
```
Important:
Send audio file with player:
✅ 音乐生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
• 时长:约 [duration]
[音频URL或直接发送音频文件]
Step 0 — Initial acknowledgment (normal reply)
First reply with a short acknowledgment, e.g.: 好的,正在帮你把这段文字转成语音。 / OK, converting this text to speech.
Step 1 — Pre-generation (message tool)
Push once:
🔊 开始语音合成,请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
Step 2 — Progress
Poll every 2–5s. Every 10–15s send: ⏳ 语音合成中… [P]%,已等待 [elapsed]s,预计最长 [max]s. Cap progress at 95% until API returns success.
Step 3 — Success (message tool)
When resource_status == 1 and status != "failed", send media = medias[0].url and caption:
✅ 语音合成成功!
• 模型:[Model Name]
• 耗时:实际 [actual]s
• 消耗积分:[N pts]
🔗 原始链接:[url]
Use the URL from the API (do not use local file paths).
Step 4 — Failure (message tool)
On failure, send user-friendly message. TTS error translation (do not expose raw API errors):
| Technical | ✅ Say (CN) | ✅ Say (EN) |
|---|---|---|
| 401 Unauthorized | 密钥无效或未授权,请至 imaclaw.ai 生成新密钥 | API key invalid; generate at imaclaw.ai |
| 4008 Insufficient points | 积分不足,请至 imaclaw.ai 购买积分 | Insufficient points; buy at imaclaw.ai |
| Invalid product attribute | 参数配置异常,请稍后重试 | Configuration error, try again later |
| Error 6006 / 6010 | 积分或参数不匹配,请换模型或重试 | Points/params mismatch, try another model |
| resource_status == 2 / status failed | 语音合成失败,建议换模型或缩短文本 | Synthesis failed, try another model or shorter text |
| timeout | 合成超时,请稍后重试 | Timed out, try again later |
| Network error | 网络不稳定,请检查后重试 | Network unstable, check and retry |
| Text too long (TTS) | 文本过长,请缩短后重试 | Text too long, please shorten |
Links: API key — https://www.imaclaw.ai/imaclaw/apikey; Credits — https://www.imaclaw.ai/imaclaw/subscription
Step 5 — Done
After Step 0–4, no further reply needed. Do not send duplicate confirmations.
When task status = failed or any API/network error, send:
❌ [内容类型]生成失败
• 原因:[natural_language_error_message]
• 建议改用:
- [Alt Model 1]([特点],[N pts])
- [Alt Model 2]([特点],[N pts])
需要我帮你用其他模型重试吗?
⚠️ CRITICAL: Error Message Translation
NEVER show technical error messages to users. Always translate API errors into natural language.
API key & credits: 密钥与积分管理入口为 imaclaw.ai(与 imastudio.com 同属 IMA 平台)。Key and subscription management: imaclaw.ai (same IMA platform as imastudio.com).
| Technical Error | ❌ Never Say | ✅ Say Instead (Chinese) | ✅ Say Instead (English) |
|---|---|---|---|
| 401 Unauthorized 🆕 | Invalid API key / 401 Unauthorized | ❌ API密钥无效或未授权<br>💡 生成新密钥: https://www.imaclaw.ai/imaclaw/apikey | ❌ API key is invalid or unauthorized<br>💡 Generate API Key: https://www.imaclaw.ai/imaclaw/apikey |
| 4008 Insufficient points 🆕 | Insufficient points / Error 4008 | ❌ 积分不足,无法创建任务<br>💡 购买积分: https://www.imaclaw.ai/imaclaw/subscription | ❌ Insufficient points to create this task<br>💡 Buy Credits: https://www.imaclaw.ai/imaclaw/subscription |
| "Invalid product attribute" / "Insufficient points" | Invalid product attribute | 生成参数配置异常,请稍后重试 | Configuration error, please try again later |
| Error 6006 (credit mismatch) | Error 6006 | 积分计算异常,系统正在修复 | Points calculation error, system is fixing |
| Error 6009 (no matching rule) | Error 6009 | 参数组合不匹配,已自动调整 | Parameter mismatch, auto-adjusted |
| Error 6010 (attribute_id mismatch) | Attribute ID does not match | 模型参数不匹配,请尝试其他模型 | Model parameters incompatible, try another model |
| error 400 (bad request) | error 400 / Bad request | 请求参数有误,请稍后重试 | Invalid request parameters, please try again |
| resource_status == 2 | Resource status 2 / Failed | 生成过程遇到问题,建议换个模型试试 | Generation failed, please try another model |
| status == "failed" (no details) | Task failed | 这次生成没成功,要不换个模型试试? | Generation unsuccessful, try a different model? |
| timeout | Task timed out / Timeout error | 生成时间过长已超时,建议用更快的模型 | Generation took too long, try a faster model |
| Network error / Connection refused | Connection refused / Network error | 网络连接不稳定,请检查网络后重试 | Network connection unstable, check network and retry |
| Rate limit exceeded | 429 Too Many Requests / Rate limit | 请求过于频繁,请稍等片刻再试 | Too many requests, please wait a moment |
| Prompt moderation (Sora only) | Content policy violation | 提示词包含敏感内容,请修改后重试 | Prompt contains restricted content, please modify |
| Model unavailable | Model not available / 503 Service Unavailable | 当前模型暂时不可用,建议换个模型 | Model temporarily unavailable, try another model |
| Lyrics format error (Suno only) 🎵 | Invalid lyrics format | 歌词格式有误,请调整后重试 | Lyrics format error, adjust and retry |
| Prompt too short/long (Music) 🎵 | Prompt length invalid | 音乐描述过短或过长,请调整到合适长度 (建议20-100字) | Music description too short or long, adjust to appropriate length (20-100 chars recommended) |
| Text too long (TTS) 🔊 | TTS text length | 文本过长,请缩短后重试 | Text too long, please shorten and retry |
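A minimal sketch of this translation layer. The dictionary keys and the helper name are illustrative; the messages are taken from the table above:

```python
# Illustrative mapping from raw error markers to user-facing CN messages
# (subset of the table above; the helper name is hypothetical).
ERROR_TRANSLATIONS = {
    "401": "API密钥无效或未授权,生成新密钥: https://www.imaclaw.ai/imaclaw/apikey",
    "4008": "积分不足,购买积分: https://www.imaclaw.ai/imaclaw/subscription",
    "6006": "积分计算异常,系统正在修复",
    "6010": "模型参数不匹配,请尝试其他模型",
    "timeout": "生成时间过长已超时,建议用更快的模型",
}
GENERIC_FALLBACK = "生成过程遇到问题,请稍后重试或换个模型试试"

def user_facing_error(raw_error: str) -> str:
    """Never expose raw API errors; translate or fall back to the generic message."""
    for marker, message in ERROR_TRANSLATIONS.items():
        if marker in raw_error:
            return message
    return GENERIC_FALLBACK
```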
Generic fallback (when error is unknown):
生成过程遇到问题,请稍后重试或换个模型试试 / Generation encountered an issue, please try again or use another model

Best Practices:
For music failures, retry with auto_lyrics=true; for TTS failures, query --list-models for an alternative model or shorten the text.

After sending Step 3 (success) or Step 4 (failure):
Why this step matters:
Exception: If the user explicitly asks "还有别的吗?" or similar, then respond naturally.
🆕 Enhanced Error Handling (v1.0.8):
The Reflection mechanism (3 automatic retries) now provides specific, actionable suggestions for common errors:
All error handling is automatic and transparent — users receive natural language explanations with next steps.
Failure fallback by task type:
| Task Type | Failed Model | First Alt | Second Alt |
|---|---|---|---|
| text_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10-18pts, premium) |
| text_to_image | Nano Banana2 | SeeDream 4.5 (5pts, better quality) | Nano Banana Pro (10-18pts) |
| text_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts, budget) |
| image_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana2 | SeeDream 4.5 (5pts) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts) |
| text_to_video | Kling O1 | Wan 2.6 (25pts) | Vidu Q2 (5pts) |
| text_to_video | Google Veo 3.1 | Kling O1 (48pts) | Sora 2 Pro (122pts) |
| text_to_video | Any | Wan 2.6 (25pts, most popular) | Hailuo 2.0 (5pts) |
| image_to_video | Wan 2.6 | Kling O1 (48pts) | Hailuo 2.0 i2v (25pts) |
| image_to_video | Any | Wan 2.6 (25pts, most popular) | Vidu Q2 Pro (20pts) |
| first_last / reference | Kling O1 | Kling 2.6 (80pts) | Veo 3.1 (70pts+) |
| text_to_music 🎵 | Suno | DouBao BGM (30pts, 背景音乐) | DouBao Song (30pts, 歌曲生成) |
| text_to_music 🎵 | DouBao BGM | DouBao Song (30pts) | Suno (25pts, 功能最强) |
| text_to_music 🎵 | DouBao Song | DouBao BGM (30pts) | Suno (25pts, 功能最强) |
| text_to_speech 🔊 | (any) | Query --list-models for alternatives | Use another model_id from product list |
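A hypothetical lookup implementing a subset of the fallback table above (the function name and dictionary layout are illustrative; the model/cost strings come from the table):

```python
# Subset of the failure-fallback table; fall through to the per-type default row.
FALLBACKS = {
    ("text_to_image", "SeeDream 4.5"): ["Nano Banana2 (4pts, fast)", "Nano Banana Pro (10-18pts, premium)"],
    ("text_to_video", "Kling O1"): ["Wan 2.6 (25pts)", "Vidu Q2 (5pts)"],
    ("image_to_video", "Wan 2.6"): ["Kling O1 (48pts)", "Hailuo 2.0 i2v (25pts)"],
}
DEFAULTS = {
    "text_to_video": ["Wan 2.6 (25pts, most popular)", "Hailuo 2.0 (5pts)"],
    "image_to_video": ["Wan 2.6 (25pts, most popular)", "Vidu Q2 Pro (20pts)"],
}

def suggest_alternatives(task_type: str, failed_model: str) -> list:
    """Return [first_alt, second_alt] for a failed model, else the per-type default."""
    return FALLBACKS.get((task_type, failed_model)) or DEFAULTS.get(task_type, [])
```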
Music-specific failure guidance:
Retry with auto_lyrics=true so lyrics are generated automatically.
TTS-specific failure guidance:
Run --task-type text_to_speech --list-models and suggest another model_id, or shorten the text / simplify the content. Use the TTS error translation table in "For TTS Tasks" above for user-facing messages.
Source: production GET /open/v1/product/list (2026-02-27). Model count reduced significantly. Always query the product list API at runtime.
| Category | Name | model_id | Cost |
|---|---|---|---|
| text_to_image | SeeDream 4.5 🌟 | doubao-seedream-4.5 | 5 pts |
| text_to_image | Midjourney 🎨 | midjourney | 8/10 pts (480p/720p) |
| text_to_image | Nano Banana2 💚 | gemini-3.1-flash-image | 4/6/10/13 pts |
| text_to_image | Nano Banana Pro | gemini-3-pro-image | 10/10/18 pts |
| image_to_image | SeeDream 4.5 🌟 | doubao-seedream-4.5 | 5 pts |
| image_to_image | Midjourney 🎨 | midjourney | 8/10 pts (480p/720p) |
| image_to_image | Nano Banana2 💚 | gemini-3.1-flash-image | 4/6/10/13 pts |
| image_to_image | Nano Banana Pro | gemini-3-pro-image | 10 pts |
Midjourney attribute_ids: 5451/5452 (text_to_image), 5453/5454 (image_to_image)
Nano Banana2 size options: 512px (4pts), 1K (6pts), 2K (10pts), 4K (13pts)
Nano Banana Pro size options: 1K (10pts), 2K (10pts), 4K (18pts for t2i / 10pts for i2i)
⚠️ Critical: Models have varying parameter support. Custom aspect ratios are now supported by multiple models.
| Model | Custom Aspect Ratio | Max Resolution | Size Options | Notes |
|---|---|---|---|---|
| SeeDream 4.5 | ✅ (via virtual params) | 4K (adaptive) | 8 aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9 (5 pts) |
| Nano Banana2 | ✅ Native support 🆕 | 4K (4096×4096) | 512px/1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via attribute_id |
| Nano Banana Pro | ✅ Native support 🆕 | 4K (4096×4096) | 1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via attribute_id |
| Midjourney 🎨 | ❌ (1:1 only) | 1024px (square) | 480p/720p via attribute_id | Fixed 1024x1024, artistic style focus |
Key Capabilities: size and quality options are selected via attribute_ids (see the tables above).

| Name | model_id | Typical positioning | task_type support |
|---|---|---|---|
| IMA Video Pro (Sevio 1.0) | ima-pro | Higher quality / consistency | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
| IMA Video Pro Fast (Sevio 1.0-Fast) | ima-pro-fast | Lower latency / faster iteration | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
Notes:
ima-all-ai does not hardcode Sevio availability; it resolves models from the runtime product/list. Treat the --list-models result as the final source.

| Category | Name | model_id | Cost Range |
|---|---|---|---|
| text_to_video (14) | Wan 2.6 🔥 | wan2.6-t2v | 25-120 pts |
| | Hailuo 2.3 | MiniMax-Hailuo-2.3 | 32+ pts |
| | Hailuo 2.0 | MiniMax-Hailuo-02 | 5+ pts |
| | Vidu Q2 | viduq2 | 5-70 pts |
| | SeeDance 1.5 Pro | doubao-seedance-1.5-pro | 20+ pts |
| | Sora 2 Pro | sora-2-pro | 122+ pts |
| | Kling O1 | kling-video-o1 | 48-120 pts |
| | Kling 2.6 | kling-v2-6 | 80+ pts |
| | Google Veo 3.1 | veo-3.1-generate-preview | 70-330 pts |
| | Pixverse V5.5 / V5 / V4.5 / V4 / V3.5 | pixverse | 12-48 pts |
| image_to_video (14) | Wan 2.6 🔥 | wan2.6-i2v | 25-120 pts |
| | Hailuo 2.3 / 2.0 | MiniMax-Hailuo-2.3/02 | 25-32 pts |
| | Vidu Q2 Pro | viduq2-pro | 20-70 pts |
| | SeeDance 1.5 Pro | doubao-seedance-1.5-pro | 47+ pts |
| | Sora 2 Pro | sora-2-pro | 122+ pts |
| | Kling O1 / 2.6 | kling-video-o1/v2-6 | 48-120 pts |
| | Google Veo 3.1 | veo-3.1-generate-preview | 70-330 pts |
| | Pixverse V5.5-V3.5 | pixverse | 12-48 pts |
| first_last_frame (11) | Kling O1 🌟 | kling-video-o1 | 48-120 pts |
| | Kling 2.6 | kling-v2-6 | 80+ pts |
| | Others (9) | Hailuo 2.0, Vidu Q2 Pro, SeeDance 1.5 Pro, Veo 3.1, Pixverse V5.5-V3.5 | — |
| reference_image (6) | Kling O1 🌟 | kling-video-o1 | 48-120 pts |
| | Google Veo 3.1 | veo-3.1-generate-preview | 70-330 pts |
| | Others (4) | Vidu Q2, Pixverse V5.5/V5/V4.5 | — |
| text_to_video | SeeDance 1.5 Pro / 1.0 Pro | doubao-seedance-1.5-pro / doubao-seedance-1.0-pro | 16 / 15 pts |
| text_to_video | Sora 2 Pro / Sora 2 | sora-2-pro / sora-2 | 120 / 35 pts |
| text_to_video | Kling O1 / 2.6 / 2.5 Turbo / 1.6 | kling-video-o1 / kling-v2-6 / kling-v2-5-turbo / kling-v1-6 | 48 / 80 / 24 / 32 pts |
| text_to_video | Google Veo 3.1 Fast / 3.1 / 3.0 | veo-3.1-fast-generate-preview / veo-3.1-generate-preview / veo-3.0-generate-preview | 55 / 140 / 280 pts |
| text_to_video | Pixverse V3.5–V5.5 | pixverse | 12 pts |
| image_to_video | Wan 2.6 / 2.6 Flash / 2.5 / 2.2 Plus | wan2.6-i2v / wan2.6-i2v-flash / wan2.5-i2v-preview / wan2.2-i2v-plus | 25 / 12 / 12 / 10 pts |
| image_to_video | Kling 2.1 Master | kling-v2-1-master | 150 pts |
| first_last_frame_to_video | Kling O1 | kling-video-o1 | 70 pts |
| reference_image_to_video | Kling O1 / Vidu Q2 / Q1 | kling-video-o1 / viduq2 / viduq1 | 48 / 10 / 25 pts |
| Category | Name | model_id | Cost | Notes |
|---|---|---|---|---|
| text_to_music | Suno | sonic | 25 pts | sonic-v5; custom_mode, lyrics, vocal_gender |
| text_to_music | DouBao BGM | GenBGM | 30 pts | Background music |
| text_to_music | DouBao Song | GenSong | 30 pts | Song generation |
Models and credits are not fixed. Always call GET /open/v1/product/list?category=text_to_speech (or run the script with --task-type text_to_speech --list-models) to get current model_id, attribute_id, and credit.
ima-all-ai has complete TTS capability: This document and the bundled ima_create.py provide full TTS support (routing, parameters, create/poll, UX protocol Steps 0–5, error translation). The ima-tts-ai skill is an optional standalone package with the same specification.
Poll POST /open/v1/tasks/detail until completion. For TTS, medias[] uses the same structure as other IMA audio tasks:
| Field | Type | Meaning |
|---|---|---|
| resource_status | int or null | 0=processing, 1=available, 2=failed, 3=deleted; treat null as 0 |
| status | string | "pending" / "processing" / "success" / "failed" |
| url | string | Audio URL when resource_status=1 (mp3/wav) |
| duration_str | string | Optional, e.g. "12s" |
| format | string | Optional, e.g. "mp3", "wav" |
Success example: When all medias have resource_status == 1 and status != "failed", read medias[0].url (or watermark_url). Example: {"medias":[{"resource_status":1,"status":"success","url":"https://cdn.../output.mp3","duration_str":"12s","format":"mp3"}]}.
task_type: "text_to_speech". No image input: src_img_url: [], input_images: []. prompt (text to speak) must be inside parameters[].parameters, not at top level. Extra fields (e.g. voice_id, speed) come from product form_config; pass via --extra-params and only include params present in the product’s credit_rules/form_config.
| Mistake | Fix |
|---|---|
| prompt at top level | Put prompt inside parameters[].parameters (script does this) |
| Wrong or missing attribute_id | Always call product list first; use credit_rules |
| Single poll | Poll until all medias have resource_status == 1 |
| Ignoring status when resource_status=1 | Check status != "failed" |
| Sending params not in form_config/credit_rules | Use only params from product list; script reflection strips others on retry |
Always call GET /open/v1/product/list?category=<type> first to get the live attribute_id and form_config defaults required for task creation.
There are two equivalent route systems serving the same backend logic:
| Route | Auth | Use Case |
|---|---|---|
| /open/v1/ | Authorization: Bearer ima_* only | Third-party / agent access |
| /api/v3/ | Token + API Key (dual auth) | Frontend App |
This skill documents the /open/v1/ Open API. All business logic (credit validation, N-flattening, risk control) runs identically on both paths.
Base URL: https://api.imastudio.com
Required/recommended headers for all /open/v1/ endpoints:
| Header | Required | Value | Notes |
|---|---|---|---|
| Authorization | ✅ | Bearer ima_your_api_key_here | API key authentication |
| x-app-source | ✅ | ima_skills | Fixed value — identifies skill-originated requests |
| x_app_language | recommended | en / zh | Product label language; defaults to en if omitted |
Authorization: Bearer ima_your_api_key_here
x-app-source: ima_skills
x_app_language: en
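A minimal sketch of assembling these headers (the helper name is illustrative; the header names and values come from the table above):

```python
BASE_URL = "https://api.imastudio.com"

def open_api_headers(api_key: str, language: str = "en") -> dict:
    """Headers for all /open/v1/ endpoints, per the table above."""
    return {
        "Authorization": f"Bearer {api_key}",
        "x-app-source": "ima_skills",   # fixed value: identifies skill-originated requests
        "x_app_language": language,     # en / zh; server defaults to en if omitted
    }

# Usage sketch (with requests):
#   requests.get(f"{BASE_URL}/open/v1/product/list",
#                params={"app": "ima", "platform": "web", "category": "text_to_image"},
#                headers=open_api_headers("ima_your_api_key_here"))
```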
The IMA Open API does NOT accept raw bytes or base64 images. All image inputs must be public HTTPS URLs.
| Task Type | Input Required? | Upload Before Create? | Notes |
|---|---|---|---|
| text_to_image | ❌ No | — | Prompt only |
| image_to_image | ✅ Yes (1 image) | ✅ Upload first | Single input image |
| text_to_video | ❌ No | — | Prompt only |
| image_to_video | ✅ Yes (1 image) | ✅ Upload first | Single input image |
| first_last_frame_to_video | ✅ Yes (2 images) | ✅ Upload first | First + last frame |
| reference_image_to_video | ✅ Yes (1+ images) | ✅ Upload first | Reference image(s) |
| text_to_music | ❌ No | — | Prompt only |
| text_to_speech | ❌ No | — | Prompt only (text to speak) |
Upload flow:
1. Call prepare_image_url() (see section below) to upload the file.
2. Use the returned CDN URL (fdl) as the value for input_images / src_img_url.
Example workflow (image_to_image):
# User provides local file
image_url = prepare_image_url("/path/to/photo.jpg", api_key)
# → Returns: https://ima-ga.esxscloud.com/webAgent/privite/2026/02/27/..._uuid.jpeg
# Then create task with this URL
create_task(
task_type="image_to_image",
input_images=[image_url], # Use uploaded URL
prompt="turn into oil painting"
)
CRITICAL: You MUST call
/open/v1/product/listBEFORE creating any task.
The attribute_id field is REQUIRED in the create request. If it is 0 or missing, you get:
"Invalid product attribute" → "Insufficient points" → the task fails completely.
NEVER construct a create request from the model table alone. Always fetch the product first.
# Query product list with the correct category
GET /open/v1/product/list?app=ima&platform=web&category=<task_type>
# task_type: text_to_image | image_to_image | text_to_video | image_to_video |
# first_last_frame_to_video | reference_image_to_video | text_to_music | text_to_speech
# Walk the V2 tree to find your target model (type=3 leaf nodes only).
# Note: use a function/return rather than a bare `break`, which would only
# exit the inner loop and keep scanning other groups.
def find_model(response, target_model_id):
    for group in response["data"]:
        for version in group.get("children", []):
            if version["type"] == "3" and version["model_id"] == target_model_id:
                attribute_id = version["credit_rules"][0]["attribute_id"]
                credit = version["credit_rules"][0]["points"]
                model_version = version["id"]  # = version_id / model_version
                model_name = version["name"]
                form_defaults = {f["field"]: f["value"] for f in version["form_config"]}
                return attribute_id, credit, model_version, model_name, form_defaults
    return None  # model not found in this category
Pre-queried values for convenience. Always call the product list at runtime for accuracy.
| Model | Task Type | model_id | attribute_id | credit | Notes |
|---|---|---|---|---|---|
| text_to_image | |||||
| SeeDream 4.5 | text_to_image | doubao-seedream-4.5 | 2341 | 5 pts | Default, balanced |
| Nano Banana Pro (1K) | text_to_image | gemini-3-pro-image | 2399 | 10 pts | 1024×1024 |
| Nano Banana Pro (2K) | text_to_image | gemini-3-pro-image | 2400 | 10 pts | 2048×2048 |
| Nano Banana Pro (4K) | text_to_image | gemini-3-pro-image | 2401 | 18 pts | 4096×4096 |
| text_to_video | |||||
| Wan 2.6 (720P, 5s) | text_to_video | wan2.6-t2v | 2057 | 25 pts | Default, balanced |
| Wan 2.6 (1080P, 5s) | text_to_video | wan2.6-t2v | 2058 | 40 pts | — |
| Wan 2.6 (720P, 10s) | text_to_video | wan2.6-t2v | 2059 | 50 pts | — |
| Wan 2.6 (1080P, 10s) | text_to_video | wan2.6-t2v | 2060 | 80 pts | — |
| Wan 2.6 (720P, 15s) | text_to_video | wan2.6-t2v | 2061 | 75 pts | — |
| Wan 2.6 (1080P, 15s) | text_to_video | wan2.6-t2v | 2062 | 120 pts | — |
| Kling O1 (5s, std) | text_to_video | kling-video-o1 | 2313 | 48 pts | Latest Kling |
| Kling O1 (5s, pro) | text_to_video | kling-video-o1 | 2314 | 60 pts | — |
| Kling O1 (10s, std) | text_to_video | kling-video-o1 | 2315 | 96 pts | — |
| Kling O1 (10s, pro) | text_to_video | kling-video-o1 | 2316 | 120 pts | — |
| text_to_music | |||||
| Suno (sonic-v4) | text_to_music | sonic | 2370 | 25 pts | Default |
| DouBao BGM | text_to_music | GenBGM | 4399 | 30 pts | — |
| DouBao Song | text_to_music | GenSong | 4398 | 30 pts | — |
| All others | any | — | → query /open/v1/product/list | — | Always runtime query |
⚠️ Production warning: attribute_id and credit values change frequently in production. Always call /open/v1/product/list at runtime; above table is pre-queried reference only (2026-02-27).
| Mistake | Error |
|---|---|
| attribute_id is 0 or missing | "Invalid product attribute" + "Insufficient points" |
| attribute_id outdated (production changed) | Same errors; always query product list first |
| attribute_id doesn't match parameter combination | Error 6010: "Attribute ID does not match the calculated rule" |
| prompt at outer parameters[] level | Prompt ignored; wrong routing |
| cast missing from inner parameters.parameters | Billing validation failure |
| credit value wrong or missing | Error 6006 |
| model_name / model_version missing | Wrong backend routing |
| Skipped product list, used table values directly | All of the above |
⚠️ Critical for Google Veo 3.1 and multi-rule models:
Models like Google Veo 3.1 have multiple credit_rules, each with a different attribute_id for different parameter combinations:
- 720p + 4s + optimized → attribute_id A
- 720p + 8s + optimized → attribute_id B
- 4K + 4s + high → attribute_id C

The script automatically selects the correct attribute_id by matching your parameters (duration, resolution, compression_quality, generate_audio) against each rule's attributes. If the match fails, you get error 6010.
Fix: The bundled script now checks these video-specific parameters for smart credit_rule selection. Always use the script, not manual API construction.
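A sketch of the rule-matching logic the bundled script is described as performing. The credit_rules field names are from the product-list response; the exact matching semantics (string comparison over the attributes map) are an assumption:

```python
def pick_attribute_id(credit_rules: list, params: dict):
    """Return the attribute_id whose rule attributes all match the given params."""
    for rule in credit_rules:
        attrs = rule.get("attributes", {})
        # Assumed semantics: every key in the rule must match the user's params.
        if all(str(params.get(k)) == str(v) for k, v in attrs.items()):
            return rule["attribute_id"]
    return None  # no match → the server would reject with error 6010

rules = [
    {"attribute_id": 101, "attributes": {"resolution": "720p", "duration": "4"}},
    {"attribute_id": 102, "attributes": {"resolution": "720p", "duration": "8"}},
]
```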
1. GET /open/v1/product/list?app=ima&platform=web&category=<type>
→ REQUIRED: Get attribute_id, credit, model_version, model_name, form_config defaults
[If input image required]
2. Upload image → get public HTTPS URL
→ See "Image Upload" section below
3. POST /open/v1/tasks/create
→ Must include: attribute_id, model_name, model_version, credit, cast, prompt (nested!)
4. POST /open/v1/tasks/detail {"task_id": "..."}
→ Poll until medias[].resource_status == 1
→ Extract url from completed media
The IMA Open API does NOT accept raw bytes or base64 images. All image inputs must be public HTTPS URLs.
When a user provides an image (local file, bytes, base64), you must upload it first and get a URL. This is exactly what the IMA frontend does before every image task.
The frontend uses a two-step presigned URL flow via the IM platform:
Step 1: GET /api/rest/oss/getuploadtoken → returns { ful, fdl }
ful = presigned PUT URL (upload destination, expires ~7 days)
fdl = final CDN download URL (use this as input_images value)
Step 2: PUT {ful} with raw image bytes + Content-Type header
→ image is stored in Aliyun OSS: zhubite-imagent-bot.oss-us-east-1.aliyuncs.com
→ accessible via CDN: https://ima-ga.esxscloud.com/...
GET https://imapi.liveme.com/api/rest/oss/getuploadtoken
Required query parameters (11 total — sourced directly from frontend generateUploadInfo):
| Parameter | Example | Description |
|---|---|---|
| appUid | ima_xxx... | Use IMA API key directly — no separate login needed |
| appId | webAgent | App identifier (fixed) |
| appKey | 32jdskjdk320eew | App secret (fixed, used for sign generation) |
| cmimToken | ima_xxx... | Use IMA API key directly — same as appUid |
| sign | 117CF6CF... | IM auth signature: SHA1("webAgent\|32jdskjdk320eew\|{timestamp}\|{nonce}").upper() |
| timestamp | 1772042430 | Unix timestamp (seconds), generated per request |
| nonce | CxI1FLI5ajLJZ1jlxZmeg | Random nonce string, generated per request |
| fService | privite | Fixed: storage service type |
| fType | picture | picture for images, video, audio |
| fSuffix | jpeg | File extension: jpeg, png, mp4, mp3 |
| fContentType | image/jpeg | MIME type of the file |
Simplified authentication: fill the appUid and cmimToken parameters with the IMA API key directly — no separate credential retrieval is needed.
Response:
{
"ful": "https://zhubite-imagent-bot.oss-us-east-1.aliyuncs.com/webAgent/privite/2026/02/26/..._uuid.jpeg?Expires=...&OSSAccessKeyId=...&Signature=...",
"fdl": "https://ima-ga.esxscloud.com/webAgent/privite/2026/02/26/..._uuid.jpeg",
"ful_expire": "...",
"fdl_expire": "...",
"fdl_key": "..."
}
PUT {ful}
Content-Type: image/jpeg
Body: [raw image bytes]
No auth headers needed — the presigned URL already encodes the credentials.
Step 3: Use fdl as the Image URL
After the PUT succeeds, use fdl (the CDN URL) as the value for input_images / src_img_url.
import hashlib, time, uuid, requests, mimetypes
# ── 🌐 IMA Upload Service Endpoint (IMA-owned, for image/video uploads) ──────
IMA_IM_BASE = "https://imapi.liveme.com"
# ── 🔑 Hardcoded APP_KEY (Public, Shared Across All Users) ──────────────────
# This APP_KEY is a PUBLIC identifier used by IMA Studio's image/video upload
# service. It is NOT a secret—it's intentionally shared across all users and
# embedded in the IMA web frontend. This key is used to generate HMAC signatures
# for upload token requests, but your IMA API key (ima_xxx...) is the ACTUAL
# authentication credential. Think of APP_KEY as a "client ID" rather than a
# "client secret."
#
# ⚠️ Security Note: Your ima_xxx... API key is the sensitive credential. It is
# sent to imapi.liveme.com as query parameters (appUid, cmimToken). Always use
# test keys for experiments and rotate your API key regularly.
#
# 📖 See SECURITY.md for complete disclosure and network verification guide.
APP_ID = "webAgent"
APP_KEY = "32jdskjdk320eew" # Public shared key (used for HMAC sign generation)
APP_UID = "<your_app_uid>" # Your IMA API key (ima_xxx...) works directly; or POST /api/v3/login/app → data.user_id
APP_TOKEN = "<your_app_token>" # Your IMA API key works directly; or POST /api/v3/login/app → data.token
def _gen_sign() -> tuple[str, str, str]:
"""Generate per-request (sign, timestamp, nonce)."""
nonce = uuid.uuid4().hex[:21]
ts = str(int(time.time()))
raw = f"{APP_ID}|{APP_KEY}|{ts}|{nonce}"
sign = hashlib.sha1(raw.encode()).hexdigest().upper()
return sign, ts, nonce
def get_upload_token(app_uid: str, app_token: str,
suffix: str, content_type: str) -> dict:
"""Step 1: Get presigned upload URL from IMA's upload service.
Calls GET imapi.liveme.com/api/rest/oss/getuploadtoken with exactly 11 params.
Returns: { "ful": "<presigned PUT URL>", "fdl": "<CDN download URL>" }
Args:
app_uid: Your IMA API key (ima_xxx...), used as appUid parameter
app_token: Your IMA API key (ima_xxx...), used as cmimToken parameter
suffix: File extension (jpeg, png, mp4, mp3)
content_type: MIME type (image/jpeg, video/mp4, etc.)
Security Note:
Your IMA API key (ima_xxx...) is sent to imapi.liveme.com as query
parameters (appUid, cmimToken). This is IMA Studio's image/video upload
service, separate from the main api.imastudio.com API. Both domains are
owned by IMA Studio—this is part of IMA's microservices architecture.
Why two domains?
- api.imastudio.com: Core AI generation API (product list, task creation)
- imapi.liveme.com: Specialized upload service (presigned URL generation)
Your API key grants access to both services. For security verification,
see SECURITY.md section "Network Traffic Verification."
"""
sign, ts, nonce = _gen_sign()
r = requests.get(
f"{IMA_IM_BASE}/api/rest/oss/getuploadtoken",
params={
# App Key params
"appUid": app_uid, # APP_UID
"appId": APP_ID,
"appKey": APP_KEY,
"cmimToken": app_token, # APP_TOKEN
"sign": sign,
"timestamp": ts,
"nonce": nonce,
# File params
"fService": "privite", # fixed
"fType": "picture", # picture / video / audio
"fSuffix": suffix, # jpeg / png / mp4 / mp3
"fContentType": content_type,
},
)
r.raise_for_status()
return r.json()["data"]
def upload_image_to_oss(image_bytes: bytes, content_type: str, ful: str) -> None:
"""Step 2: PUT image bytes to the presigned OSS URL. No auth needed."""
resp = requests.put(ful, data=image_bytes, headers={"Content-Type": content_type})
resp.raise_for_status()
def prepare_image_url(source, api_key: str) -> str:
"""
Full workflow: upload any image and return the CDN URL (fdl).
Args:
source: file path (str), raw bytes, or already-public HTTPS URL
api_key: IMA API key for upload authentication
Returns: public HTTPS CDN URL ready to use as input_images value
"""
# Already a public URL → use directly, no upload needed
if isinstance(source, str) and source.startswith("https://"):
return source
# Read file bytes
if isinstance(source, str):
ext = source.rsplit(".", 1)[-1].lower() if "." in source else "jpeg"
with open(source, "rb") as f:
image_bytes = f.read()
content_type = mimetypes.guess_type(source)[0] or "image/jpeg"
else:
image_bytes = source
ext = "jpeg"
content_type = "image/jpeg"
# Step 1: Get presigned URL — the API key serves as both appUid and cmimToken
token_data = get_upload_token(api_key, api_key, ext, content_type)
ful = token_data["ful"]
fdl = token_data["fdl"]
# Step 2: Upload to OSS
upload_image_to_oss(image_bytes, content_type, ful)
# Step 3: Return CDN URL
return fdl # use this as input_images / src_img_url value
OSS path format:
webAgent/privite/{YYYY}/{MM}/{DD}/{timestamp}_{uid}_{uuid}.{ext}
CDN base: https://ima-ga.esxscloud.com/
OSS bucket: zhubite-imagent-bot.oss-us-east-1.aliyuncs.com
| category | Capability | Input |
|---|---|---|
| text_to_image | Text → Image | prompt |
| image_to_image | Image → Image | prompt + input_images |
| text_to_video | Text → Video | prompt |
| image_to_video | Image → Video | prompt + input_images |
| first_last_frame_to_video | First+Last Frame → Video | prompt + src_img_url[2] |
| reference_image_to_video | Reference Image → Video | prompt + src_img_url[1+] |
| text_to_music | Text → Music | prompt |
| text_to_speech | Text → Speech | prompt (text to speak) |
Each media in medias[] has two fields:
| Field | Type | Values | Description |
|---|---|---|---|
| resource_status | int (or null) | 0, 1, 2, 3 | 0=processing, 1=available, 2=failed, 3=deleted. The API may return null — treat it as 0. |
| status | string | "pending", "processing", "success", "failed" | Human-readable task status. When polling, resource_status takes precedence; status == "failed" means failure. |
Poll on resource_status first, then ensure status is not "failed":
| resource_status | status | Meaning | Action |
|---|---|---|---|
| 0 or null | pending / processing | Processing | Keep polling; do not stop (null = 0) |
| 1 | success (or completed) | Done | Read url; stop only when all medias are 1 |
| 1 | failed | Failed (status takes precedence) | Stop, handle error |
| 2 | any | Failed | Stop, handle error |
| 3 | any | Deleted | Stop |
Important: (1) Treat
resource_status: null as 0. (2) Stop only when all medias have resource_status == 1. (3) When resource_status = 1, still check status != "failed".
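The decision table above, as a runnable sketch (the helper name is illustrative; deleted media are treated as failure here for simplicity):

```python
def task_state(medias: list) -> str:
    """Classify a medias[] array as 'pending', 'success', or 'failed'."""
    if not medias:
        return "pending"
    for m in medias:
        rs = m.get("resource_status") or 0   # null/missing counts as 0 (processing)
        if rs in (2, 3) or m.get("status") == "failed":
            return "failed"                  # status "failed" wins even when rs == 1
    if all((m.get("resource_status") or 0) == 1 for m in medias):
        return "success"
    return "pending"                         # some medias still processing
```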
GET /open/v1/product/list?app=ima&platform=web&category=text_to_image
Internally calls downstream /v1/products/listv2. Returns a V2 tree structure: type=2 nodes are model groups, type=3 nodes are versions (leaves). Only type=3 nodes contain credit_rules and form_config.
webAgent is auto-converted to ima by the gateway — you can use either value for app.
[
{
"id": "SeeDream",
"type": "2",
"name": "SeeDream",
"model_id": "",
"children": [
{
"id": "doubao-seedream-4-0-250828",
"type": "3",
"name": "SeeDream 4.0",
"model_id": "doubao-seedream-4.0",
"credit_rules": [
{ "attribute_id": 332, "points": 5, "attributes": { "default": "enabled" } }
],
"form_config": [
{ "field": "size", "type": "tags", "value": "1K",
"options": [{"label":"1K","value":"1K"}, {"label":"2K","value":"2K"}] }
]
}
]
}
]
How to pick a version for task creation:
1. Walk the tree to the type=3 leaves (versions).
2. Take model_id and id (= model_version) from the leaf.
3. Pick the credit_rules[].attribute_id matching your desired quality/size (the attributes field shows the config).
4. Use form_config[].value entries as default parameter values.

credit_rules[].attribute_id → required for task creation as attribute_id. credit_rules[].points → required for task creation as credit and cast.points.
POST /open/v1/tasks/create
{
"task_type": "text_to_image",
"enable_multi_model": false,
"src_img_url": [],
"upload_img_src": "",
"parameters": [
{
"attribute_id": 8538,
"model_id": "doubao-seedream-4.5",
"model_name": "SeeDream 4.5",
"model_version": "doubao-seedream-4-5-251128",
"app": "ima",
"platform": "web",
"category": "text_to_image",
"credit": 5,
"parameters": {
"prompt": "a beautiful mountain sunset, photorealistic",
"size": "4k",
"n": 1,
"input_images": [],
"cast": {"points": 5, "attribute_id": 8538}
}
}
]
}
| Field | Required | Description |
|---|---|---|
| task_type | ✅ | Must match parameters[].category |
| parameters[].attribute_id | ✅ | From credit_rules[].attribute_id in product list |
| parameters[].model_id | ✅ | From type=3 leaf node model_id |
| parameters[].model_version | ✅ | From type=3 leaf node id |
| parameters[].app | ✅ | Use ima (or webAgent, auto-converted) |
| parameters[].platform | ✅ | Use web |
| parameters[].category | ✅ | Must match top-level task_type |
| parameters[].credit | ✅ | Must equal credit_rules[].points. Error 6006 if wrong. |
| parameters[].parameters.prompt | ✅ | The actual prompt text used by downstream service |
| parameters[].parameters.cast | ✅ | {"points": N, "attribute_id": N} — mirrors credit |
| parameters[].parameters.n | ✅ | Number of outputs (usually 1). Gateway flattens N>1 into separate resources. |
| parameters[].parameters.input_images | image tasks | Array of input image URLs |
| top-level src_img_url | multi-image | Array for first_last_frame / reference tasks |
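A sketch of assembling a create body per the field requirements above. The builder name and the `model` dict shape are illustrative; the field layout mirrors the documented example:

```python
def build_create_payload(task_type: str, model: dict, prompt: str,
                         n: int = 1, input_images=None, extra: dict = None) -> dict:
    """Assemble a /open/v1/tasks/create body. `model` holds the values resolved
    from the product list (attribute_id, credit, model_id, model_name, model_version)."""
    inner = {
        "prompt": prompt,
        "n": n,
        "input_images": input_images or [],
        "cast": {"points": model["credit"], "attribute_id": model["attribute_id"]},
        **(extra or {}),                 # only params present in form_config/credit_rules
    }
    return {
        "task_type": task_type,
        "src_img_url": [],
        "parameters": [{
            "attribute_id": model["attribute_id"],
            "model_id": model["model_id"],
            "model_name": model["model_name"],
            "model_version": model["model_version"],
            "app": "ima",
            "platform": "web",
            "category": task_type,       # must match top-level task_type
            "credit": model["credit"],   # must equal credit_rules[].points
            "parameters": inner,
        }],
    }
```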
When n > 1, the gateway automatically:
- generates n independent resourceBizId values
- charges credits n times (one per resource)
- creates n separate tasks in the downstream service

Response medias[] will contain n items. Poll until all have resource_status == 1.
{
"code": 0,
"data": {
"id": "task_abc123",
"biz_id": "biz_xxx",
"task_type": "text_to_image",
"medias": [],
"generate_count": 1,
"created_at": 1700000000000,
"timeout_at": 1700000300000
}
}
data.id = task ID for polling. timeout_at = Unix ms deadline.
POST /open/v1/tasks/detail
{"task_id": "<id from create response>"}
Poll every 2–5s (8s+ for video). Completed response:
{
"id": "task_abc",
"medias": [{
"resource_status": 1,
"status": "success",
"url": "https://cdn.../output.jpg",
"cover": "https://cdn.../cover.jpg",
"format": "jpg",
"width": 1024,
"height": 1024
}]
}
Polling stop condition (must implement exactly):
- Treat `resource_status: null` (or missing) as 0 (processing). Do not stop when you see `null`; the backend may serialize a Go `*int` as `null`.
- Success: stop only when ALL `medias[].resource_status == 1` and no media has `status == "failed"`. If you return on the first media with `resource_status == 1` while others are still 0, the task is not fully done and you will keep polling or get inconsistent state.
- Failure: stop when any media has `status == "failed"`, or `resource_status == 2`, or `resource_status == 3`.

No image input: `src_img_url: []`, `input_images: []`. See API 2 for the full example.
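The stop condition can be expressed as a pure function over the `medias` array from `/open/v1/tasks/detail`; this is one possible sketch (the function name is mine, and the network loop around it is up to you):

```python
def poll_state(medias):
    """Classify a detail response's medias[] as
    'success', 'failed', or 'processing', per the rules above."""
    if not medias:
        return "processing"              # medias may be empty early on
    for m in medias:
        if m.get("status") == "failed":
            return "failed"              # failed even if resource_status == 1
        rs = m.get("resource_status") or 0   # null/missing counts as 0
        if rs in (2, 3):
            return "failed"
        if rs != 1:
            return "processing"          # ALL medias must reach 1
    return "success"
```

Call it on every poll (every 2–5s, 8s+ for video) and keep looping only while it returns `"processing"`.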
Extra fields vs text_to_image — all from form_config defaults:
{
"task_type": "text_to_video",
"src_img_url": [],
"parameters": [{
"attribute_id": 4838,
"model_id": "wan2.6-t2v",
"model_name": "Wan 2.6",
"model_version": "wan2.6-t2v",
"category": "text_to_video",
"credit": 3,
"app": "ima", "platform": "web",
"parameters": {
"prompt": "a puppy dancing happily, sunny meadow",
"negative_prompt": "",
"prompt_extend": false,
"duration": 5,
"resolution": "1080P",
"aspect_ratio": "16:9",
"shot_type": "single",
"seed": -1,
"n": 1,
"input_images": [],
"cast": {"points": 3, "attribute_id": 4838}
}
}]
}
Video-specific fields from `form_config`: `duration` (seconds), `resolution`, `aspect_ratio`, `shot_type`, `negative_prompt`, `prompt_extend`. Poll every 8s (video generation is slower). In the response, `medias[].cover` is the first-frame thumbnail.
{
"task_type": "image_to_image",
"src_img_url": ["https://...input.jpg"],
"parameters": [{
"attribute_id": 8560,
"model_id": "doubao-seedream-4.5",
"model_version": "doubao-seedream-4-5-251128",
"category": "image_to_image",
"credit": 5,
"app": "ima", "platform": "web",
"parameters": {
"prompt": "turn into oil painting style",
"size": "4k",
"n": 1,
"input_images": ["https://...input.jpg"],
"cast": {"points": 5, "attribute_id": 8560}
}
}]
}
⚠️ `size` must be one of the `form_config` options (e.g. `"2k"`, `"4k"`, `"2048x2048"`). `"adaptive"` is NOT valid for SeeDream 4.5 i2i and causes error 400. The top-level `src_img_url` and `parameters.input_images` must both contain the input image URL. Some i2i models (e.g. `doubao-seededit-3.0-i2i`) may not be available in test environments; fall back to SeeDream 4.5.
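These two i2i rules (valid `size`, input URL present in both places) are easy to check before sending. A pre-flight sketch under my own naming; the default `allowed_sizes` list is illustrative and should be read from your actual `form_config`:

```python
def validate_i2i(payload, allowed_sizes=("2k", "4k", "2048x2048")):
    """Return a list of problems with an image_to_image payload
    (empty list means the pre-flight checks pass)."""
    params = payload["parameters"][0]["parameters"]
    errors = []
    if params.get("size") not in allowed_sizes:
        # "adaptive" lands here and would cause error 400 upstream
        errors.append(f"size {params.get('size')!r} not in form_config options")
    if not payload.get("src_img_url"):
        errors.append("src_img_url is empty for an i2i task")
    for url in payload.get("src_img_url", []):
        if url not in params.get("input_images", []):
            # the URL must appear in BOTH places
            errors.append(f"{url} missing from parameters.input_images")
    return errors
```

Run it right before the create call and refuse to send if the list is non-empty.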
{
"src_img_url": ["https://first-frame.jpg", "https://last-frame.jpg"]
}
Index 0 is the first frame (or the reference image); index 1 is the last frame (first_last_frame tasks only).
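The index convention above can be made explicit with a tiny helper (hypothetical name, a sketch only), so callers never pass the frames in the wrong order:

```python
def frame_urls(first_frame, last_frame=None):
    """Build src_img_url: index 0 = first frame / reference,
    index 1 = last frame (first_last_frame tasks only)."""
    urls = [first_frame]
    if last_frame is not None:
        urls.append(last_frame)
    return urls
```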
| Mistake | Fix |
|---|---|
| `attribute_id` not from `credit_rules` | Always fetch the product list first |
| `credit` value wrong | Must exactly match `credit_rules[].points` (error 6006) |
| Prompt at the wrong location | Put the prompt in `parameters[].parameters.prompt` (nested), not only at the top level |
| Polling `biz_id` instead of `id` | Use `id` (the task ID) for `/tasks/detail` |
| Single poll instead of a loop | Poll until `resource_status == 1` for ALL medias |
| Missing `app` / `platform` in parameters | Required fields; use `ima` / `web` |
| `category` mismatch | `parameters[].category` must match the top-level `task_type` |
| `resource_status == 2` not handled | Check for failure; don't loop forever |
| `status == "failed"` ignored | `resource_status == 1` plus `status == "failed"` means actual failure |
| `n > 1` but only checking the first media | All n media items must reach `resource_status == 1` |
See the Python example sections throughout this documentation for implementation guidance covering all 7 task types.