Install

openclaw skills install ima-all-ai

Most comprehensive AI content creation platform with unified access to all leading models across images (SeeDream 4.5, Midjourney, Nano Banana 2, Nano Banana...

CRITICAL: When calling the script, you MUST use the exact model_id (second column), NOT the friendly model name. Do NOT infer model_id from the friendly name (e.g., ❌ nano-banana-pro is WRONG; ✅ gemini-3-pro-image is CORRECT).
Quick Reference Table:
Image models:

| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|---|---|---|
| Nano Banana2 | gemini-3.1-flash-image | ❌ NOT nano-banana-2; budget pick, 4-13 pts |
| Nano Banana Pro | gemini-3-pro-image | ❌ NOT nano-banana-pro; high quality, 10-18 pts |
| SeeDream 4.5 | doubao-seedream-4.5 | ✅ Recommended default, 5 pts |
| Midjourney | midjourney | ✅ Same as friendly name, 8-10 pts |
Video models:

| 友好名称 (Friendly Name) | model_id (t2v) | model_id (i2v) | 说明 (Notes) |
|---|---|---|---|
| Wan 2.6 | wan2.6-t2v | wan2.6-i2v | ⚠️ Note -t2v/-i2v suffix |
| IMA Video Pro (Sevio 1.0) | ima-pro | ima-pro | ✅ IMA native quality model |
| IMA Video Pro Fast (Sevio 1.0-Fast) | ima-pro-fast | ima-pro-fast | ✅ IMA native low-latency model |
| Kling O1 | kling-video-o1 | kling-video-o1 | ⚠️ Note video- prefix |
| Kling 2.6 | kling-v2-6 | kling-v2-6 | ⚠️ Note v prefix |
| Hailuo 2.3 | MiniMax-Hailuo-2.3 | MiniMax-Hailuo-2.3 | ⚠️ Note MiniMax- prefix |
| Hailuo 2.0 | MiniMax-Hailuo-02 | MiniMax-Hailuo-02 | ⚠️ Note 02 not 2.0 |
| Google Veo 3.1 | veo-3.1-generate-preview | veo-3.1-generate-preview | ⚠️ Note -generate-preview suffix |
| Sora 2 Pro | sora-2-pro | sora-2-pro | ✅ Straightforward |
| Pixverse | pixverse | pixverse | ✅ Same as friendly name |
Music models:

| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|---|---|---|
| Suno (sonic v4) | sonic | ⚠️ Simplified to sonic |
| DouBao BGM | GenBGM | ❌ NOT doubao-bgm |
| DouBao Song | GenSong | ❌ NOT doubao-song |
Speech (TTS) models:

| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|---|---|---|
| seed-tts-2.0 | seed-tts-2.0 | ✅ Same as friendly name (default) |
How to get the correct model_id:

Run the script with --list-models --task-type <type> to query available models.

Runtime truth source: GET /open/v1/product/list (or --list-models).
Any table in this document is guidance; actual availability depends on the current product list.
Example:
# ❌ WRONG: Inferring from friendly name
--model-id nano-banana-pro
# ✅ CORRECT: Using exact model_id from table
--model-id gemini-3-pro-image
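The "never infer model_id" rule above can be enforced mechanically with an explicit lookup. The sketch below mirrors only the image rows of the Quick Reference Table and is illustrative; the product list API remains the runtime source of truth.

```python
# Explicit friendly-name → model_id lookup (image rows of the Quick Reference
# Table). Raising on unknown names prevents silently "inferring" a wrong id.
IMAGE_MODEL_IDS = {
    "nano banana2": "gemini-3.1-flash-image",
    "nano banana pro": "gemini-3-pro-image",
    "seedream 4.5": "doubao-seedream-4.5",
    "midjourney": "midjourney",
}

def lookup_image_model_id(friendly_name: str) -> str:
    try:
        return IMAGE_MODEL_IDS[friendly_name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown model {friendly_name!r}; run --list-models")
```
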
This skill is fully runnable as a standalone package.
If ima-knowledge-ai is installed, the agent may read its references for workflow decomposition and consistency guidance.
Recommended optional reads:
Check for workflow complexity — Read ima-knowledge-ai/references/workflow-design.md if:
Check for visual consistency needs — Read ima-knowledge-ai/references/visual-consistency.md if:
Check video modes — Read ima-knowledge-ai/references/video-modes.md if:
Check model selection — Read ima-knowledge-ai/references/model-selection.md if:
Why this matters:
Example multi-media workflow:
User: "帮我做个产品宣传MV,有背景音乐,主角是旺财小狗" (Make me a product promo MV with background music, starring the puppy Wangcai)
❌ Wrong:
1. Generate dog image (random look)
2. Generate video (different dog)
3. Generate music (unrelated)
✅ Right:
1. Read workflow-design.md + visual-consistency.md
2. Generate Master Reference: an image of the puppy Wangcai (旺财)
3. Generate video shots using image_to_video with the Wangcai image as the first frame
4. Get video duration (e.g., 15s)
5. Generate BGM with matching duration and mood
How to check:
# Step 0: Determine media type first (image / video / music / speech)
# From user request: "画"/"生成图"/"image" → image; "视频"/"video" → video; "音乐"/"歌"/"music"/"BGM" → music; "语音"/"朗读"/"TTS"/"speech" → speech
# Then choose task_type and model from the corresponding section (image: text_to_image/image_to_image; video: text_to_video/...; music: text_to_music; speech: text_to_speech)
# Step 1: Read knowledge base based on task type
if multi_media_workflow:
read("~/.openclaw/skills/ima-knowledge-ai/references/workflow-design.md")
if "same subject" or "series" or "character":
read("~/.openclaw/skills/ima-knowledge-ai/references/visual-consistency.md")
if video_generation:
read("~/.openclaw/skills/ima-knowledge-ai/references/video-modes.md")
# Step 2: Execute with proper sequencing and reference images
# (see workflow-design.md for specific patterns)
For simple single-media requests, you can proceed directly. For complex multi-media workflows, reading the knowledge base first is mandatory, with no exceptions.
Purpose: to ensure every agent parses user intent consistently. First determine the media type from the user's request, then choose the task_type and model.
| User intent / keywords | Media type | task_type examples |
|---|---|---|
| 画 / 生成图 / 图片 / image / 画一张 / 图生图 | image | text_to_image, image_to_image |
| 视频 / 生成视频 / video / 图生视频 / 文生视频 | video | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
| 音乐 / 歌 / BGM / 背景音乐 / music / 作曲 | music | text_to_music |
| 语音 / 朗读 / TTS / 语音合成 / 配音 / speech / read aloud / text-to-speech | speech | text_to_speech |
If the request mixes media (e.g. "宣传片+配乐"), treat as multi-media workflow: read workflow-design.md, then plan image → video → music steps and use the correct task_type for each step.
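The routing table above can be sketched as a simple keyword matcher. This is a minimal illustration, not the skill's actual parser; the keyword lists come from the table, and the fallback-to-image behavior is an assumption.

```python
# Hypothetical keyword-based media-type router mirroring the table above.
# More than one match means a multi-media workflow.
MEDIA_KEYWORDS = {
    "speech": ["语音", "朗读", "TTS", "语音合成", "配音", "speech", "read aloud", "text-to-speech"],
    "music": ["音乐", "歌", "BGM", "背景音乐", "music", "作曲"],
    "video": ["视频", "生成视频", "video", "图生视频", "文生视频"],
    "image": ["画", "生成图", "图片", "image", "图生图"],
}

def route_media_types(request: str) -> list[str]:
    """Return every media type whose keywords appear in the request."""
    req = request.lower()
    matched = [media for media, words in MEDIA_KEYWORDS.items()
               if any(w.lower() in req for w in words)]
    return matched or ["image"]  # assumption: default to image when nothing matches
```

A request like "宣传视频加背景音乐" matches both video and music, which triggers the multi-media workflow path described above.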
Image: For model name → model_id and size/aspect_ratio parsing, follow the same rules as in ima-image-ai skill (User Input Parsing section).
Video: For task_type (t2v / i2v / first_last / reference), model alias → model_id, and duration/resolution/aspect_ratio, follow ima-video-ai skill (User Input Parsing section).
Sevio alias normalization in ima-all-ai:
- Ima Sevio 1.0 → ima-pro
- Ima Sevio 1.0-Fast / Ima Sevio 1.0 Fast → ima-pro-fast
Routing rule: determine the task_type from user intent, then confirm the model_id with --list-models.

Music: to choose between Suno (sonic) and DouBao BGM/Song, infer from the request: "BGM"/"背景音乐" (background music) → BGM; "带歌词"/"人声" (with lyrics / vocals) → Suno or Song. Use model_id sonic, GenBGM, or GenSong per the "Recommended Defaults" and "Music Generation" tables below.
Speech (TTS): Get model_id from GET /open/v1/product/list?category=text_to_speech or run script with --task-type text_to_speech --list-models. Map user intent to parameters using product form_config:
| User intent / phrasing | Parameter (if in form_config) | Notes |
|---|---|---|
| 女声 / 女声朗读 / female voice | voice_id / voice_type | Use value from form_config options |
| 男声 / 男声朗读 / male voice | voice_id / voice_type | Use value from form_config options |
| 语速快/慢 / speed up/slow | speed | e.g. 0.8–1.2 |
| 音调 / pitch | pitch | If supported |
| 大声/小声 / volume | volume | If supported |
If the user does not specify, use form_config defaults. Pass extra params via --extra-params '{"speed":1.0}'. Only send parameters present in the product’s credit_rules/attributes or form_config (script reflection strips others on retry).
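The rule above (only send parameters the product declares) can be sketched as a small filter. This is an illustrative helper, not the script's actual reflection logic, and it assumes form_config is a dict keyed by parameter name.

```python
# Illustrative sketch: keep only extra params that the product's form_config
# declares, as the text above requires. Unknown keys are silently dropped.
def filter_extra_params(extra: dict, form_config: dict) -> dict:
    allowed = set(form_config.keys())
    return {k: v for k, v in extra.items() if k in allowed}
```
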
For transparency: This skill uses a bundled Python script (scripts/ima_create.py) to call the IMA Open API. The script:
- uses --user-id only locally, as a key for storing your model preferences

What gets sent to IMA servers:
- prompts, model IDs, generation parameters, and your API key (see the endpoint table below)

What's stored locally:
- ~/.openclaw/memory/ima_prefs.json — your model preferences (< 1 KB)
- ~/.openclaw/logs/ima_skills/ — generation logs (auto-deleted after 7 days)

| Domain | Owner | Purpose | Data Sent | Privacy |
|---|---|---|---|---|
api.imastudio.com | IMA Studio | Main API (product list, task creation, task polling) | Prompts, model IDs, generation params, your API key | Standard HTTPS, data processed for AI generation |
imapi.liveme.com | IMA Studio | Image/Video upload service (presigned URL generation) | Your API key, file metadata (MIME type, extension) | Standard HTTPS, used for image/video tasks only |
*.aliyuncs.com, *.esxscloud.com | Alibaba Cloud (OSS) | Image/video storage (file upload, CDN delivery) | Raw image/video bytes (via presigned URL, NO API key) | IMA-managed OSS buckets, presigned URLs expire after 7 days |
Key Points:
- Music tasks (text_to_music) and TTS tasks (text_to_speech) only use api.imastudio.com.
- Image/video tasks also call imapi.liveme.com to obtain presigned URLs for uploading input images.
- Your API key is only sent to api.imastudio.com and imapi.liveme.com (both owned by IMA Studio).
- You can verify network traffic with: tcpdump -i any -n 'host api.imastudio.com or host imapi.liveme.com'
- See the sections 🌐 Network Endpoints Used and ⚠️ Credential Security Notice in this document for full disclosure.

Your API key is sent to both IMA-owned domains:
- Authorization: Bearer ima_xxx... → api.imastudio.com (main API)
- appUid=ima_xxx... → imapi.liveme.com (upload service)

Security best practices:
- Monitor https://imastudio.com/dashboard for unauthorized activity.
- Review ~/.openclaw/logs/ima_skills/ for unexpected API calls.

Why two domains? IMA Studio uses a microservices architecture:
- api.imastudio.com: core AI generation API
- imapi.liveme.com: specialized image/video upload service (shared infrastructure)

Both domains are operated by IMA Studio. The same API key grants access to both services.
Note for users: You can review the script source at scripts/ima_create.py anytime.

The agent uses this script to simplify API calls. Music tasks use only api.imastudio.com, while image/video tasks also call imapi.liveme.com for file uploads (see "Network Endpoints" above).
Use the bundled script internally for all task types — it ensures correct parameter construction:
# ─── Image Generation ──────────────────────────────────────────────────────────
# Basic text-to-image (default model)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id doubao-seedream-4.5 --prompt "a cute puppy on grass, photorealistic" \
--user-id {user_id} --output-json
# Text-to-image with size override (Nano Banana2)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id gemini-3.1-flash-image --prompt "city skyline at sunset, 4K" \
--size 2k --user-id {user_id} --output-json
# Image-to-image with input URL
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_image \
--model-id doubao-seedream-4.5 --prompt "turn into oil painting style" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# ─── Video Generation ──────────────────────────────────────────────────────────
# Basic text-to-video (default model, 5s 720P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "a puppy dancing happily, cinematic" \
--user-id {user_id} --output-json
# Text-to-video with extra params (10s 1080P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "dramatic ocean waves, sunset" \
--extra-params '{"duration":10,"resolution":"1080P","aspect_ratio":"16:9"}' \
--user-id {user_id} --output-json
# Image-to-video (animate static image)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_video \
--model-id wan2.6-i2v --prompt "camera slowly zooms in, gentle movement" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# First-last frame video (two images)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type first_last_frame_to_video \
--model-id kling-video-o1 --prompt "smooth transition between frames" \
--input-images https://example.com/frame1.jpg https://example.com/frame2.jpg \
--user-id {user_id} --output-json
# ─── Music Generation ──────────────────────────────────────────────────────────
# Basic text-to-music (Suno default)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "upbeat electronic music, 120 BPM, no vocals" \
--user-id {user_id} --output-json
# Music with custom lyrics (Suno custom mode)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "pop ballad, emotional" \
--extra-params '{"custom_mode":true,"lyrics":"Your custom lyrics here...","vocal_gender":"female"}' \
--user-id {user_id} --output-json
# Background music (DouBao BGM)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id GenBGM --prompt "relaxing ambient music for meditation" \
--user-id {user_id} --output-json
# ─── Text-to-Speech (TTS) ─────────────────────────────────────────────────────
# List TTS models first to get model_id, then generate speech
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech --list-models
# TTS: use model_id from list above (prompt = text to speak)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech \
--model-id <model_id from list> --prompt "Text to be spoken here." \
--user-id {user_id} --output-json
The script outputs JSON with url, model_name, credit — use these values in the UX protocol messages below. The script internals (product list query, parameter construction, polling) are invisible to users.
Call IMA Open API to create AI-generated content. All endpoints require an ima_* API key. The core flow is: query products → create task → poll until done.
This skill is community-maintained and open for inspection.
Full transparency:
- You can review scripts/ima_create.py and ima_logger.py anytime.
- Music/TTS tasks call api.imastudio.com only; image/video tasks also use imapi.liveme.com (see "Network Endpoints" section).
- Local state is limited to ~/.openclaw/memory/ima_prefs.json and log files.

Configuration allowed:
- export IMA_API_KEY=ima_your_key_here
- Add IMA_API_KEY to the agent's environment configuration.
- Redirect logs to /dev/null if you don't want them kept.

Data control:
- View preferences: cat ~/.openclaw/memory/ima_prefs.json
- Reset preferences: rm ~/.openclaw/memory/ima_prefs.json (resets to defaults)
- Delete logs: rm -rf ~/.openclaw/logs/ima_skills/ (auto-cleanup after 7 days anyway)

If you need to modify this skill for your use case:
Note: Modified skills may break API compatibility or introduce security issues. Official support only covers the unmodified version.
Actions that could compromise security:
Why this matters:
What this skill does with your data:
| Data Type | Sent to IMA? | Stored Locally? | User Control |
|---|---|---|---|
| Prompts (image/video/music) | ✅ Yes (required for generation) | ❌ No | None (required) |
| API key | ✅ Yes (authentication header) | ❌ No | Set via env var |
| user_id (optional CLI arg) | ❌ Never (local preference key only) | ✅ Yes (as prefs file key) | Change --user-id value |
| Model preferences | ❌ No | ✅ Yes (~/.openclaw) | Delete anytime |
| Generation logs | ❌ No | ✅ Yes (~/.openclaw) | Auto-cleanup 7 days |
Privacy recommendations:
- --user-id is never sent to IMA servers; it's only used locally as a key for storing preferences in ~/.openclaw/memory/ima_prefs.json.
- Read scripts/ima_create.py to verify network calls (search for the create_task function).

Get your IMA API key: Visit https://imastudio.com to register and get started.
Version control:
File checksums (optional):
# Verify skill integrity
sha256sum SKILL.md scripts/ima_create.py
If users report issues, verify file integrity first.
User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.
Preferences file: ~/.openclaw/memory/ima_prefs.json (single file, shared across all IMA skills):
{
"user_{user_id}": {
"text_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"image_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"text_to_speech": { "model_id": "<from product list>", "model_name": "...", "credit": 2, "last_used": "..." }
}
}
Step 1: Get knowledge-ai recommendation (if installed)
knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "SeeDream 4.5"
Step 2: Check user preference
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "midjourney", ...}
Step 3: Decide which model to use
if user_pref exists:
use_model = user_pref["model_id"] # Highest priority
else:
use_model = knowledge_recommended_model or fallback_default
Step 4: Check for mismatch (for later hint)
if user_pref exists and knowledge_recommended_model != user_pref["model_id"]:
mismatch = True # Will add hint in success message
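The four steps above can be consolidated into one runnable sketch. Function names and the prefs path are illustrative, not the script's actual internals.

```python
import json
import os

PREFS_PATH = os.path.expanduser("~/.openclaw/memory/ima_prefs.json")

def load_prefs(path=PREFS_PATH):
    # A missing file simply means no saved preferences yet.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def resolve_model(user_id, task_type, knowledge_recommended, fallback_default, prefs=None):
    """Priority: user preference > knowledge-ai recommendation > fallback.
    Also flags a mismatch so the success message can hint at the recommendation."""
    prefs = load_prefs() if prefs is None else prefs
    user_pref = prefs.get(f"user_{user_id}", {}).get(task_type)
    if user_pref:
        model = user_pref["model_id"]
        mismatch = knowledge_recommended is not None and knowledge_recommended != model
    else:
        model = knowledge_recommended or fallback_default
        mismatch = False
    return model, mismatch
```
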
✅ Save preference when user explicitly specifies a model:
| User says | Action |
|---|---|
| 用XXX / 换成XXX / 改用XXX (use / switch to XXX) | Switch to model XXX + save as preference |
| 以后都用XXX / 默认用XXX / always use XXX | Save + confirm: ✅ 已记住!以后图片生成默认用 [XXX] (Got it! Future image generations will default to [XXX]) |
| 我喜欢XXX / 我更喜欢XXX (I like / prefer XXX) | Save as preference |
❌ Do NOT save when:
🗑️ Clear preference when user wants automatic selection:
| User says | Action |
|---|---|
用最好的 / 用最合适的 / best / recommended | Clear pref + use knowledge-ai recommendation |
推荐一个 / 你选一个 / 自动选择 | Clear pref + use knowledge-ai recommendation |
用默认的 / 用新的 | Clear pref + use knowledge-ai recommendation |
试试别的 / 换个试试 (without specific model) | Clear pref + use knowledge-ai recommendation |
重新推荐 | Clear pref + use knowledge-ai recommendation |
Implementation:
del prefs[f"user_{user_id}"][task_type]
save_prefs(prefs)
Selection flow:
Important notes:
The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.
When using user preference for image generation, show a line like:
🎨 根据你的使用习惯,将用 [Model Name] 帮你生成…
• 模型:[Model Name](你的常用模型)
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
When user switches to a different model than their saved preference:
💡 你之前喜欢用 [Old Model],这次换成了 [New Model]。
要把 [New Model] 设为以后的默认吗?
回复「是」保存 / 回复「否」仅本次使用
These are fallback defaults — only used when no user preference exists.
Always default to the newest and most popular model. Do NOT default to the cheapest.
| Task Type | Default Model | model_id | version_id | Cost | Why |
|---|---|---|---|---|---|
| text_to_image | SeeDream 4.5 | doubao-seedream-4.5 | doubao-seedream-4-5-251128 | 5 pts | Latest doubao flagship, photorealistic 4K |
| text_to_image (budget) | Nano Banana2 | gemini-3.1-flash-image | gemini-3.1-flash-image | 4 pts | Fastest and cheapest option |
| text_to_image (premium) | Nano Banana Pro | gemini-3-pro-image | gemini-3-pro-image-preview | 10/10/18 pts | Premium quality, 1K/2K/4K options |
| text_to_image (artistic) | Midjourney 🎨 | midjourney | v6 | 8/10 pts | Artist-level aesthetics, creative styles |
| image_to_image | SeeDream 4.5 | doubao-seedream-4.5 | doubao-seedream-4-5-251128 | 5 pts | Latest, best i2i quality |
| image_to_image (budget) | Nano Banana2 | gemini-3.1-flash-image | gemini-3.1-flash-image | 4 pts | Cheapest option |
| image_to_image (premium) | Nano Banana Pro | gemini-3-pro-image | gemini-3-pro-image-preview | 10 pts | Premium quality |
| image_to_image (artistic) | Midjourney 🎨 | midjourney | v6 | 8/10 pts | Artist-level aesthetics, style transfer |
| text_to_video | Wan 2.6 | wan2.6-t2v | wan2.6-t2v | 25 pts | 🔥 Most popular t2v, balanced cost |
| text_to_video (premium) | Hailuo 2.3 | MiniMax-Hailuo-2.3 | MiniMax-Hailuo-2.3 | 38 pts | Higher quality |
| text_to_video (budget) | Vidu Q2 | viduq2 | viduq2 | 5 pts | Lowest cost t2v |
| image_to_video | Wan 2.6 | wan2.6-i2v | wan2.6-i2v | 25 pts | 🔥 Most popular i2v, 1080P |
| image_to_video (premium) | Kling 2.6 | kling-v2-6 | kling-v2-6 | 40-160 pts | Premium Kling i2v |
| first_last_frame_to_video | Kling O1 | kling-video-o1 | kling-video-o1 | 48 pts | Newest Kling reasoning model |
| reference_image_to_video | Kling O1 | kling-video-o1 | kling-video-o1 | 48 pts | Best reference fidelity |
| text_to_music | Suno (sonic-v4) | sonic | sonic | 25 pts | Latest Suno engine, best quality |
| text_to_speech | (query product list) | — | — | — | Run --task-type text_to_speech --list-models; use first or user-preferred model_id |
Premium options:
Quick selection guide (production as of 2026-02-27, sorted by popularity):
Selection guide by use case:
Image Generation:
Video Generation:
Music Generation:
Speech (TTS) Generation:
- TTS has no fixed default model. Always query GET /open/v1/product/list?category=text_to_speech (or --list-models) to get the current model_id and credit; use the first available model or the user's preference. For voice/speed/format parameters, see "Model and parameter parsing" (TTS table) and "Speech (TTS) — text_to_speech" in this document.

⚠️ Technical Note for Suno:
The model_version inside parameters.parameters (e.g., "sonic-v5") is different from the outer model_version field (which is "sonic"). Always set both correctly when creating Suno tasks.
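A payload sketch makes the two-field requirement concrete. Only the sonic / sonic-v5 split comes from the note above; the surrounding envelope shape is an assumption for illustration.

```python
# Hypothetical Suno payload sketch showing BOTH model_version fields.
def build_suno_parameters(prompt: str) -> dict:
    return {
        "model_version": "sonic",             # outer field: the product's model_id
        "parameters": {
            "parameters": {
                "model_version": "sonic-v5",  # inner field: the engine version
                "gpt_description_prompt": prompt,
            }
        },
    }
```
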
⚠️ Production Image Models (4 available):
- SeeDream 4.5 (doubao-seedream-4.5): 5 pts, default
- Midjourney (midjourney): 8/10 pts for 480p/720p, artistic styles
- Nano Banana2 (gemini-3.1-flash-image): 4/6/10/13 pts for 512px/1K/2K/4K
- Nano Banana Pro (gemini-3-pro-image): 10/10/18 pts for 1K/2K/4K

All other image models mentioned in older documentation are no longer available in production.
🌟 Parameter Support Notes (All Task Types):
🆕 MAJOR UPDATE: Nano Banana series now has NATIVE aspect_ratio support!
- Nano Banana2 supports aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY
- Nano Banana Pro supports aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY

aspect_ratio support details:
- Nano Banana2: native aspect_ratio (via attribute_ids, 4-13 pts)
- Nano Banana Pro: native aspect_ratio (via attribute_ids, 10-18 pts)
- Midjourney: fixed 1024x1024 square only (single attribute_id, 8/10 pts)

When user requests unsupported combinations for images:
❌ Midjourney 暂不支持自定义 aspect_ratio(仅支持 1024x1024 方形)
✅ 推荐方案:
1. SeeDream 4.5(支持虚拟参数 aspect_ratio)
• 支持比例:1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9
• 成本:5 积分(性价比最佳)
2. Nano Banana Pro/2(原生支持 aspect_ratio)
• 支持比例:1:1, 16:9, 9:16, 4:3, 3:4
• 成本:4-18 积分(按尺寸)
需要我帮你用 SeeDream 4.5 生成吗?
Auto-Inference Logic for Pixverse V5.5/V5/V4:
- These versions share a model field in form_config from the Product List API.
- The API expects an explicit model parameter (e.g., "v5.5", "v5", "v4").
- The script parses the version from model_name and injects it:
  - model_name: "Pixverse V5.5" → auto-inject model: "v5.5"
  - model_name: "Pixverse V4" → auto-inject model: "v4"
- Other Pixverse products already include model in form_config (no auto-inference needed).

Error Prevention:
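The version-parsing step can be sketched with a regular expression. This is illustrative only (the function name is hypothetical), not the script's actual implementation.

```python
import re

# Sketch of the Pixverse auto-inference: derive the "model" parameter
# (e.g. "v5.5") from the product's model_name, or None if it isn't Pixverse.
def infer_pixverse_model(model_name: str) -> "str | None":
    m = re.match(r"Pixverse V([\d.]+)$", model_name.strip(), re.IGNORECASE)
    return f"v{m.group(1)}" if m else None
```
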
- Without the injected model parameter, Pixverse tasks fail with: err_code=400017 err_msg=Invalid value for model

Suno sonic-v5 (Full-Featured):
DouBao BGM/Song (Simplified):
🎵 Suno Prompt Writing Guide (for gpt_description_prompt):
When using Suno, structure your prompt with these elements:
Genre/Style:
"lo-fi hip hop", "orchestral cinematic", "upbeat pop", "dark ambient", "indie folk", "electronic dance"

Tempo/BPM:
"80 BPM", "fast tempo", "slow ballad", "moderate pace 110 BPM"

Vocals Control:
- "no vocals" → set make_instrumental=true
- "female vocals" → set vocal_gender="female"
- "male vocals" → set vocal_gender="male"
- mixed vocals → set vocal_gender="mixed"

Mood/Emotion:
"happy and energetic", "melancholic", "tense and dramatic", "peaceful and calming"

Negative Tags (exclude styles):
Set negative_tags: "heavy metal, distortion, screaming" to exclude unwanted elements

Duration Hint:
"60 seconds", "30 second loop", "2 minute track"

Example Suno prompts:
"upbeat lo-fi hip hop, 90 BPM, no vocals, relaxed and chill"
→ Set: make_instrumental=true
"emotional pop ballad, slow tempo, female vocals, melancholic"
→ Set: vocal_gender="female"
"orchestral cinematic trailer music, epic and dramatic, 120 BPM, no vocals"
→ Set: make_instrumental=true, tags="orchestral,cinematic,epic"
"acoustic indie folk, gentle guitar, male vocals, warm and nostalgic"
→ Set: vocal_gender="male", tags="acoustic,indie,folk"
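The prompt-to-parameter mappings in these examples can be sketched as a small inference helper. The function is hypothetical and only covers the vocal cues shown above; real prompts may need richer parsing.

```python
# Hypothetical helper mirroring the "Example Suno prompts" mappings above.
def infer_suno_params(prompt: str) -> dict:
    p = prompt.lower()
    params = {}
    if "no vocals" in p or "instrumental" in p:
        params["make_instrumental"] = True
    elif "female vocals" in p:
        params["vocal_gender"] = "female"
    elif "male vocals" in p:
        params["vocal_gender"] = "male"
    return params
```
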
User asks for custom aspect ratio image (e.g. "7:3 landscape")
→ ❌ Image models don't support custom ratios
→ ✅ Solution: "图片模型不支持自定义比例。建议用视频模型(Wan 2.6 t2v)生成16:9视频,然后截取首帧作为图片。"
User asks for 8K image
→ ❌ No model supports 8K
→ ✅ Solution: "当前最高支持4K分辨率(Nano Banana Pro,18积分)。要使用吗?"
User asks for video with audio
→ Check model: Veo 3.1 / Kling O1 / Hailuo have generate_audio
→ ✅ Solution: "Veo 3.1 和 Kling O1 支持音频生成(需在参数中设置 generate_audio=True)。要用哪个?"
User asks for long music (e.g. "5 minute track")
→ ❌ Duration not user-controllable
→ ✅ Solution: "Suno 生成约2分钟音乐。需要更长时长可以生成多段后拼接。"
User asks for 30s video
→ Check model: Most models max 15s
→ ✅ Solution: "当前最长15秒。可选模型:Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)。"
When user requests unsupported combinations:
Note: Image-specific unsupported combinations (Midjourney + aspect_ratio, 8K, non-standard ratios) are documented in the "Image Models" section above.
User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.
Preferences file: ~/.openclaw/memory/ima_prefs.json

{
"user_{user_id}": {
"text_to_video": { "model_id": "wan2.6-t2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." },
"image_to_video": { "model_id": "wan2.6-i2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." },
"first_last_frame_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." },
"reference_image_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." }
}
}
Step 1: Get knowledge-ai recommendation (if installed)
knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "Wan 2.6"
Step 2: Check user preference
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "kling-video-o1", ...}
Step 3: Decide which model to use
if user_pref exists:
use_model = user_pref["model_id"] # Highest priority
else:
use_model = knowledge_recommended_model or fallback_default
Step 4: Check for mismatch (for later hint)
if user_pref exists and knowledge_recommended_model != user_pref["model_id"]:
mismatch = True # Will add hint in success message
✅ Save preference when user explicitly specifies a model:
| User says | Action |
|---|---|
| 用XXX / 换成XXX / 改用XXX (use / switch to XXX) | Switch to model XXX + save as preference |
| 以后都用XXX / 默认用XXX / always use XXX | Save + confirm: ✅ 已记住!以后视频生成默认用 [XXX] (Got it! Future video generations will default to [XXX]) |
| 我喜欢XXX / 我更喜欢XXX (I like / prefer XXX) | Save as preference |
❌ Do NOT save when:
🗑️ Clear preference when user wants automatic selection:
| User says | Action |
|---|---|
用最好的 / 用最合适的 / best / recommended | Clear pref + use knowledge-ai recommendation |
推荐一个 / 你选一个 / 自动选择 | Clear pref + use knowledge-ai recommendation |
用默认的 / 用新的 | Clear pref + use knowledge-ai recommendation |
试试别的 / 换个试试 (without specific model) | Clear pref + use knowledge-ai recommendation |
重新推荐 | Clear pref + use knowledge-ai recommendation |
Implementation:
del prefs[f"user_{user_id}"][task_type]
save_prefs(prefs)
Selection flow:
Important notes:
The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.
v2.0 Updates (aligned with ima-image-ai v1.3):
- Added Step 0 for correct message ordering (fixes group chat bug)
- Added Step 5 for explicit task completion
- Enhanced Midjourney support with proper timing estimates
- Now 6 steps total (0-5): Acknowledgment → Pre-Gen → Progress → Success/Failure → Done
This skill runs inside IM platforms (Feishu, Discord via OpenClaw).
Generation takes anywhere from about 10 seconds (music) to 6 minutes (video). Never let users wait in silence.
Always follow all 6 steps below, every single time.
Default to plain-language updates in normal user flows. If users ask for technical details, provide them transparently (script name, endpoints, and key parameters).
In standard progress messages, prioritize: model name, estimated/actual time, credits consumed, result URL, and natural-language status updates.
| Task Type | Model | Estimated Time | Poll Every | Send Progress Every |
|---|---|---|---|---|
| text_to_image | SeeDream 4.5 | 25~60s | 5s | 20s |
| Nano Banana2 💚 | 20~40s | 5s | 15s | |
| Nano Banana Pro | 60~120s | 5s | 30s | |
| Midjourney 🎨 | 40~90s | 8s | 25s | |
| image_to_image | SeeDream 4.5 | 25~60s | 5s | 20s |
| Nano Banana2 💚 | 20~40s | 5s | 15s | |
| Nano Banana Pro | 60~120s | 5s | 30s | |
| Midjourney 🎨 | 40~90s | 8s | 25s | |
| text_to_video | Wan 2.6, Hailuo 2.0/2.3, Vidu Q2, Pixverse | 60~120s | 8s | 30s |
| SeeDance 1.5 Pro, Kling 2.6, Veo 3.1 | 90~180s | 8s | 40s | |
| Kling O1, Sora 2 Pro | 180~360s | 8s | 60s | |
| image_to_video | Same ranges as text_to_video | — | 8s | 40s |
| first_last_frame / reference | Kling O1, Veo 3.1 | 180~360s | 8s | 60s |
| text_to_music | DouBao BGM / Song | 10~25s | 5s | 10s |
| Suno (sonic-v5) | 20~45s | 5s | 15s | |
| text_to_speech | (varies by model) | 5~30s | 3s | 10s |
estimated_max_seconds = upper bound of the range (e.g. 60 for SeeDream 4.5, 40 for Nano Banana2, 120 for Nano Banana Pro, 90 for Midjourney, 180 for Kling 2.6, 360 for Kling O1).
⚠️ CRITICAL: This step is essential for correct message ordering in IM platforms (Feishu, Discord).
Before doing anything else, reply to the user with a friendly acknowledgment message using your normal reply (not message tool). This reply will automatically appear FIRST in the conversation.
Example acknowledgment messages:
For images:
好的!来帮你画一只萌萌的猫咪 🐱
收到!马上为你生成一张 16:9 的风景照 🏔️
OK! Starting image generation with SeeDream 4.5 🎨
For videos:
好的!来帮你生成一段视频 🎬
收到!开始用 Wan 2.6 生成视频 🎥
For music:
好的!来帮你创作一首音乐 🎵
Rules:
- Use your normal reply for this acknowledgment, NOT the message tool

Why this matters:
- Your normal reply appears first in the conversation
- message tool pushes appear in chronological order AFTER your initial reply

After the Step 0 reply, use the message tool to push a notification immediately:
[Emoji] 开始生成 [内容类型],请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
Emoji by content type:
- Image: 🎨
- Video: 🎬(加注:视频生成需要较长时间,我会定时汇报进度 / note: video generation takes longer; I will report progress periodically)
- Music: 🎵

Cost transparency (new requirement):
Adapt language to match the user (Chinese / English). For video, always add a note that it takes longer. For expensive models, always mention cheaper alternatives unless user explicitly requested premium.
Poll the task detail API every [Poll Every] seconds per the table.
Send a progress update every [Send Progress Every] seconds.
⏳ 正在生成中… [P]%
已等待 [elapsed]s,预计最长 [max]s
Progress formula:
P = min(95, floor(elapsed_seconds / estimated_max_seconds * 100))
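The formula above can be written as a one-line helper (the function name is illustrative):

```python
import math

# Progress capped at 95% until the API actually reports success.
def progress_percent(elapsed_seconds: float, estimated_max_seconds: float) -> int:
    return min(95, math.floor(elapsed_seconds / estimated_max_seconds * 100))
```
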
- Report 100% only when the API returns success
- If elapsed > estimated_max: freeze at 95% and append 「快了,稍等一下…」 (almost there, just a moment…)

When task status = success:
3.1 Send video player first (IM platforms like Feishu will render inline player):
# Get result URL from script output or task detail API
result = get_task_result(task_id)
video_url = result["medias"][0]["url"]
# Build caption
caption = f"""✅ 视频生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
[视频描述]"""
# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
caption += f"""
💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""
# Send video with caption (use message tool if available)
message(
action="send",
media=video_url, # ⚠️ Use HTTPS URL directly, NOT local file path
caption=caption
)
Important:
3.2 Then send link as text (for copying/sharing):
# Send link message immediately after video
message(action="send", text=f"🔗 视频链接(可复制分享):\n{video_url}")
⚠️ Critical for video:
- For a thumbnail, the cover image is available at result["medias"][0]["cover"]

# Build caption
caption = f"""✅ 图片生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
🔗 原始链接:{image_url}"""
# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
caption += f"""
💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""
# Send image with caption
message(
action="send",
media=image_url,
caption=caption
)
Important:
Send audio file with player:
✅ 音乐生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
• 时长:约 [duration]
[音频URL或直接发送音频文件]
Step 0 — Initial acknowledgment (normal reply)
First reply with a short acknowledgment, e.g.: 好的,正在帮你把这段文字转成语音。 / OK, converting this text to speech.
Step 1 — Pre-generation (message tool)
Push once:
🔊 开始语音合成,请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
Step 2 — Progress
Poll every 2–5s. Every 10–15s send: ⏳ 语音合成中… [P]%,已等待 [elapsed]s,预计最长 [max]s. Cap progress at 95% until API returns success.
Step 3 — Success (message tool)
When resource_status == 1 and status != "failed", send media = medias[0].url and caption:
✅ 语音合成成功!
• 模型:[Model Name]
• 耗时:实际 [actual]s
• 消耗积分:[N pts]
🔗 原始链接:[url]
Use the URL from the API (do not use local file paths).
Step 4 — Failure (message tool)
On failure, send user-friendly message. TTS error translation (do not expose raw API errors):
| Technical | ✅ Say (CN) | ✅ Say (EN) |
|---|---|---|
| 401 Unauthorized | 密钥无效或未授权,请至 imaclaw.ai 生成新密钥 | API key invalid; generate at imaclaw.ai |
| 4008 Insufficient points | 积分不足,请至 imaclaw.ai 购买积分 | Insufficient points; buy at imaclaw.ai |
| Invalid product attribute | 参数配置异常,请稍后重试 | Configuration error, try again later |
| Error 6006 / 6010 | 积分或参数不匹配,请换模型或重试 | Points/params mismatch, try another model |
| resource_status == 2 / status failed | 语音合成失败,建议换模型或缩短文本 | Synthesis failed, try another model or shorter text |
| timeout | 合成超时,请稍后重试 | Timed out, try again later |
| Network error | 网络不稳定,请检查后重试 | Network unstable, check and retry |
| Text too long (TTS) | 文本过长,请缩短后重试 | Text too long, please shorten |
Links: API key — https://www.imaclaw.ai/imaclaw/apikey ; Credits — https://www.imaclaw.ai/imaclaw/subscription
Step 5 — Done
After Step 0–4, no further reply needed. Do not send duplicate confirmations.
When task status = failed or any API/network error, send:
❌ [内容类型]生成失败
• 原因:[natural_language_error_message]
• 建议改用:
- [Alt Model 1]([特点],[N pts])
- [Alt Model 2]([特点],[N pts])
需要我帮你用其他模型重试吗?
⚠️ CRITICAL: Error Message Translation
NEVER show technical error messages to users. Always translate API errors into natural language.
API key & credits: key and subscription management is at imaclaw.ai (same IMA platform as imastudio.com).
| Technical Error | ❌ Never Say | ✅ Say Instead (Chinese) | ✅ Say Instead (English) |
|---|---|---|---|
| 401 Unauthorized 🆕 | Invalid API key / 401 Unauthorized | ❌ API密钥无效或未授权<br>💡 生成新密钥: https://www.imaclaw.ai/imaclaw/apikey | ❌ API key is invalid or unauthorized<br>💡 Generate API Key: https://www.imaclaw.ai/imaclaw/apikey |
| 4008 Insufficient points 🆕 | Insufficient points / Error 4008 | ❌ 积分不足,无法创建任务<br>💡 购买积分: https://www.imaclaw.ai/imaclaw/subscription | ❌ Insufficient points to create this task<br>💡 Buy Credits: https://www.imaclaw.ai/imaclaw/subscription |
| "Invalid product attribute" / "Insufficient points" | Invalid product attribute | 生成参数配置异常,请稍后重试 | Configuration error, please try again later |
| Error 6006 (credit mismatch) | Error 6006 | 积分计算异常,系统正在修复 | Points calculation error, system is fixing |
| Error 6009 (no matching rule) | Error 6009 | 参数组合不匹配,已自动调整 | Parameter mismatch, auto-adjusted |
| Error 6010 (attribute_id mismatch) | Attribute ID does not match | 模型参数不匹配,请尝试其他模型 | Model parameters incompatible, try another model |
| error 400 (bad request) | error 400 / Bad request | 请求参数有误,请稍后重试 | Invalid request parameters, please try again |
| resource_status == 2 | Resource status 2 / Failed | 生成过程遇到问题,建议换个模型试试 | Generation failed, please try another model |
| status == "failed" (no details) | Task failed | 这次生成没成功,要不换个模型试试? | Generation unsuccessful, try a different model? |
| timeout | Task timed out / Timeout error | 生成时间过长已超时,建议用更快的模型 | Generation took too long, try a faster model |
| Network error / Connection refused | Connection refused / Network error | 网络连接不稳定,请检查网络后重试 | Network connection unstable, check network and retry |
| Rate limit exceeded | 429 Too Many Requests / Rate limit | 请求过于频繁,请稍等片刻再试 | Too many requests, please wait a moment |
| Prompt moderation (Sora only) | Content policy violation | 提示词包含敏感内容,请修改后重试 | Prompt contains restricted content, please modify |
| Model unavailable | Model not available / 503 Service Unavailable | 当前模型暂时不可用,建议换个模型 | Model temporarily unavailable, try another model |
| Lyrics format error (Suno only) 🎵 | Invalid lyrics format | 歌词格式有误,请调整后重试 | Lyrics format error, adjust and retry |
| Prompt too short/long (Music) 🎵 | Prompt length invalid | 音乐描述过短或过长,请调整到合适长度 (建议20-100字) | Music description too short or long, adjust to appropriate length (20-100 chars recommended) |
| Text too long (TTS) 🔊 | TTS text length | 文本过长,请缩短后重试 | Text too long, please shorten and retry |
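The translation table can be implemented as a simple keyword lookup. A minimal sketch (the matching keys and function name are mine; the user-facing messages are taken from the table above):

```python
# Illustrative keyword → user-facing message mapping (CN messages from the table above)
ERROR_TRANSLATIONS = [
    ("401", "❌ API密钥无效或未授权\n💡 生成新密钥: https://www.imaclaw.ai/imaclaw/apikey"),
    ("4008", "❌ 积分不足,无法创建任务\n💡 购买积分: https://www.imaclaw.ai/imaclaw/subscription"),
    ("6006", "积分计算异常,系统正在修复"),
    ("6010", "模型参数不匹配,请尝试其他模型"),
    ("timeout", "生成时间过长已超时,建议用更快的模型"),
]

FALLBACK = "生成过程遇到问题,请稍后重试或换个模型试试"

def translate_error(raw: str) -> str:
    """Map a raw technical error to a user-facing message; never expose the raw text."""
    lowered = raw.lower()
    for key, message in ERROR_TRANSLATIONS:
        if key in lowered:
            return message
    return FALLBACK
```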
Generic fallback (when error is unknown):
生成过程遇到问题,请稍后重试或换个模型试试 / Generation encountered an issue, please try again or use another model

Best Practices:
For music failures, retry (e.g., with auto_lyrics=true); for TTS, query --list-models or try shortening the text.

After sending Step 3 (success) or Step 4 (failure):
Why this step matters:
Exception: If the user explicitly asks "还有别的吗?" or similar, then respond naturally.
🆕 Enhanced Error Handling (v1.0.8):
The Reflection mechanism (3 automatic retries) now provides specific, actionable suggestions for common errors:
All error handling is automatic and transparent — users receive natural language explanations with next steps.
Failure fallback by task type:
| Task Type | Failed Model | First Alt | Second Alt |
|---|---|---|---|
| text_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10-18pts, premium) |
| text_to_image | Nano Banana2 | SeeDream 4.5 (5pts, better quality) | Nano Banana Pro (10-18pts) |
| text_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts, budget) |
| image_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana2 | SeeDream 4.5 (5pts) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts) |
| text_to_video | Kling O1 | Wan 2.6 (25pts) | Vidu Q2 (5pts) |
| text_to_video | Google Veo 3.1 | Kling O1 (48pts) | Sora 2 Pro (122pts) |
| text_to_video | Any | Wan 2.6 (25pts, most popular) | Hailuo 2.0 (5pts) |
| image_to_video | Wan 2.6 | Kling O1 (48pts) | Hailuo 2.0 i2v (25pts) |
| image_to_video | Any | Wan 2.6 (25pts, most popular) | Vidu Q2 Pro (20pts) |
| first_last / reference | Kling O1 | Kling 2.6 (80pts) | Veo 3.1 (70pts+) |
| text_to_music 🎵 | Suno | DouBao BGM (30pts, 背景音乐) | DouBao Song (30pts, 歌曲生成) |
| text_to_music 🎵 | DouBao BGM | DouBao Song (30pts) | Suno (25pts, 功能最强) |
| text_to_music 🎵 | DouBao Song | DouBao BGM (30pts) | Suno (25pts, 功能最强) |
| text_to_speech 🔊 | (any) | Query --list-models for alternatives | Use another model_id from product list |
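The fallback table can be encoded as a lookup keyed on (task_type, failed_model), with a wildcard row for the "Any" entries. A sketch (the dict shows a subset of the rows above; the structure and function name are mine):

```python
# Illustrative fallback table (subset of the rows above); "*" matches any failed model
FALLBACKS = {
    ("text_to_image", "SeeDream 4.5"): ["Nano Banana2 (4pts, fast)", "Nano Banana Pro (10-18pts, premium)"],
    ("text_to_video", "Kling O1"): ["Wan 2.6 (25pts)", "Vidu Q2 (5pts)"],
    ("text_to_video", "*"): ["Wan 2.6 (25pts, most popular)", "Hailuo 2.0 (5pts)"],
}

def suggest_alternatives(task_type: str, failed_model: str) -> list:
    """Return fallback model suggestions, preferring an exact row over the wildcard."""
    return (FALLBACKS.get((task_type, failed_model))
            or FALLBACKS.get((task_type, "*"))
            or [])
```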
Music-specific failure guidance:
Retry with auto_lyrics=true.
TTS-specific failure guidance:
Run --task-type text_to_speech --list-models and suggest another model_id, or shorten the text / simplify the content. Use the TTS error translation table in "For TTS Tasks" above for user-facing messages.
Source: production GET /open/v1/product/list (2026-02-27). Model count reduced significantly. Always query the product list API at runtime.
| Category | Name | model_id | Cost |
|---|---|---|---|
| text_to_image | SeeDream 4.5 🌟 | doubao-seedream-4.5 | 5 pts |
| text_to_image | Midjourney 🎨 | midjourney | 8/10 pts (480p/720p) |
| text_to_image | Nano Banana2 💚 | gemini-3.1-flash-image | 4/6/10/13 pts |
| text_to_image | Nano Banana Pro | gemini-3-pro-image | 10/10/18 pts |
| image_to_image | SeeDream 4.5 🌟 | doubao-seedream-4.5 | 5 pts |
| image_to_image | Midjourney 🎨 | midjourney | 8/10 pts (480p/720p) |
| image_to_image | Nano Banana2 💚 | gemini-3.1-flash-image | 4/6/10/13 pts |
| image_to_image | Nano Banana Pro | gemini-3-pro-image | 10 pts |
Midjourney attribute_ids: 5451/5452 (text_to_image), 5453/5454 (image_to_image)
Nano Banana2 size options: 512px (4pts), 1K (6pts), 2K (10pts), 4K (13pts)
Nano Banana Pro size options: 1K (10pts), 2K (10pts), 4K (18pts for t2i / 10pts for i2i)
⚠️ Critical: Models have varying parameter support. Custom aspect ratios are now supported by multiple models.
| Model | Custom Aspect Ratio | Max Resolution | Size Options | Notes |
|---|---|---|---|---|
| SeeDream 4.5 | ✅ (via virtual params) | 4K (adaptive) | 8 aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9 (5 pts) |
| Nano Banana2 | ✅ Native support 🆕 | 4K (4096×4096) | 512px/1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via attribute_id |
| Nano Banana Pro | ✅ Native support 🆕 | 4K (4096×4096) | 1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via attribute_id |
| Midjourney 🎨 | ❌ (1:1 only) | 1024px (square) | 480p/720p via attribute_id | Fixed 1024x1024, artistic style focus |
Key Capabilities:
| Name | model_id | Typical positioning | task_type support |
|---|---|---|---|
| IMA Video Pro (Sevio 1.0) | ima-pro | Higher quality / consistency | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
| IMA Video Pro Fast (Sevio 1.0-Fast) | ima-pro-fast | Lower latency / faster iteration | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
Notes:
ima-all-ai does not hardcode Sevio availability; it resolves availability via the runtime product/list. Treat the --list-models result as the final source.

| Category | Name | model_id | Cost Range |
|---|---|---|---|
| text_to_video (14) | Wan 2.6 🔥 | wan2.6-t2v | 25-120 pts |
| | Hailuo 2.3 | MiniMax-Hailuo-2.3 | 32+ pts |
| | Hailuo 2.0 | MiniMax-Hailuo-02 | 5+ pts |
| | Vidu Q2 | viduq2 | 5-70 pts |
| | SeeDance 1.5 Pro | doubao-seedance-1.5-pro | 20+ pts |
| | Sora 2 Pro | sora-2-pro | 122+ pts |
| | Kling O1 | kling-video-o1 | 48-120 pts |
| | Kling 2.6 | kling-v2-6 | 80+ pts |
| | Google Veo 3.1 | veo-3.1-generate-preview | 70-330 pts |
| | Pixverse V5.5 / V5 / V4.5 / V4 / V3.5 | pixverse | 12-48 pts |
| image_to_video (14) | Wan 2.6 🔥 | wan2.6-i2v | 25-120 pts |
| | Hailuo 2.3 / 2.0 | MiniMax-Hailuo-2.3/02 | 25-32 pts |
| | Vidu Q2 Pro | viduq2-pro | 20-70 pts |
| | SeeDance 1.5 Pro | doubao-seedance-1.5-pro | 47+ pts |
| | Sora 2 Pro | sora-2-pro | 122+ pts |
| | Kling O1 / 2.6 | kling-video-o1/v2-6 | 48-120 pts |
| | Google Veo 3.1 | veo-3.1-generate-preview | 70-330 pts |
| | Pixverse V5.5-V3.5 | pixverse | 12-48 pts |
| first_last_frame (11) | Kling O1 🌟 | kling-video-o1 | 48-120 pts |
| | Kling 2.6 | kling-v2-6 | 80+ pts |
| | Others (9) | Hailuo 2.0, Vidu Q2 Pro, SeeDance 1.5 Pro, Veo 3.1, Pixverse V5.5-V3.5 | — |
| reference_image (6) | Kling O1 🌟 | kling-video-o1 | 48-120 pts |
| | Google Veo 3.1 | veo-3.1-generate-preview | 70-330 pts |
| | Others (4) | Vidu Q2, Pixverse V5.5/V5/V4.5 | — |

| Category | Name | model_id | Cost |
|---|---|---|---|
| text_to_video | SeeDance 1.5 Pro / 1.0 Pro | doubao-seedance-1.5-pro / doubao-seedance-1.0-pro | 16 / 15 pts |
| text_to_video | Sora 2 Pro / Sora 2 | sora-2-pro / sora-2 | 120 / 35 pts |
| text_to_video | Kling O1 / 2.6 / 2.5 Turbo / 1.6 | kling-video-o1 / kling-v2-6 / kling-v2-5-turbo / kling-v1-6 | 48 / 80 / 24 / 32 pts |
| text_to_video | Google Veo 3.1 Fast / 3.1 / 3.0 | veo-3.1-fast-generate-preview / veo-3.1-generate-preview / veo-3.0-generate-preview | 55 / 140 / 280 pts |
| text_to_video | Pixverse V3.5–V5.5 | pixverse | 12 pts |
| image_to_video | Wan 2.6 / 2.6 Flash / 2.5 / 2.2 Plus | wan2.6-i2v / wan2.6-i2v-flash / wan2.5-i2v-preview / wan2.2-i2v-plus | 25 / 12 / 12 / 10 pts |
| image_to_video | Kling 2.1 Master | kling-v2-1-master | 150 pts |
| first_last_frame_to_video | Kling O1 | kling-video-o1 | 70 pts |
| reference_image_to_video | Kling O1 / Vidu Q2 / Q1 | kling-video-o1 / viduq2 / viduq1 | 48 / 10 / 25 pts |
| Category | Name | model_id | Cost | Notes |
|---|---|---|---|---|
| text_to_music | Suno | sonic | 25 pts | sonic-v5; custom_mode, lyrics, vocal_gender |
| text_to_music | DouBao BGM | GenBGM | 30 pts | Background music |
| text_to_music | DouBao Song | GenSong | 30 pts | Song generation |
Models and credits are not fixed. Always call GET /open/v1/product/list?category=text_to_speech (or run the script with --task-type text_to_speech --list-models) to get current model_id, attribute_id, and credit.
ima-all-ai has complete TTS capability: This document and the bundled ima_create.py provide full TTS support (routing, parameters, create/poll, UX protocol Steps 0–5, error translation). The ima-tts-ai skill is an optional standalone package with the same specification.
Poll POST /open/v1/tasks/detail until completion. For TTS, medias[] uses the same structure as other IMA audio tasks:
| Field | Type | Meaning |
|---|---|---|
| resource_status | int or null | 0=processing, 1=available, 2=failed, 3=deleted; treat null as 0 |
| status | string | "pending" / "processing" / "success" / "failed" |
| url | string | Audio URL when resource_status=1 (mp3/wav) |
| duration_str | string | Optional, e.g. "12s" |
| format | string | Optional, e.g. "mp3", "wav" |
Success example: When all medias have resource_status == 1 and status != "failed", read medias[0].url (or watermark_url). Example: {"medias":[{"resource_status":1,"status":"success","url":"https://cdn.../output.mp3","duration_str":"12s","format":"mp3"}]}.
task_type: "text_to_speech". No image input: src_img_url: [], input_images: []. prompt (text to speak) must be inside parameters[].parameters, not at top level. Extra fields (e.g. voice_id, speed) come from product form_config; pass via --extra-params and only include params present in the product’s credit_rules/form_config.
| Mistake | Fix |
|---|---|
| prompt at top level | Put prompt inside parameters[].parameters (script does this) |
| Wrong or missing attribute_id | Always call product list first; use credit_rules |
| Single poll | Poll until all medias have resource_status == 1 |
| Ignoring status when resource_status=1 | Check status != "failed" |
| Sending params not in form_config/credit_rules | Use only params from product list; script reflection strips others on retry |
Always call
GET /open/v1/product/list?category=<type> first to get the live attribute_id and form_config defaults required for task creation.
There are two equivalent route systems serving the same backend logic:
| Route | Auth | Use Case |
|---|---|---|
| /open/v1/ | Authorization: Bearer ima_* only | Third-party / agent access |
| /api/v3/ | Token + API Key (dual auth) | Frontend App |
This skill documents the /open/v1/ Open API. All business logic (credit validation, N-flattening, risk control) runs identically on both paths.
Base URL: https://api.imastudio.com
Required/recommended headers for all /open/v1/ endpoints:
| Header | Required | Value | Notes |
|---|---|---|---|
| Authorization | ✅ | Bearer ima_your_api_key_here | API key authentication |
| x-app-source | ✅ | ima_skills | Fixed value — identifies skill-originated requests |
| x_app_language | recommended | en / zh | Product label language; defaults to en if omitted |
Authorization: Bearer ima_your_api_key_here
x-app-source: ima_skills
x_app_language: en
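The header set above can be built with a small helper (a sketch; the function name is mine, the header names and values come from the table):

```python
def open_api_headers(api_key: str, language: str = "en") -> dict:
    """Build the headers required for /open/v1/ endpoints."""
    return {
        "Authorization": f"Bearer {api_key}",
        "x-app-source": "ima_skills",   # fixed value, identifies skill-originated requests
        "x_app_language": language,     # en / zh; defaults to en when omitted
    }
```

Pass the result to every request, e.g. `requests.get(f"{BASE}/open/v1/product/list", headers=open_api_headers(key), params=...)`.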
The IMA Open API does NOT accept raw bytes or base64 images. All image inputs must be public HTTPS URLs.
| Task Type | Input Required? | Upload Before Create? | Notes |
|---|---|---|---|
| text_to_image | ❌ No | — | Prompt only |
| image_to_image | ✅ Yes (1 image) | ✅ Upload first | Single input image |
| text_to_video | ❌ No | — | Prompt only |
| image_to_video | ✅ Yes (1 image) | ✅ Upload first | Single input image |
| first_last_frame_to_video | ✅ Yes (2 images) | ✅ Upload first | First + last frame |
| reference_image_to_video | ✅ Yes (1+ images) | ✅ Upload first | Reference image(s) |
| text_to_music | ❌ No | — | Prompt only |
| text_to_speech | ❌ No | — | Prompt only (text to speak) |
Upload flow:
1. Call prepare_image_url() (see section below)
2. Use the returned CDN URL (fdl) as the value for input_images / src_img_url
Example workflow (image_to_image):
# User provides local file
image_url = prepare_image_url("/path/to/photo.jpg", api_key)
# → Returns: https://ima-ga.esxscloud.com/webAgent/privite/2026/02/27/..._uuid.jpeg
# Then create task with this URL
create_task(
task_type="image_to_image",
input_images=[image_url], # Use uploaded URL
prompt="turn into oil painting"
)
CRITICAL: You MUST call /open/v1/product/list BEFORE creating any task.
The attribute_id field is REQUIRED in the create request. If it is 0 or missing, you get:
"Invalid product attribute" → "Insufficient points" → the task fails completely.
NEVER construct a create request from the model table alone. Always fetch the product first.
# Query product list with the correct category
GET /open/v1/product/list?app=ima&platform=web&category=<task_type>
# task_type: text_to_image | image_to_image | text_to_video | image_to_video |
# first_last_frame_to_video | reference_image_to_video | text_to_music | text_to_speech
# Walk the V2 tree to find your target model (type=3 leaf nodes only)
def find_version(response: dict, target_model_id: str):
    for group in response["data"]:
        for version in group.get("children", []):
            if version["type"] == "3" and version["model_id"] == target_model_id:
                return version
    return None

version = find_version(response, target_model_id)
attribute_id = version["credit_rules"][0]["attribute_id"]
credit = version["credit_rules"][0]["points"]
model_version = version["id"]  # = version_id / model_version
model_name = version["name"]
form_defaults = {f["field"]: f["value"] for f in version["form_config"]}
Pre-queried values for convenience. Always call the product list at runtime for accuracy.
| Model | Task Type | model_id | attribute_id | credit | Notes |
|---|---|---|---|---|---|
| text_to_image | |||||
| SeeDream 4.5 | text_to_image | doubao-seedream-4.5 | 2341 | 5 pts | Default, balanced |
| Nano Banana Pro (1K) | text_to_image | gemini-3-pro-image | 2399 | 10 pts | 1024×1024 |
| Nano Banana Pro (2K) | text_to_image | gemini-3-pro-image | 2400 | 10 pts | 2048×2048 |
| Nano Banana Pro (4K) | text_to_image | gemini-3-pro-image | 2401 | 18 pts | 4096×4096 |
| text_to_video | |||||
| Wan 2.6 (720P, 5s) | text_to_video | wan2.6-t2v | 2057 | 25 pts | Default, balanced |
| Wan 2.6 (1080P, 5s) | text_to_video | wan2.6-t2v | 2058 | 40 pts | — |
| Wan 2.6 (720P, 10s) | text_to_video | wan2.6-t2v | 2059 | 50 pts | — |
| Wan 2.6 (1080P, 10s) | text_to_video | wan2.6-t2v | 2060 | 80 pts | — |
| Wan 2.6 (720P, 15s) | text_to_video | wan2.6-t2v | 2061 | 75 pts | — |
| Wan 2.6 (1080P, 15s) | text_to_video | wan2.6-t2v | 2062 | 120 pts | — |
| Kling O1 (5s, std) | text_to_video | kling-video-o1 | 2313 | 48 pts | Latest Kling |
| Kling O1 (5s, pro) | text_to_video | kling-video-o1 | 2314 | 60 pts | — |
| Kling O1 (10s, std) | text_to_video | kling-video-o1 | 2315 | 96 pts | — |
| Kling O1 (10s, pro) | text_to_video | kling-video-o1 | 2316 | 120 pts | — |
| text_to_music | |||||
| Suno (sonic-v4) | text_to_music | sonic | 2370 | 25 pts | Default |
| DouBao BGM | text_to_music | GenBGM | 4399 | 30 pts | — |
| DouBao Song | text_to_music | GenSong | 4398 | 30 pts | — |
| All others | any | — | → query /open/v1/product/list | — | Always runtime query |
⚠️ Production warning: attribute_id and credit values change frequently in production. Always call /open/v1/product/list at runtime; above table is pre-queried reference only (2026-02-27).
| Mistake | Error |
|---|---|
| attribute_id is 0 or missing | "Invalid product attribute" + "Insufficient points" |
| attribute_id outdated (production changed) | Same errors; always query product list first |
| attribute_id doesn't match parameter combination | Error 6010: "Attribute ID does not match the calculated rule" |
| prompt at outer parameters[] level | Prompt ignored; wrong routing |
| cast missing from inner parameters.parameters | Billing validation failure |
| credit value wrong or missing | Error 6006 |
| model_name / model_version missing | Wrong backend routing |
| Skipped product list, used table values directly | All of the above |
⚠️ Critical for Google Veo 3.1 and multi-rule models:
Models like Google Veo 3.1 have multiple credit_rules, each with a different attribute_id for different parameter combinations:
- 720p + 4s + optimized → attribute_id A
- 720p + 8s + optimized → attribute_id B
- 4K + 4s + high → attribute_id C

The script automatically selects the correct attribute_id by matching your parameters (duration, resolution, compression_quality, generate_audio) against each rule's attributes. If the match fails, you get error 6010.
Fix: The bundled script now checks these video-specific parameters for smart credit_rule selection. Always use the script, not manual API construction.
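The matching the script performs can be sketched as follows (an assumption-laden sketch: it treats each rule's `attributes` as a dict of parameter → value and compares as strings; the function name is mine):

```python
def pick_credit_rule(credit_rules: list, params: dict):
    """Select the credit_rule whose attributes all match the requested parameters.

    Every key/value in a rule's `attributes` must agree with the task
    parameters (compared as strings). Returns None when nothing matches,
    which would surface as error 6010.
    """
    for rule in credit_rules:
        attrs = rule.get("attributes", {})
        if all(str(params.get(k)) == str(v) for k, v in attrs.items()):
            return rule
    return None
```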
1. GET /open/v1/product/list?app=ima&platform=web&category=<type>
→ REQUIRED: Get attribute_id, credit, model_version, model_name, form_config defaults
[If input image required]
2. Upload image → get public HTTPS URL
→ See "Image Upload" section below
3. POST /open/v1/tasks/create
→ Must include: attribute_id, model_name, model_version, credit, cast, prompt (nested!)
4. POST /open/v1/tasks/detail {"task_id": "..."}
→ Poll until medias[].resource_status == 1
→ Extract url from completed media
The IMA Open API does NOT accept raw bytes or base64 images. All image inputs must be public HTTPS URLs.
When a user provides an image (local file, bytes, base64), you must upload it first and get a URL. This is exactly what the IMA frontend does before every image task.
The frontend uses a two-step presigned URL flow via the IM platform:
Step 1: GET /api/rest/oss/getuploadtoken → returns { ful, fdl }
ful = presigned PUT URL (upload destination, expires ~7 days)
fdl = final CDN download URL (use this as input_images value)
Step 2: PUT {ful} with raw image bytes + Content-Type header
→ image is stored in Aliyun OSS: zhubite-imagent-bot.oss-us-east-1.aliyuncs.com
→ accessible via CDN: https://ima-ga.esxscloud.com/...
GET https://imapi.liveme.com/api/rest/oss/getuploadtoken
Required query parameters (11 total — sourced directly from frontend generateUploadInfo):
| Parameter | Example | Description |
|---|---|---|
| appUid | ima_xxx... | Use IMA API key directly — no separate login needed |
| appId | webAgent | App identifier (fixed) |
| appKey | 32jdskjdk320eew | App secret (fixed, used for sign generation) |
| cmimToken | ima_xxx... | Use IMA API key directly — same as appUid |
| sign | 117CF6CF... | IM auth signature: SHA1("webAgent\|32jdskjdk320eew\|{timestamp}\|{nonce}").upper() |
| timestamp | 1772042430 | Unix timestamp (seconds), generated per request |
| nonce | CxI1FLI5ajLJZ1jlxZmeg | Random nonce string, generated per request |
| fService | privite | Fixed: storage service type |
| fType | picture | picture for images, video, audio |
| fSuffix | jpeg | File extension: jpeg, png, mp4, mp3 |
| fContentType | image/jpeg | MIME type of the file |
Simplified authentication: fill the appUid and cmimToken parameters with your IMA API key directly; no separate credential fetch is needed.
Response:
{
"ful": "https://zhubite-imagent-bot.oss-us-east-1.aliyuncs.com/webAgent/privite/2026/02/26/..._uuid.jpeg?Expires=...&OSSAccessKeyId=...&Signature=...",
"fdl": "https://ima-ga.esxscloud.com/webAgent/privite/2026/02/26/..._uuid.jpeg",
"ful_expire": "...",
"fdl_expire": "...",
"fdl_key": "..."
}
PUT {ful}
Content-Type: image/jpeg
Body: [raw image bytes]
No auth headers needed — the presigned URL already encodes the credentials.
Step 3: Use fdl as the Image URL
After the PUT succeeds, use fdl (the CDN URL) as the value for input_images / src_img_url.
import hashlib, time, uuid, requests, mimetypes
# ── 🌐 IMA Upload Service Endpoint (IMA-owned, for image/video uploads) ──────
IMA_IM_BASE = "https://imapi.liveme.com"
# ── 🔑 Hardcoded APP_KEY (Public, Shared Across All Users) ──────────────────
# This APP_KEY is a PUBLIC identifier used by IMA Studio's image/video upload
# service. It is NOT a secret—it's intentionally shared across all users and
# embedded in the IMA web frontend. This key is used to generate SHA-1 request
# signatures for upload token requests, but your IMA API key (ima_xxx...) is the ACTUAL
# authentication credential. Think of APP_KEY as a "client ID" rather than a
# "client secret."
#
# ⚠️ Security Note: Your ima_xxx... API key is the sensitive credential. It is
# sent to imapi.liveme.com as query parameters (appUid, cmimToken). Always use
# test keys for experiments and rotate your API key regularly.
#
# 📖 See SECURITY.md for complete disclosure and network verification guide.
APP_ID = "webAgent"
APP_KEY = "32jdskjdk320eew" # Public shared key (used for HMAC sign generation)
APP_UID = "<your_ima_api_key>"    # Use your IMA API key (ima_xxx...) directly as appUid
APP_TOKEN = "<your_ima_api_key>"  # Use your IMA API key directly as cmimToken (same value)
def _gen_sign() -> tuple[str, str, str]:
"""Generate per-request (sign, timestamp, nonce)."""
nonce = uuid.uuid4().hex[:21]
ts = str(int(time.time()))
raw = f"{APP_ID}|{APP_KEY}|{ts}|{nonce}"
sign = hashlib.sha1(raw.encode()).hexdigest().upper()
return sign, ts, nonce
def get_upload_token(app_uid: str, app_token: str,
suffix: str, content_type: str) -> dict:
"""Step 1: Get presigned upload URL from IMA's upload service.
Calls GET imapi.liveme.com/api/rest/oss/getuploadtoken with exactly 11 params.
Returns: { "ful": "<presigned PUT URL>", "fdl": "<CDN download URL>" }
Args:
app_uid: Your IMA API key (ima_xxx...), used as appUid parameter
app_token: Your IMA API key (ima_xxx...), used as cmimToken parameter
suffix: File extension (jpeg, png, mp4, mp3)
content_type: MIME type (image/jpeg, video/mp4, etc.)
Security Note:
Your IMA API key (ima_xxx...) is sent to imapi.liveme.com as query
parameters (appUid, cmimToken). This is IMA Studio's image/video upload
service, separate from the main api.imastudio.com API. Both domains are
owned by IMA Studio—this is part of IMA's microservices architecture.
Why two domains?
- api.imastudio.com: Core AI generation API (product list, task creation)
- imapi.liveme.com: Specialized upload service (presigned URL generation)
Your API key grants access to both services. For security verification,
see SECURITY.md section "Network Traffic Verification."
"""
sign, ts, nonce = _gen_sign()
r = requests.get(
f"{IMA_IM_BASE}/api/rest/oss/getuploadtoken",
params={
# App Key params
"appUid": app_uid, # APP_UID
"appId": APP_ID,
"appKey": APP_KEY,
"cmimToken": app_token, # APP_TOKEN
"sign": sign,
"timestamp": ts,
"nonce": nonce,
# File params
"fService": "privite", # fixed
"fType": "picture", # picture / video / audio
"fSuffix": suffix, # jpeg / png / mp4 / mp3
"fContentType": content_type,
},
)
r.raise_for_status()
return r.json()["data"]
def upload_image_to_oss(image_bytes: bytes, content_type: str, ful: str) -> None:
"""Step 2: PUT image bytes to the presigned OSS URL. No auth needed."""
resp = requests.put(ful, data=image_bytes, headers={"Content-Type": content_type})
resp.raise_for_status()
def prepare_image_url(source, api_key: str) -> str:
"""
Full workflow: upload any image and return the CDN URL (fdl).
Args:
source: file path (str), raw bytes, or already-public HTTPS URL
api_key: IMA API key for upload authentication
Returns: public HTTPS CDN URL ready to use as input_images value
"""
# Already a public URL → use directly, no upload needed
if isinstance(source, str) and source.startswith("https://"):
return source
# Read file bytes
if isinstance(source, str):
ext = source.rsplit(".", 1)[-1].lower() if "." in source else "jpeg"
with open(source, "rb") as f:
image_bytes = f.read()
content_type = mimetypes.guess_type(source)[0] or "image/jpeg"
else:
image_bytes = source
ext = "jpeg"
content_type = "image/jpeg"
# Step 1: Get presigned URL using API key directly
token_data = get_upload_token(api_key, api_key, ext, content_type)
ful = token_data["ful"]
fdl = token_data["fdl"]
# Step 2: Upload to OSS
upload_image_to_oss(image_bytes, content_type, ful)
# Step 3: Return CDN URL
return fdl # use this as input_images / src_img_url value
OSS path format: webAgent/privite/{YYYY}/{MM}/{DD}/{timestamp}_{uid}_{uuid}.{ext}
CDN base: https://ima-ga.esxscloud.com/
OSS bucket: zhubite-imagent-bot.oss-us-east-1.aliyuncs.com
| category | Capability | Input |
|---|---|---|
| text_to_image | Text → Image | prompt |
| image_to_image | Image → Image | prompt + input_images |
| text_to_video | Text → Video | prompt |
| image_to_video | Image → Video | prompt + input_images |
| first_last_frame_to_video | First+Last Frame → Video | prompt + src_img_url[2] |
| reference_image_to_video | Reference Image → Video | prompt + src_img_url[1+] |
| text_to_music | Text → Music | prompt |
| text_to_speech | Text → Speech | prompt (text to speak) |
Each media in medias[] has two fields:
| Field | Type | Values | Description |
|---|---|---|---|
| resource_status | int (or null) | 0, 1, 2, 3 | 0=processing, 1=available, 2=failed, 3=deleted. The API may return null; treat it as 0. |
| status | string | "pending", "processing", "success", "failed" | Task status text. When polling, resource_status is authoritative; status == "failed" means failure. |
Poll on resource_status first, then ensure status is not "failed":
| resource_status | status | Meaning | Action |
|---|---|---|---|
| 0 or null | pending / processing | Processing | Keep polling; do not stop (null = 0) |
| 1 | success (or completed) | Done | Read url; stop only when all medias are 1 |
| 1 | failed | Failed (status takes precedence) | Stop, handle error |
| 2 | any | Failed | Stop, handle error |
| 3 | any | Deleted | Stop |
Important: (1) Treat resource_status: null as 0. (2) Stop only when all medias have resource_status == 1. (3) When resource_status=1, still check status != "failed".
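The decision table above can be sketched in Python (a minimal sketch; the field names follow the table, the helper names are mine):

```python
def media_state(media: dict) -> str:
    """Classify one media item: 'processing' | 'done' | 'failed' | 'deleted'."""
    rs = media.get("resource_status") or 0  # null / missing → 0 (processing)
    if rs == 3:
        return "deleted"
    if rs == 2 or media.get("status") == "failed":
        return "failed"  # status takes precedence even when resource_status == 1
    return "done" if rs == 1 else "processing"

def task_done(medias: list) -> bool:
    """Stop polling with success only when every media is available and none failed."""
    states = [media_state(m) for m in medias]
    return bool(states) and all(s == "done" for s in states)
```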
GET /open/v1/product/list?app=ima&platform=web&category=text_to_image
Internally calls downstream /v1/products/listv2. Returns a V2 tree structure: type=2 nodes are model groups, type=3 nodes are versions (leaves). Only type=3 nodes contain credit_rules and form_config.
webAgent is auto-converted to ima by the gateway; you can use either value for app.
[
{
"id": "SeeDream",
"type": "2",
"name": "SeeDream",
"model_id": "",
"children": [
{
"id": "doubao-seedream-4-0-250828",
"type": "3",
"name": "SeeDream 4.0",
"model_id": "doubao-seedream-4.0",
"credit_rules": [
{ "attribute_id": 332, "points": 5, "attributes": { "default": "enabled" } }
],
"form_config": [
{ "field": "size", "type": "tags", "value": "1K",
"options": [{"label":"1K","value":"1K"}, {"label":"2K","value":"2K"}] }
]
}
]
}
]
How to pick a version for task creation:
1. Walk to the type=3 leaves (versions)
2. Take model_id and id (= model_version) from the leaf
3. Pick the credit_rules[] entry whose attribute_id matches your desired quality/size (the attributes field shows the config)
4. Use form_config[].value entries as default parameter values

credit_rules[].attribute_id → required for task creation as attribute_id. credit_rules[].points → required for task creation as credit and cast.points.
POST /open/v1/tasks/create
{
"task_type": "text_to_image",
"enable_multi_model": false,
"src_img_url": [],
"upload_img_src": "",
"parameters": [
{
"attribute_id": 8538,
"model_id": "doubao-seedream-4.5",
"model_name": "SeeDream 4.5",
"model_version": "doubao-seedream-4-5-251128",
"app": "ima",
"platform": "web",
"category": "text_to_image",
"credit": 5,
"parameters": {
"prompt": "a beautiful mountain sunset, photorealistic",
"size": "4k",
"n": 1,
"input_images": [],
"cast": {"points": 5, "attribute_id": 8538}
}
}
]
}
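A helper that assembles this body from product-list values and keeps the outer credit and the nested cast in sync (a sketch; the function name and the shape of the `model` dict are mine, the payload fields mirror the example above):

```python
def build_create_payload(task_type: str, model: dict, prompt: str, extra: dict = None) -> dict:
    """Assemble a tasks/create body from product-list values.

    `model` is assumed to carry attribute_id / model_id / model_version /
    model_name / points resolved from /open/v1/product/list at runtime.
    """
    inner = {
        "prompt": prompt,
        "n": 1,
        "input_images": [],
        # cast must mirror the outer credit / attribute_id exactly (error 6006 otherwise)
        "cast": {"points": model["points"], "attribute_id": model["attribute_id"]},
    }
    inner.update(extra or {})
    return {
        "task_type": task_type,
        "enable_multi_model": False,
        "src_img_url": [],
        "upload_img_src": "",
        "parameters": [{
            "attribute_id": model["attribute_id"],
            "model_id": model["model_id"],
            "model_name": model["model_name"],
            "model_version": model["model_version"],
            "app": "ima",
            "platform": "web",
            "category": task_type,  # must match the top-level task_type
            "credit": model["points"],
            "parameters": inner,
        }],
    }
```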
| Field | Required | Description |
|---|---|---|
| task_type | ✅ | Must match parameters[].category |
| parameters[].attribute_id | ✅ | From credit_rules[].attribute_id in product list |
| parameters[].model_id | ✅ | From type=3 leaf node model_id |
| parameters[].model_version | ✅ | From type=3 leaf node id |
| parameters[].app | ✅ | Use ima (or webAgent, auto-converted) |
| parameters[].platform | ✅ | Use web |
| parameters[].category | ✅ | Must match top-level task_type |
| parameters[].credit | ✅ | Must equal credit_rules[].points. Error 6006 if wrong. |
| parameters[].parameters.prompt | ✅ | The actual prompt text used by downstream service |
| parameters[].parameters.cast | ✅ | {"points": N, "attribute_id": N} — mirrors credit |
| parameters[].parameters.n | ✅ | Number of outputs (usually 1). Gateway flattens N>1 into separate resources. |
| parameters[].parameters.input_images | image tasks | Array of input image URLs |
| top-level src_img_url | multi-image | Array for first_last_frame / reference tasks |
When n > 1, the gateway automatically:
1. Generates n independent resourceBizId values
2. Flattens the request n times (one per resource)
3. Creates n separate tasks in the downstream service

Response medias[] will contain n items. Poll until all have resource_status == 1.
{
"code": 0,
"data": {
"id": "task_abc123",
"biz_id": "biz_xxx",
"task_type": "text_to_image",
"medias": [],
"generate_count": 1,
"created_at": 1700000000000,
"timeout_at": 1700000300000
}
}
data.id = task ID for polling. timeout_at = Unix ms deadline.
POST /open/v1/tasks/detail
{"task_id": "<id from create response>"}
Poll every 2–5s (8s+ for video). Completed response:
{
"id": "task_abc",
"medias": [{
"resource_status": 1,
"status": "success",
"url": "https://cdn.../output.jpg",
"cover": "https://cdn.../cover.jpg",
"format": "jpg",
"width": 1024,
"height": 1024
}]
}
Polling stop condition (must implement exactly):
1. Treat resource_status: null (or missing) as 0 (processing). Do not stop when you see null; the backend may serialize a Go *int as null.
2. Stop with success only when ALL medias[].resource_status == 1 and no status == "failed". If you return on the first media with resource_status == 1 while others are still 0, the task is not fully done and you will keep polling or get inconsistent state.
3. Stop with failure when status == "failed" or resource_status == 2 or resource_status == 3.

No image input: src_img_url: [], input_images: []. See API 2 for full example.
Extra fields vs `text_to_image` — all from `form_config` defaults:
```json
{
  "task_type": "text_to_video",
  "src_img_url": [],
  "parameters": [{
    "attribute_id": 4838,
    "model_id": "wan2.6-t2v",
    "model_name": "Wan 2.6",
    "model_version": "wan2.6-t2v",
    "category": "text_to_video",
    "credit": 3,
    "app": "ima", "platform": "web",
    "parameters": {
      "prompt": "a puppy dancing happily, sunny meadow",
      "negative_prompt": "",
      "prompt_extend": false,
      "duration": 5,
      "resolution": "1080P",
      "aspect_ratio": "16:9",
      "shot_type": "single",
      "seed": -1,
      "n": 1,
      "input_images": [],
      "cast": {"points": 3, "attribute_id": 4838}
    }
  }]
}
```
Video-specific fields from `form_config`: `duration` (seconds), `resolution`, `aspect_ratio`, `shot_type`, `negative_prompt`, `prompt_extend`. Poll every 8s (video generation is slower). Response `medias[].cover` = first-frame thumbnail.
```json
{
  "task_type": "image_to_image",
  "src_img_url": ["https://...input.jpg"],
  "parameters": [{
    "attribute_id": 8560,
    "model_id": "doubao-seedream-4.5",
    "model_version": "doubao-seedream-4-5-251128",
    "category": "image_to_image",
    "credit": 5,
    "app": "ima", "platform": "web",
    "parameters": {
      "prompt": "turn into oil painting style",
      "size": "4k",
      "n": 1,
      "input_images": ["https://...input.jpg"],
      "cast": {"points": 5, "attribute_id": 8560}
    }
  }]
}
```
⚠️ `size` must be from `form_config` options (e.g. `"2k"`, `"4k"`, `"2048x2048"`). `"adaptive"` is NOT valid for SeeDream 4.5 i2i — it causes error 400. Top-level `src_img_url` and `parameters.input_images` must both contain the input image URL. Some i2i models (e.g. `doubao-seededit-3.0-i2i`) may not be available in test environments — fall back to SeeDream 4.5.
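A fail-fast check for the `size` pitfall can look like this. Sketch only: the option set below is just the examples named in the warning above; the authoritative list must be read from `form_config`, and `check_i2i_size` is a hypothetical helper.

```python
# Illustrative options taken from the warning above; fetch the real set
# from form_config at runtime rather than hard-coding it.
SEEDREAM_I2I_SIZES = {"2k", "4k", "2048x2048"}

def check_i2i_size(size, allowed=SEEDREAM_I2I_SIZES):
    # "adaptive" is rejected by SeeDream 4.5 i2i with error 400, so fail locally
    if size not in allowed:
        raise ValueError(f"size {size!r} is not a form_config option")
    return size
```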
```json
{
  "src_img_url": ["https://first-frame.jpg", "https://last-frame.jpg"]
}
```
Index 0 = first frame (or reference), index 1 = last frame (`first_last_frame` only).
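The index convention can be encoded in a tiny helper so callers never swap the frames; `build_src_img_url` is a hypothetical name.

```python
def build_src_img_url(first_frame, last_frame=None):
    # Index 0 = first frame (or reference image); index 1 = last frame,
    # present only for first_last_frame tasks
    urls = [first_frame]
    if last_frame is not None:
        urls.append(last_frame)
    return urls
```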
| Mistake | Fix |
|---|---|
| `attribute_id` not from `credit_rules` | Always fetch the product list first |
| `credit` value wrong | Must exactly match `credit_rules[].points` — error 6006 |
| `prompt` at wrong location | Put prompt in `parameters[].parameters.prompt` (nested), not only at top level |
| Polling `biz_id` instead of `id` | Use `id` (task ID) for `/tasks/detail` |
| Single poll instead of a loop | Poll until `resource_status == 1` for ALL medias |
| Missing `app` / `platform` in parameters | Required fields — use `ima` / `web` |
| `category` mismatch | `parameters[].category` must match top-level `task_type` |
| `resource_status == 2` not handled | Check for failure, don't loop forever |
| `status == "failed"` ignored | `resource_status == 1` + `status == "failed"` means actual failure |
| `n > 1` and only checking first media | All `n` media items must reach `resource_status == 1` |
See the Python example sections throughout this documentation for implementation guidance covering all 7 task types.
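The polling rules and common mistakes above can be combined into one loop. This is a minimal sketch under stated assumptions: `fetch_detail` is a caller-supplied function that POSTs to `/open/v1/tasks/detail` and returns the parsed task object (transport and auth omitted), and the interval/timeout defaults are illustrative.

```python
import time

def poll_until_done(fetch_detail, task_id, interval=5.0, timeout=300.0):
    """Poll until every media succeeds, any fails, or the deadline passes."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        data = fetch_detail(task_id)
        medias = data.get("medias") or []
        # Failure check comes first: resource_status 2/3 or status == "failed"
        # (a media can report resource_status == 1 with status == "failed").
        if any(m.get("resource_status") in (2, 3) or m.get("status") == "failed"
               for m in medias):
            raise RuntimeError("task failed")
        # Success: at least one media, and ALL have resource_status == 1;
        # null/missing resource_status counts as 0 (still processing).
        if medias and all(m.get("resource_status") == 1 for m in medias):
            return medias
        time.sleep(interval)  # use ~8s for video tasks
    raise TimeoutError("task did not finish before timeout")
```

Passing the poll (`id`, not `biz_id`) and checking all medias makes this loop immune to four of the mistakes in the table above.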