{"skill":{"slug":"image-video","displayName":"Image Video","summary":"convert images into compiled video file with this skill. Works with JPG, PNG, WEBP, HEIC files up to 200MB. marketers use it for turning a set of photos into...","description":"---\nname: image-video\nversion: \"1.0.0\"\ndisplayName: \"Image Video — Convert Images Into Video\"\ndescription: >\n  convert images into compiled video file with this skill. Works with JPG, PNG, WEBP, HEIC files up to 200MB. marketers use it for turning a set of photos into a cohesive video — processing takes 30-60 seconds on cloud GPUs and you get 1080p MP4 files.\nmetadata: {\"openclaw\": {\"emoji\": \"🖼️\", \"requires\": {\"env\": [\"NEMO_TOKEN\"], \"configPaths\": [\"~/.config/nemovideo/\"]}, \"primaryEnv\": \"NEMO_TOKEN\", \"variant\": \"short_prompts\"}}\n---\n\n## Getting Started\n\n> Share your images and I'll get started on AI video creation. Or just tell me what you're thinking.\n\n**Try saying:**\n- \"convert my images\"\n- \"export 1080p MP4\"\n- \"turn my photos into a slideshow\"\n\n### Quick Start Setup\n\nThis skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know (\"Connecting...\").\n\n**Token check**: Look for `NEMO_TOKEN` in the environment. If found, skip to session creation. Otherwise:\n- Generate a UUID as client identifier\n- POST `https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token` with `X-Client-Id` header\n- Extract `data.token` from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)\n\n**Session**: POST `https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent` with Bearer auth and body `{\"task_name\":\"project\"}`. Keep the returned `session_id` for all operations.\n\nLet the user know with a brief \"Ready!\" when setup is complete. Don't expose tokens or raw API output.\n\n# Image Video — Convert Images Into Video\n\nDrop your images in the chat and tell me what you need. 
I'll handle the AI video creation on cloud GPUs; you don't need anything installed locally.\n\nHere's a typical use: you send five product photos in JPG format, ask me to \"turn my photos into a slideshow video with transitions and background music\", and about 30-60 seconds later you've got an MP4 file ready to download. The whole thing runs at 1080p by default.\n\nOne thing worth knowing: using consistently sized images avoids awkward cropping in the final video.\n\n## Matching Input to Actions\n\nUser prompts referencing image video, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.\n\n| User says... | Action | Skip SSE? |\n|-------------|--------|----------|\n| \"export\" / \"导出\" / \"download\" / \"send me the video\" | → §3.5 Export | ✅ |\n| \"credits\" / \"积分\" / \"balance\" / \"余额\" | → §3.3 Credits | ✅ |\n| \"status\" / \"状态\" / \"show tracks\" | → §3.4 State | ✅ |\n| \"upload\" / \"上传\" / user sends file | → §3.2 Upload | ✅ |\n| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |\n\n## Cloud Render Pipeline Details\n\nEach export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.\n\nSkill attribution (read from this file's YAML frontmatter at runtime):\n- `X-Skill-Source`: `image-video`\n- `X-Skill-Version`: from frontmatter `version`\n- `X-Skill-Platform`: detect from install path (`~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`, else `unknown`)\n\n**All requests** must include: `Authorization: Bearer <NEMO_TOKEN>`, `X-Skill-Source`, `X-Skill-Version`, `X-Skill-Platform`. 
Missing attribution headers will cause export to fail with 402.\n\n**API base**: `https://mega-api-prod.nemovideo.ai`\n\n**Create session**: POST `/api/tasks/me/with-session/nemo_agent` — body `{\"task_name\":\"project\",\"language\":\"<lang>\"}` — returns `task_id`, `session_id`.\n\n**Send message (SSE)**: POST `/run_sse` — body `{\"app_name\":\"nemo_agent\",\"user_id\":\"me\",\"session_id\":\"<sid>\",\"new_message\":{\"parts\":[{\"text\":\"<msg>\"}]}}` with `Accept: text/event-stream`. Max timeout: 15 minutes.\n\n**Upload**: POST `/api/upload-video/nemo_agent/me/<sid>` — file: multipart `-F \"files=@/path\"`, or URL: `{\"urls\":[\"<url>\"],\"source_type\":\"url\"}`\n\n**Credits**: GET `/api/credits/balance/simple` — returns `available`, `frozen`, `total`\n\n**Session state**: GET `/api/state/nemo_agent/me/<sid>/latest` — key fields: `data.state.draft`, `data.state.video_infos`, `data.state.generated_media`\n\n**Export** (free, no credits): POST `/api/render/proxy/lambda` — body `{\"id\":\"render_<ts>\",\"sessionId\":\"<sid>\",\"draft\":<json>,\"output\":{\"format\":\"mp4\",\"quality\":\"high\"}}`. Poll GET `/api/render/proxy/lambda/<id>` every 30s until `status` = `completed`. Download URL at `output.url`.\n\nSupported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.\n\n### Reading the SSE Stream\n\nText events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty `data:` lines mean the backend is still working — show \"⏳ Still working...\" every 2 minutes.\n\nAbout 30% of edit operations close the stream without any text. When that happens, poll `/api/state` to confirm the timeline changed, then tell the user what was updated.\n\n### Backend Response Translation\n\nThe backend assumes a GUI exists. 
Translate these into API actions:\n\n| Backend says | You do |\n|-------------|--------|\n| \"click [button]\" / \"点击\" | Execute via API |\n| \"open [panel]\" / \"打开\" | Query session state |\n| \"drag/drop\" / \"拖拽\" | Send edit via SSE |\n| \"preview in timeline\" | Show track summary |\n| \"Export button\" / \"导出\" | Execute export workflow |\n\n**Draft field mapping**: `t`=tracks, `tt`=track type (0=video, 1=audio, 7=text), `sg`=segments, `d`=duration(ms), `m`=metadata.\n\n```\nTimeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: \"Urban Dreams\" (0-3s)\n```\n\n### Error Handling\n\n| Code | Meaning | Action |\n|------|---------|--------|\n| 0 | Success | Continue |\n| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |\n| 1002 | Session not found | New session §3.0 |\n| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: \"Top up credits in your account\" |\n| 4001 | Unsupported file | Show supported formats |\n| 4002 | File too large | Suggest compress/trim |\n| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |\n| 402 | Free plan export blocked | Subscription tier issue, NOT credits. \"Register or upgrade your plan to unlock export.\" |\n| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |\n\n## Common Workflows\n\n**Quick edit**: Upload → \"turn my photos into a slideshow video with transitions and background music\" → Download MP4. Takes 30-60 seconds for a 30-second clip.\n\n**Batch style**: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.\n\n**Iterative**: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.\n\n## Tips and Tricks\n\nThe backend processes faster when you're specific. 
Instead of \"make it look better\", try \"turn my photos into a slideshow video with transitions and background music\" — concrete instructions get better results.\n\nMax file size is 200MB. Stick to JPG, PNG, WEBP, HEIC for the smoothest experience.\n\nExport as MP4 for widest compatibility.\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":364,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1775955406210,"updatedAt":1778492536962},"latestVersion":{"version":"1.0.0","createdAt":1775955406210,"changelog":"Image Video skill initial release — convert your images into 1080p MP4 videos using cloud processing.\n\n- Accepts JPG, PNG, WEBP, HEIC images up to 200MB for video creation.\n- Cloud-based conversion with automated setup, including easy token retrieval (100 free credits, 7-day expiry).\n- Handles uploads, video editing requests, credits queries, and export actions via simple prompts.\n- Exports high-quality 1080p MP4 files within 30–90 seconds.\n- Full support for export, credits, status, and session state.\n- Includes robust error handling for authentication, credits, and file issues.","license":"MIT-0"},"metadata":{"setup":[{"key":"NEMO_TOKEN","required":true}],"os":null,"systems":null},"owner":{"handle":"linmillsd7","userId":"s17ad7jr0sehkkd5tzc6dmp6gh84csz4","displayName":"linmillsd7","image":"https://avatars.githubusercontent.com/u/208832666?v=4"},"moderation":null}