{"skill":{"slug":"text-to-video-hunyuan","displayName":"Text To Video Hunyuan","summary":"Turn a short descriptive prompt like 'a fox running through a snowy forest at dusk' into 1080p AI-generated video clips just by typing what you need. Whether...","description":"---\nname: text-to-video-hunyuan\nversion: \"1.0.0\"\ndisplayName: \"Text to Video Hunyuan — Generate Videos from Text Prompts\"\ndescription: >\n  Turn a short descriptive prompt like 'a fox running through a snowy forest at dusk' into 1080p AI-generated video clips just by typing what you need. Whether it's generating short videos from text descriptions using HunyuanVideo AI model or quick social content, drop your text prompts and describe the result you want. No timeline dragging, no export settings — 1-3 minutes from upload to download.\nmetadata: {\"openclaw\": {\"emoji\": \"🎬\", \"requires\": {\"env\": [\"NEMO_TOKEN\"], \"configPaths\": [\"~/.config/nemovideo/\"]}, \"primaryEnv\": \"NEMO_TOKEN\", \"variant\": \"greeting_v2\"}}\n---\n\n## Getting Started\n\n> Ready when you are. Drop your text prompts here or describe what you want to make.\n\n**Try saying:**\n- \"generate a short descriptive prompt like 'a fox running through a snowy forest at dusk' into a 1080p MP4\"\n- \"generate a 5-second cinematic clip of a city skyline at sunrise with dramatic lighting\"\n- \"generating short videos from text descriptions using HunyuanVideo AI model for content creators, marketers, filmmakers\"\n\n### First-Time Connection\n\nWhen a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. \"Setting up...\").\n\n**Authentication**: Check if `NEMO_TOKEN` is set in the environment. If it is, skip to step 2.\n\n1. **Obtain a free token**: Generate a random UUID as client identifier. POST to `https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token` with header `X-Client-Id` set to that UUID. The response `data.token` is your NEMO_TOKEN — 100 free credits, valid 7 days.\n2. **Create a session**: POST to `https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent` with `Authorization: Bearer <token>`, `Content-Type: application/json`, and body `{\"task_name\":\"project\",\"language\":\"<detected>\"}`. Store the returned `session_id` for all subsequent requests.\n\nKeep setup communication brief. Don't display raw API responses or token values to the user.\n\n# Text to Video Hunyuan — Generate Videos from Text Prompts\n\nSend me your text prompts and describe the result you want. The AI video generation runs on remote GPU nodes — nothing to install on your machine.\n\nA quick example: upload a short descriptive prompt like 'a fox running through a snowy forest at dusk', type \"generate a 5-second cinematic clip of a city skyline at sunrise with dramatic lighting\", and you'll get a 1080p MP4 back in roughly 1-3 minutes. All rendering happens server-side.\n\nWorth noting: shorter and more specific prompts tend to produce more accurate and coherent video results.\n\n## Matching Input to Actions\n\nUser prompts referencing text to video hunyuan, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.\n\n| User says... | Action | Skip SSE? |\n|-------------|--------|----------|\n| \"export\" / \"导出\" / \"download\" / \"send me the video\" | → §3.5 Export | ✅ |\n| \"credits\" / \"积分\" / \"balance\" / \"余额\" | → §3.3 Credits | ✅ |\n| \"status\" / \"状态\" / \"show tracks\" | → §3.4 State | ✅ |\n| \"upload\" / \"上传\" / user sends file | → §3.2 Upload | ✅ |\n| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |\n\n## Cloud Render Pipeline Details\n\nEach export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.\n\nBase URL: `https://mega-api-prod.nemovideo.ai`\n\n| Endpoint | Method | Purpose |\n|----------|--------|---------|\n| `/api/tasks/me/with-session/nemo_agent` | POST | Start a new editing session. Body: `{\"task_name\":\"project\",\"language\":\"<lang>\"}`. Returns `session_id`. |\n| `/run_sse` | POST | Send a user message. Body includes `app_name`, `session_id`, `new_message`. Stream response with `Accept: text/event-stream`. Timeout: 15 min. |\n| `/api/upload-video/nemo_agent/me/<sid>` | POST | Upload a file (multipart) or URL. |\n| `/api/credits/balance/simple` | GET | Check remaining credits (`available`, `frozen`, `total`). |\n| `/api/state/nemo_agent/me/<sid>/latest` | GET | Fetch current timeline state (`draft`, `video_infos`, `generated_media`). |\n| `/api/render/proxy/lambda` | POST | Start export. Body: `{\"id\":\"render_<ts>\",\"sessionId\":\"<sid>\",\"draft\":<json>,\"output\":{\"format\":\"mp4\",\"quality\":\"high\"}}`. Poll status every 30s. |\n\nAccepted file types: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.\n\nThree attribution headers are required on every request and must match this file's frontmatter:\n\n| Header | Value |\n|--------|-------|\n| `X-Skill-Source` | `text-to-video-hunyuan` |\n| `X-Skill-Version` | frontmatter `version` |\n| `X-Skill-Platform` | auto-detect: `clawhub` / `cursor` / `unknown` from install path |\n\n**All requests** must include: `Authorization: Bearer <NEMO_TOKEN>`, `X-Skill-Source`, `X-Skill-Version`, `X-Skill-Platform`. Missing attribution headers will cause export to fail with 402.\n\n### Error Handling\n\n| Code | Meaning | Action |\n|------|---------|--------|\n| 0 | Success | Continue |\n| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |\n| 1002 | Session not found | New session §3.0 |\n| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: \"Top up credits in your account\" |\n| 4001 | Unsupported file | Show supported formats |\n| 4002 | File too large | Suggest compress/trim |\n| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |\n| 402 | Free plan export blocked | Subscription tier issue, NOT credits. \"Register or upgrade your plan to unlock export.\" |\n| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |\n\n### Reading the SSE Stream\n\nText events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty `data:` lines mean the backend is still working — show \"⏳ Still working...\" every 2 minutes.\n\nAbout 30% of edit operations close the stream without any text. When that happens, poll `/api/state` to confirm the timeline changed, then tell the user what was updated.\n\n### Backend Response Translation\n\nThe backend assumes a GUI exists. Translate these into API actions:\n\n| Backend says | You do |\n|-------------|--------|\n| \"click [button]\" / \"点击\" | Execute via API |\n| \"open [panel]\" / \"打开\" | Query session state |\n| \"drag/drop\" / \"拖拽\" | Send edit via SSE |\n| \"preview in timeline\" | Show track summary |\n| \"Export button\" / \"导出\" | Execute export workflow |\n\nDraft JSON uses short keys: `t` for tracks, `tt` for track type (0=video, 1=audio, 7=text), `sg` for segments, `d` for duration in ms, `m` for metadata.\n\nExample timeline summary:\n```\nTimeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: \"Urban Dreams\" (0-3s)\n```\n\n## Tips and Tricks\n\nThe backend processes faster when you're specific. Instead of \"make it look better\", try \"generate a 5-second cinematic clip of a city skyline at sunrise with dramatic lighting\" — concrete instructions get better results.\n\nMax file size is 500MB. Stick to TXT, JSON, CSV, DOCX for the smoothest experience.\n\nExport as MP4 for widest compatibility across social platforms and video editors.\n\n## Common Workflows\n\n**Quick edit**: Upload → \"generate a 5-second cinematic clip of a city skyline at sunrise with dramatic lighting\" → Download MP4. Takes 1-3 minutes for a 30-second clip.\n\n**Batch style**: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.\n\n**Iterative**: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":311,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1777591855468,"updatedAt":1778492818233},"latestVersion":{"version":"1.0.0","createdAt":1777591855468,"changelog":"- Initial release of text-to-video-hunyuan skill.\n- Generate 1080p AI video clips directly from text prompts using the HunyuanVideo model.\n- Automatic authentication setup; grants 100 free credits per user for 7 days.\n- Supports prompt-based video creation, uploads, downloads, credits checking, and timeline preview.\n- Seamless cloud video rendering—no installation required, typical completion in 1–3 minutes.\n- Accepts common video, image, and audio formats as inputs; clear error handling and guidance.","license":"MIT-0"},"metadata":{"setup":[{"key":"NEMO_TOKEN","required":true}],"os":null,"systems":null},"owner":{"handle":"mory128","userId":"s178x6tah7eg67gtzqsb5g5gwd84dd48","displayName":"mory128","image":"https://avatars.githubusercontent.com/u/270709915?v=4"},"moderation":null}