{"skill":{"slug":"text-and-video","displayName":"Text And Video","summary":"Skip the learning curve of professional editing software. Describe what you want — turn this text into a video with visuals and voiceover — and get text-driv...","description":"---\nname: text-and-video\nversion: \"1.0.0\"\ndisplayName: \"Text and Video — Convert Text Into Finished Videos\"\ndescription: >\n  Skip the learning curve of professional editing software. Describe what you want — turn this text into a video with visuals and voiceover — and get text-driven videos back in 1-2 minutes. Upload MP4, MOV, TXT, DOCX files up to 500MB, and the AI handles text-to-video creation automatically. Ideal for marketers, content creators, educators who want to turn written content into video without filming anything.\nmetadata: {\"openclaw\": {\"emoji\": \"🎬\", \"requires\": {\"env\": [\"NEMO_TOKEN\"], \"configPaths\": [\"~/.config/nemovideo/\"]}, \"primaryEnv\": \"NEMO_TOKEN\", \"variant\": \"greeting_v2\"}}\n---\n\n## Getting Started\n\n> Ready when you are. Drop your text and video here or describe what you want to make.\n\n**Try saying:**\n- \"convert a 200-word product description script into a 1080p MP4\"\n- \"turn this text into a video with visuals and voiceover\"\n- \"converting written scripts or text into finished videos for marketers, content creators, educators\"\n\n### Quick Start Setup\n\nThis skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know (\"Connecting...\").\n\n**Token check**: Look for `NEMO_TOKEN` in the environment. If found, skip to session creation. 
Otherwise:\n- Generate a UUID as client identifier\n- POST `https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token` with `X-Client-Id` header\n- Extract `data.token` from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)\n\n**Session**: POST `https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent` with Bearer auth and body `{\"task_name\":\"project\"}`. Keep the returned `session_id` for all operations.\n\nLet the user know with a brief \"Ready!\" when setup is complete. Don't expose tokens or raw API output.\n\n# Text and Video — Convert Text Into Finished Videos\n\nSend me your text and video and describe the result you want. The text-to-video creation runs on remote GPU nodes — nothing to install on your machine.\n\nA quick example: upload a 200-word product description script, type \"turn this text into a video with visuals and voiceover\", and you'll get a 1080p MP4 back in roughly 1-2 minutes. All rendering happens server-side.\n\nWorth noting: shorter text blocks under 150 words produce tighter, more focused videos.\n\n## Matching Input to Actions\n\nUser prompts referencing text and video, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.\n\n| User says... | Action | Skip SSE? |\n|-------------|--------|----------|\n| \"export\" / \"导出\" / \"download\" / \"send me the video\" | → §3.5 Export | ✅ |\n| \"credits\" / \"积分\" / \"balance\" / \"余额\" | → §3.3 Credits | ✅ |\n| \"status\" / \"状态\" / \"show tracks\" | → §3.4 State | ✅ |\n| \"upload\" / \"上传\" / user sends file | → §3.2 Upload | ✅ |\n| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |\n\n## Cloud Render Pipeline Details\n\nEach export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. 
The session token carries render job IDs, so closing the tab before completion orphans the job.\n\nEvery API call needs `Authorization: Bearer <NEMO_TOKEN>` plus the three attribution headers described below (`X-Skill-Source`, `X-Skill-Version`, `X-Skill-Platform`). If any header is missing, exports return 402.\n\nHeaders are derived from this file's YAML frontmatter. `X-Skill-Source` is `text-and-video`, `X-Skill-Version` comes from the `version` field, and `X-Skill-Platform` is detected from the install path (`~/.clawhub/` = `clawhub`, `~/.cursor/skills/` = `cursor`, otherwise `unknown`).\n\n**API base**: `https://mega-api-prod.nemovideo.ai`\n\n**Create session**: POST `/api/tasks/me/with-session/nemo_agent` — body `{\"task_name\":\"project\",\"language\":\"<lang>\"}` — returns `task_id`, `session_id`.\n\n**Send message (SSE)**: POST `/run_sse` — body `{\"app_name\":\"nemo_agent\",\"user_id\":\"me\",\"session_id\":\"<sid>\",\"new_message\":{\"parts\":[{\"text\":\"<msg>\"}]}}` with `Accept: text/event-stream`. Max timeout: 15 minutes.\n\n**Upload**: POST `/api/upload-video/nemo_agent/me/<sid>` — file: multipart `-F \"files=@/path\"`, or URL: `{\"urls\":[\"<url>\"],\"source_type\":\"url\"}`\n\n**Credits**: GET `/api/credits/balance/simple` — returns `available`, `frozen`, `total`\n\n**Session state**: GET `/api/state/nemo_agent/me/<sid>/latest` — key fields: `data.state.draft`, `data.state.video_infos`, `data.state.generated_media`\n\n**Export** (free, no credits): POST `/api/render/proxy/lambda` — body `{\"id\":\"render_<ts>\",\"sessionId\":\"<sid>\",\"draft\":<json>,\"output\":{\"format\":\"mp4\",\"quality\":\"high\"}}`. Poll GET `/api/render/proxy/lambda/<id>` every 30s until `status` = `completed`. 
Download URL at `output.url`.\n\nSupported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.\n\n### Error Handling\n\n| Code | Meaning | Action |\n|------|---------|--------|\n| 0 | Success | Continue |\n| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |\n| 1002 | Session not found | New session §3.0 |\n| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: \"Top up credits in your account\" |\n| 4001 | Unsupported file | Show supported formats |\n| 4002 | File too large | Suggest compress/trim |\n| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |\n| 402 | Free plan export blocked | Subscription tier issue, NOT credits. \"Register or upgrade your plan to unlock export.\" |\n| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |\n\n### Translating GUI Instructions\n\nThe backend responds as if there's a visual interface. Map its instructions to API calls:\n\n- \"click\" or \"点击\" → execute the action via the relevant endpoint\n- \"open\" or \"打开\" → query session state to get the data\n- \"drag/drop\" or \"拖拽\" → send the edit command through SSE\n- \"preview in timeline\" → show a text summary of current tracks\n- \"Export\" or \"导出\" → run the export workflow\n\n### Reading the SSE Stream\n\nText events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty `data:` lines mean the backend is still working — show \"⏳ Still working...\" every 2 minutes.\n\nAbout 30% of edit operations close the stream without any text. When that happens, poll `/api/state` to confirm the timeline changed, then tell the user what was updated.\n\nDraft JSON uses short keys: `t` for tracks, `tt` for track type (0=video, 1=audio, 7=text), `sg` for segments, `d` for duration in ms, `m` for metadata.\n\nExample timeline summary:\n```\nTimeline (3 tracks): 1. 
Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: \"Urban Dreams\" (0-3s)\n```\n\n## Tips and Tricks\n\nThe backend processes faster when you're specific. Instead of \"make it look better\", try \"turn this text into a video with visuals and voiceover\" — concrete instructions get better results.\n\nMax file size is 500MB. Stick to MP4, MOV, TXT, DOCX for the smoothest experience.\n\nExport as MP4 for widest compatibility across platforms and devices.\n\n## Common Workflows\n\n**Quick edit**: Upload → \"turn this text into a video with visuals and voiceover\" → Download MP4. Takes 1-2 minutes for a 30-second clip.\n\n**Batch style**: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.\n\n**Iterative**: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":322,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1777550322860,"updatedAt":1778492809868},"latestVersion":{"version":"1.0.0","createdAt":1777550322860,"changelog":"- Initial release: Turn text into finished videos with visuals and voiceover in 1–2 minutes.\n- Supports uploads up to 500MB (MP4, MOV, TXT, DOCX) for automatic text-to-video creation.\n- Simple cloud-based setup with automatic token management and session handling.\n- Intuitive prompt-based actions: upload, export, check credits/status, and more.\n- Server-side video rendering with instant previews and downloadable exports in common formats.","license":"MIT-0"},"metadata":{"setup":[{"key":"NEMO_TOKEN","required":true}],"os":null,"systems":null},"owner":{"handle":"peand-rover","userId":"s1798dk1219d0v445rb778f65d83gn1w","displayName":"peandrover adam","image":"https://avatars.githubusercontent.com/u/257755799?v=4"},"moderation":null}