Text To Ai Video

v1.0.0

Skip the learning curve of professional editing software. Describe what you want — turn this text into a 30-second explainer video with visuals and voiceover...

by @mory128

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for mory128/text-to-ai-video.

Install the skill "Text To Ai Video" (mory128/text-to-ai-video) from ClawHub.
Skill page: https://clawhub.ai/mory128/text-to-ai-video
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: NEMO_TOKEN
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install text-to-ai-video

ClawHub CLI

npx clawhub@latest install text-to-ai-video

Security Scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

ℹ

Purpose & Capability

The skill's name/description (text→video via a cloud backend) matches the API endpoints and actions described. However the registry metadata marks NEMO_TOKEN as a required env var and declares a config path (~/.config/nemovideo/) even though the runtime instructions implement an anonymous-token acquisition flow when NEMO_TOKEN is absent and do not read the declared config path. This is an inconsistency (why declare a required env var/config path if the skill can obtain an anon token and doesn't read that path?).

✓

Instruction Scope

SKILL.md confines actions to the stated purpose: creating sessions, sending SSE messages, uploading files or URLs, polling render status, and asking for credits/state. It explicitly instructs network requests to the nemo API and to include specific headers. It does not instruct reading unrelated system files or other credentials. The only scope oddity is the requirement to auto-detect an install path for X-Skill-Platform and the metadata config path, which are mentioned but not clearly required for core functionality.

✓

Install Mechanism

This is an instruction-only skill with no install spec or code files — lowest-risk install mechanism. Nothing is written to disk by the skill itself per the provided instructions.

ℹ

Credentials

Only one credential is declared (NEMO_TOKEN), which is appropriate for a cloud video API. But the skill's runtime allows creating an anonymous token if NEMO_TOKEN is missing, which conflicts with the 'required env var' declaration. The metadata also lists a config path that the instructions don't use; this could be a benign metadata artifact but is disproportionate unless the skill actually reads that path.

✓

Persistence & Privilege

Skill is not always-enabled and is user-invocable; it does not request system-wide changes or alter other skills. It creates and holds a session token for the cloud service (expected behavior) but does not request elevated persistence privileges.

What to consider before installing

This skill will send whatever text and files you provide to an external service (https://mega-api-prod.nemovideo.ai) for processing and will include an API token in requests. The registry declares NEMO_TOKEN as required, but the skill will instead obtain an anonymous token automatically if you don't provide one, so decide whether you want to supply your own token or rely on the anonymous flow. Before installing, consider:

1) Do you trust the external service and its privacy/data retention policy for any text or files (including proprietary content) you upload?
2) If using your own NEMO_TOKEN, ensure that token's scope is limited and revocable.
3) Be aware of credit/usage limits and any billing or rate-limit behaviors described in the skill.
4) The metadata lists a config path (~/.config/nemovideo/); verify whether the skill will actually read local config files if that matters to you.

If you need higher assurance, request more details from the skill author (why the env/config mismatch, data retention policy, and whether uploads are stored long-term).

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🎬 Clawdis

Env: NEMO_TOKEN

Primary env: NEMO_TOKEN

latest: vk977k7q02acg4pmf3hvwr4y7e9853gbp

58 downloads

0 stars

1 version

Updated 1w ago

v1.0.0

MIT-0

Getting Started

Got text prompts to work with? Send them over and tell me what you need. I'll take care of the AI video creation.

Try saying:

"turn this 150-word product description into a 1080p MP4"
"turn this text into a 30-second explainer video with visuals and voiceover"
"generate a video from this written script or description, aimed at marketers, content creators, or educators"

Quick Start Setup

This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

Generate a UUID as client identifier
POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with X-Client-Id header
Extract data.token from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)
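
The token check and anonymous fallback above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the helper names are hypothetical, the request is only described as a plain dict (sending it is left to whatever HTTP client is available), and the response is assumed to be JSON with the token at data.token as stated above.

```python
import json
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def needs_anonymous_token(env):
    """True when NEMO_TOKEN is missing or empty in the given environment mapping."""
    return not env.get("NEMO_TOKEN")


def anonymous_token_request(client_id=None):
    """Describe the anonymous-token request; a fresh UUID serves as the client id."""
    client_id = client_id or str(uuid.uuid4())
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
    }


def extract_token(response_body):
    """Read data.token from the JSON response body."""
    return json.loads(response_body)["data"]["token"]
```

In practice `needs_anonymous_token(os.environ)` would gate the request, and the extracted token should be kept in memory rather than shown to the user.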

Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
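
A sketch of the session step, under stated assumptions: the helper names are hypothetical, the Content-Type header is an assumption (the source only specifies Bearer auth and the body), and because the exact response envelope is not documented here, the reader checks both the top level and a "data" wrapper for session_id.

```python
import json

API_BASE = "https://mega-api-prod.nemovideo.ai"


def session_request(token, task_name="project"):
    """Describe the session-creation request with Bearer auth and a JSON body."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, not stated in the source
        },
        "body": json.dumps({"task_name": task_name}),
    }


def extract_session_id(response_body):
    """Return session_id from the top level or from an assumed "data" wrapper."""
    payload = json.loads(response_body)
    container = payload.get("data", payload)
    return container["session_id"]
```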

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

Text to AI Video — Generate Videos from Text

Drop your text prompts in the chat and tell me what you need. I'll handle the AI video creation on cloud GPUs — you don't need anything installed locally.

Here's a typical use: you send a 150-word product description, ask to "turn this text into a 30-second explainer video with visuals and voiceover", and about 1-2 minutes later you've got an MP4 file ready to download. The whole thing runs at 1080p by default.

One thing worth knowing — shorter, clearer text prompts produce more accurate and focused video results.

Matching Input to Actions

User prompts referencing text to ai video, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
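
The routing table above could be approximated with a simple keyword matcher. This is only a sketch: the source mentions keyword and intent classification, whereas the code below does plain substring matching, and the action names and the `route` function are hypothetical.

```python
# Keyword groups in the same order as the table; the last row is the fallback.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_file=False):
    """Return (action, skip_sse) for a user message."""
    if has_file:
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM...) goes through the SSE endpoint.
    return "sse", False
```

Substring matching will misfire on messages such as "don't export yet", so a real router would need the intent step the table alludes to.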

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.

The three attribution headers must match this file's frontmatter:

Header	Value
`X-Skill-Source`	`text-to-ai-video`
`X-Skill-Version`	frontmatter `version`
`X-Skill-Platform`	auto-detect: `clawhub` / `cursor` / `unknown` from install path
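
One way to assemble these headers is sketched below. The source does not say how the install path maps to a platform, so the substring check in `detect_platform` is an assumption, and both function names are hypothetical.

```python
def detect_platform(install_path):
    """Guess the platform from the install path (assumed heuristic: substring match)."""
    path = install_path.lower()
    for name in ("clawhub", "cursor"):
        if name in path:
            return name
    return "unknown"


def build_headers(token, version, install_path):
    """Build the Authorization header plus the three attribution headers."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "text-to-ai-video",
        "X-Skill-Version": version,  # taken from the frontmatter `version`
        "X-Skill-Platform": detect_platform(install_path),
    }
```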

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
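
The message body and a basic reader for the event stream might look like this. The body mirrors the fields listed above; the reader is a simplification that yields each `data:` line on its own instead of joining multi-line events as the full SSE format allows, and both function names are hypothetical.

```python
def run_sse_body(session_id, message):
    """Build the JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines):
    """Yield the payload of every `data:` line; comments and blank lines are skipped."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            yield line[len("data:"):].strip()
```

An empty string from `iter_sse_data` corresponds to an empty `data:` line, which the event table treats like a heartbeat.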

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
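
The export-and-poll loop can be sketched as below. Several details are assumptions: `<ts>` is taken to be a Unix timestamp in seconds, status and output.url are read from the top level of the poll response, the `max_polls` cap is an illustrative safeguard, and `fetch_status` is a hypothetical callable standing in for the GET request.

```python
import time


def export_body(session_id, draft, now=None):
    """Build the render request body; the id format render_<ts> assumes seconds."""
    ts = int(now if now is not None else time.time())
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status, render_id, interval=30.0, max_polls=40, sleep=time.sleep):
    """Call fetch_status(render_id) every `interval` seconds until the job completes."""
    for attempt in range(max_polls):
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting.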

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
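
Checking a file against the supported formats before uploading avoids a round trip that would end in error 4001. The sketch below assumes the extension is a reliable indicator of the format; the function names are hypothetical, and the URL body simply mirrors the shape given for the upload endpoint.

```python
import os

SUPPORTED_FORMATS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def is_supported(filename):
    """Check the file extension (case-insensitive) against the supported list."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext in SUPPORTED_FORMATS


def url_upload_body(urls):
    """Build the JSON body for a URL-based upload."""
    return {"urls": list(urls), "source_type": "url"}
```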

Error Codes

0 — success, continue normally
1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
1002 — session not found; create a new one
2001 — out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
4001 — unsupported file type; show accepted formats
4002 — file too large; suggest compressing or trimming
400 — missing X-Client-Id; generate one and retry
402 — free plan export blocked; not a credit issue, subscription tier
429 — rate limited; wait 30s and retry once
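
The codes above lend themselves to a lookup table. This sketch keeps application codes and HTTP statuses in one mapping, as the list does; the action labels and the fallback for unknown codes are hypothetical.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_30s_and_retry_once",
}


def action_for(code):
    """Map a code to a follow-up action; unknown codes get a generic fallback."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def retry_delay_seconds(code):
    """Only 429 has a documented wait; every other code returns None."""
    return 30 if code == 429 else None
```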

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"	Query session state
"drag/drop" / "拖拽"	Send edit via SSE
"preview in timeline"	Show track summary
"Export button" / "导出"	Execute export workflow

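The translation table above could be applied with a small phrase matcher. This is a sketch with hypothetical action labels; it checks the more specific phrases first, which is a design choice made here so that "click the Export button" resolves to the export workflow, not to a generic API call.

```python
# More specific phrases come first so they win over generic verbs like "click".
GUI_RULES = [
    (("export button", "导出"), "run_export_workflow"),
    (("preview in timeline",), "show_track_summary"),
    (("drag", "drop", "拖拽"), "send_edit_via_sse"),
    (("open", "打开"), "query_session_state"),
    (("click", "点击"), "execute_via_api"),
]


def translate_gui_instruction(text):
    """Return the API-side action for a GUI-style backend message, or None."""
    lowered = text.lower()
    for phrases, action in GUI_RULES:
        if any(phrase in lowered for phrase in phrases):
            return action
    return None
```
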
SSE Event Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result	Process internally, don't forward
`heartbeat` / empty `data:`	Keep waiting. Every 2 min: "⏳ Still working..."
Stream closes	Process final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
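
The waiting and fallback behaviour described above can be sketched as a small reducer over the stream. It assumes the caller has already reduced each event to a (timestamp in seconds, text) pair, with an empty string standing for a heartbeat or empty `data:` line; the function name and return shape are hypothetical.

```python
def summarize_stream(events, notice_interval=120.0):
    """Reduce (timestamp, text) pairs to (final_text, notice_times, next_step).

    notice_times lists when a "Still working..." message would be due;
    next_step is "poll_state" when the stream closed without any text.
    """
    texts = []
    notices = []
    last_notice = None
    for ts, text in events:
        if last_notice is None:
            last_notice = ts  # start the clock at the first event
        if text:
            texts.append(text)
        elif ts - last_notice >= notice_interval:
            notices.append(ts)
            last_notice = ts
    final = "".join(texts)
    return final, notices, ("present" if final else "poll_state")
```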

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)
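
A summary of this kind can be derived from the short keys. The sketch below assumes a nesting that the source does not spell out: a top-level `t` list of tracks, each with `tt` and an `sg` list whose items carry `d` in milliseconds. It reports only type and total duration, since the location of clip names and volume levels inside `m` is not documented here.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_timeline(draft):
    """Produce a one-line summary with each track's type and total duration."""
    tracks = draft.get("t", [])
    parts = []
    for index, track in enumerate(tracks, start=1):
        label = TRACK_TYPES.get(track.get("tt"), "Other")
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        parts.append(f"{index}. {label}: {total_ms / 1000:g}s")
    return f"Timeline ({len(tracks)} tracks): " + " ".join(parts)
```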

Common Workflows

Quick edit: Upload → "turn this text into a 30-second explainer video with visuals and voiceover" → Download MP4. Takes 1-2 minutes for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "turn this text into a 30-second explainer video with visuals and voiceover" — concrete instructions get better results.

Max file size is 500MB. Stick to TXT, DOCX, PDF, or copied text for the smoothest experience.

Export as MP4 for widest compatibility across platforms and devices.
