Text To Video Free Api

v1.0.0

Get ready-to-share videos without touching a single slider. Upload your text prompts (TXT, DOCX, JSON, CSV, up to 500MB), say something like "...

by @whitejohnk-26

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for whitejohnk-26/text-to-video-free-api.

Install the skill "Text To Video Free Api" (whitejohnk-26/text-to-video-free-api) from ClawHub.
Skill page: https://clawhub.ai/whitejohnk-26/text-to-video-free-api
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: NEMO_TOKEN
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install text-to-video-free-api

ClawHub CLI

npx clawhub@latest install text-to-video-free-api

Security Scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

ℹ

Purpose & Capability

The name/description match the runtime instructions: the SKILL.md documents an API for generating videos and the single required env var (NEMO_TOKEN) is consistent with that. However, the SKILL.md frontmatter declares a config path (~/.config/nemovideo/) that is not listed in the registry metadata, an inconsistency worth questioning.

Instruction Scope

Instructions direct the agent to obtain or use a bearer token, create sessions, upload user files (multipart @/path), stream SSE messages, poll render status, and download results from https://mega-api-prod.nemovideo.ai. These actions are within the stated purpose, but the doc explicitly references reading local file paths for uploads and a local config directory — ensure the agent will only access files explicitly provided by the user and not arbitrary local files or other config directories.

✓

Install Mechanism

No install spec or code files — instruction-only. This minimizes installation risk (nothing new is written to disk by an installer), but also means the skill's behavior is governed entirely by the prose in SKILL.md.

ℹ

Credentials

Only NEMO_TOKEN is declared as required, which is proportionate for a third-party API. Still, the frontmatter also references reading ~/.config/nemovideo/, which could contain other credentials; that discrepancy between registry metadata and SKILL.md should be clarified before granting access to local config paths.

✓

Persistence & Privilege

The skill sets always:false and uses normal autonomous invocation settings. It does not request persistent system-wide privileges or modifications to other skills. Autonomous invocation is allowed by default and is not, by itself, a flagged issue.

Scan Findings in Context

[no-static-findings] expected: The static scanner found no code because this is an instruction-only skill (SKILL.md only). Absence of findings is not evidence of safety; review SKILL.md behavior instead.

What to consider before installing

This skill appears to implement a text→video pipeline and only needs a NEMO_TOKEN, which is consistent — but proceed cautiously because the skill has no known homepage or publisher and the package metadata and SKILL.md disagree about a config path. Before installing or using it: (1) don't set a long-lived global NEMO_TOKEN unless you trust the service — prefer ephemeral/limited tokens; (2) avoid uploading sensitive local files or secrets (it accepts file paths and URLs up to 500MB); (3) ask the publisher for a homepage, privacy policy, and explanation of the ~/.config/nemovideo/ usage; (4) prefer using the anonymous-token flow (ephemeral, limited credits) when testing; (5) if you must provide a token, limit its scope and monitor activity/credits. If you want a higher-confidence assessment, provide the skill's publisher info or the network endpoints' owner (WHOIS / TLS cert) so origin/legitimacy can be verified.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🎬 Clawdis

Env: NEMO_TOKEN

Primary env: NEMO_TOKEN

latestvk97fvgzmgzn6n39yn7nbzf9nsd85bxnb

75 downloads

0 stars

1 version

Updated 6d ago

v1.0.0

MIT-0

Getting Started

Got text prompts to work with? Send them over and tell me what you need, and I'll take care of the AI video creation.

Try saying:

"turn a 50-word product description into a 1080p MP4"
"convert this text description into a 30-second video with visuals and music"
"generate videos from text descriptions via API for developers and marketers"

Automatic Setup

On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".

Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.

Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).

Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.

Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
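The setup steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the skill's actual implementation: the `Content-Type` header, the helper names, and the injected `send` callable are assumptions, and the attribution headers required elsewhere in this document are left out for brevity.

```python
import json
import os
import uuid
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build the POST that asks for a free anonymous token."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def session_request(token: str) -> urllib.request.Request:
    """Build the POST that creates a session with Bearer auth."""
    body = json.dumps({"task_name": "project"}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, not stated in the source
        },
        method="POST",
    )


def resolve_token(env=os.environ, send=None) -> str:
    """Prefer NEMO_TOKEN from the environment, else fetch a free token.

    `send` is a hypothetical callable that performs a request and returns
    the parsed JSON body; injecting it keeps the flow testable offline.
    """
    token = env.get("NEMO_TOKEN")
    if token:
        return token
    if send is None:
        raise RuntimeError("NEMO_TOKEN is not set and no sender was provided")
    response = send(anonymous_token_request(str(uuid.uuid4())))
    return response["data"]["token"]
```

Keeping the token in a local variable rather than logging it matches the instruction not to print tokens or raw JSON.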

Text to Video Free API — Generate Videos from Text

This tool takes your text prompts and runs AI video creation through a cloud rendering pipeline. You upload, describe what you want, and download the result.

Say you have a 50-word product description and want to convert it into a 30-second video with visuals and music: the backend processes it in about 1-2 minutes and hands you a 1080p MP4.

Tip: shorter text prompts under 100 words produce more focused and accurate video results.

Matching Input to Actions

User prompts referencing text to video free api, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ Export	✅
"credits" / "积分" / "balance" / "余额"	→ Credits	✅
"status" / "状态" / "show tracks"	→ Session state	✅
"upload" / "上传" / user sends file	→ Upload	✅
Everything else (generate, edit, add BGM…)	→ Send message (SSE)	❌
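The routing table above could be approximated with a simple keyword matcher. The sketch below covers only the keyword half of "keyword and intent classification"; the action names and the first-match ordering are assumptions made for illustration.

```python
# Keyword groups copied from the table; action names are hypothetical labels.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A file attachment always goes to the upload action.
        return "upload", True
    lowered = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse", False
```

For example, `route("Please export the video")` yields `("export", True)`, while a free-form edit request falls through to `("sse", False)`.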

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Every API call needs Authorization: Bearer <NEMO_TOKEN> plus the three attribution headers below, whose values must match this file's frontmatter. If any header is missing, exports return 402.

Header	Value
`X-Skill-Source`	`text-to-video-free-api`
`X-Skill-Version`	frontmatter `version`
`X-Skill-Platform`	auto-detect: `clawhub` / `cursor` / `unknown` from install path
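One way to assemble these headers is sketched below. How the platform is actually auto-detected from the install path is not specified here, so the substring check in `detect_platform` is purely an assumption, as are the function names.

```python
def detect_platform(install_path: str) -> str:
    """Guess the platform from the install path (assumed substring match)."""
    lowered = install_path.lower()
    for name in ("clawhub", "cursor"):
        if name in lowered:
            return name
    return "unknown"


def api_headers(token: str, version: str, install_path: str) -> dict[str, str]:
    """Return the auth header plus the three attribution headers."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "text-to-video-free-api",
        "X-Skill-Version": version,  # should mirror the frontmatter `version`
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send a request that lacks one of them.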

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
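As a rough illustration of the SSE call, the sketch below builds the request body described above and extracts payloads from `data:` lines. The parser is deliberately simplified (it treats each `data:` line as one payload and ignores multi-line events), and both function names are hypothetical.

```python
def sse_message_body(session_id: str, text: str) -> dict:
    """Build the JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def iter_sse_data(lines):
    """Yield the payload of each non-empty `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skips blank separators, comments, and `event:` lines
        payload = line[len("data:"):].strip()
        if payload:  # an empty `data:` line is a keep-alive, so skip it
            yield payload
```

The caller would send the body with `Accept: text/event-stream` and feed the decoded response lines to `iter_sse_data`, stopping after at most 15 minutes.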

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
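The URL variant of the upload call might look like the sketch below. It assumes the caller passes in a `headers` dict that already holds the Authorization and attribution headers; the `Content-Type` value and the function name are assumptions. The multipart file variant is not shown.

```python
import json
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def upload_by_url_request(session_id: str, urls, headers: dict) -> urllib.request.Request:
    """Build the POST that registers remote files by URL for a session."""
    body = json.dumps({"urls": list(urls), "source_type": "url"}).encode("utf-8")
    merged = dict(headers)
    merged["Content-Type"] = "application/json"  # assumption, not stated in the source
    return urllib.request.Request(
        f"{API_BASE}/api/upload-video/nemo_agent/me/{session_id}",
        data=body,
        headers=merged,
        method="POST",
    )
```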

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
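The export-and-poll loop could be sketched as follows. Several details are assumptions: the timestamp in `render_<ts>` is taken to be Unix seconds, `status` and `output.url` are read from the top level of the parsed response, the cap of 30 polls is arbitrary, and `fetch_status` is a hypothetical callable that performs the GET and returns parsed JSON.

```python
import time
from typing import Optional


def export_body(session_id: str, draft: dict, now: Optional[float] = None) -> dict:
    """Build the JSON body for POST /api/render/proxy/lambda."""
    ts = int(now if now is not None else time.time())  # seconds: an assumption
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(fetch_status, interval: float = 30.0, max_polls: int = 30,
                    sleep=time.sleep) -> str:
    """Poll every `interval` seconds until status is 'completed'; return the URL."""
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError("render did not complete within the polling budget")
```

Injecting `sleep` keeps the loop testable without waiting; failure statuses are not handled because the source does not name any.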

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Error Codes

0 — success, continue normally
1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
1002 — session not found; create a new one
2001 — out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
4001 — unsupported file type; show accepted formats
4002 — file too large; suggest compressing or trimming
400 — missing X-Client-Id; generate one and retry
402 — export blocked on the free plan; this is a subscription-tier restriction, not a credit shortage
429 — rate limited; wait 30s and retry once
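The error codes above map naturally onto a lookup table. In the sketch below the action labels are hypothetical names invented for illustration, and `call` stands in for whatever function performs a request and returns its code.

```python
import time

# Codes are from the list above; action labels are hypothetical.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_and_retry_once",
}


def next_action(code: int) -> str:
    """Map a response code to a follow-up action label."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def call_with_rate_limit_retry(call, sleep=time.sleep) -> int:
    """Run `call`; on 429, wait 30 seconds and retry exactly once."""
    code = call()
    if code == 429:
        sleep(30)
        code = call()
    return code
```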

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

"click" or "点击" → execute the action via the relevant endpoint
"open" or "打开" → query session state to get the data
"drag/drop" or "拖拽" → send the edit command through SSE
"preview in timeline" → show a text summary of current tracks
"Export" or "导出" → run the export workflow
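A minimal sketch of this translation step is shown below. It uses plain substring matching with first-match precedence, which is an assumption (the source does not say how conflicts between verbs are resolved), and the action labels are hypothetical.

```python
# Verb groups follow the list above; action labels are hypothetical.
GUI_VERBS = [
    (("click", "点击"), "call_endpoint"),
    (("open", "打开"), "query_state"),
    (("drag", "drop", "拖拽"), "send_sse_edit"),
    (("preview in timeline",), "summarize_tracks"),
    (("export", "导出"), "run_export"),
]


def translate_gui_instruction(text: str):
    """Return the API-side action for a GUI-style instruction, or None."""
    lowered = text.lower()
    for verbs, action in GUI_VERBS:
        if any(verb in lowered for verb in verbs):
            return action
    return None
```

Because matching is by substring, a phrase such as "click Export" resolves to the first group it hits; a real classifier would likely need something more careful.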

SSE Event Handling

Event	Action
Text response	Apply GUI translation (see Translating GUI Instructions), present to user
Tool call/result	Process internally, don't forward
`heartbeat` / empty `data:`	Keep waiting. Every 2 min: "⏳ Still working..."
Stream closes	Process final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
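The fallback for silent streams might be sketched like this. `fetch_state` is a hypothetical callable that returns the parsed session-state response, and the assumption that `data.state.draft` is an object with a `t` list of tracks comes from the field mapping in this document rather than from a confirmed schema.

```python
def summarize_turn(text_chunks, fetch_state) -> str:
    """Return the streamed reply, or a state-based summary if none arrived."""
    reply = "".join(text_chunks).strip()
    if reply:
        return reply
    # No text in the stream: read the latest session state instead.
    state = fetch_state()
    draft = state.get("data", {}).get("state", {}).get("draft") or {}
    track_count = len(draft.get("t", []))
    return f"Edit applied. The timeline now has {track_count} track(s)."
```

This only reads the state; comparing it against the state before the edit would be needed to truly confirm that the change landed.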

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
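Using the draft field mapping, a text summary of the timeline could be produced along these lines. The nesting (tracks under `t`, each with `tt` and a list of segments under `sg`, each segment with `d` in milliseconds) is an assumed shape, and summing segment durations assumes the segments in a track do not overlap.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def timeline_summary(draft: dict) -> str:
    """Render a short text summary of the tracks in a draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```

Names such as clip titles would presumably come from the `m` metadata field, which is left out here because its contents are not documented.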

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "convert this text description into a 30-second video with visuals and music" — concrete instructions get better results.

Max file size is 500MB. Stick to TXT, DOCX, JSON, CSV for the smoothest experience.

Export as MP4 for widest compatibility across platforms and devices.

Common Workflows

Quick edit: Upload → "convert this text description into a 30-second video with visuals and music" → Download MP4. Takes 1-2 minutes for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.
