Youtube Video Subtitle Generator

v1.0.0

Turn YouTube video files into captioned video files with this skill. Works with MP4, MOV, AVI, and WebM files up to 500MB. YouTubers use it for adding subtit...

by peandrover adam @peand-rover

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for peand-rover/youtube-video-subtitle-generator.

Install the skill "Youtube Video Subtitle Generator" (peand-rover/youtube-video-subtitle-generator) from ClawHub.
Skill page: https://clawhub.ai/peand-rover/youtube-video-subtitle-generator
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: NEMO_TOKEN
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install youtube-video-subtitle-generator

ClawHub CLI


npx clawhub@latest install youtube-video-subtitle-generator
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description match the runtime instructions: the SKILL.md describes uploading video files, creating a session, running renders, polling for results, and returning download URLs. The single required env var (NEMO_TOKEN) is the service token needed to call the stated API endpoints — this is proportionate to the described capability.
Instruction Scope
Instructions are limited to connecting to the stated backend, uploading user-supplied files, running render jobs, polling state, and returning results. They do not instruct reading arbitrary system files or unrelated credentials. One ambiguity: frontmatter/metadata references a config path (~/.config/nemovideo/) and asks to auto-detect 'install path' for X-Skill-Platform header — these imply filesystem queries that are not otherwise declared, so confirm whether the skill will access that path.
Install Mechanism
This is an instruction-only skill with no install spec or code files, so nothing is downloaded or written to disk by an installer. That lowers install-time risk.
Credentials
Only one credential is required (NEMO_TOKEN), which the SKILL.md uses directly. The instructions also document obtaining an anonymous token from the service if NEMO_TOKEN is absent — that behavior is consistent with needing service access. No unrelated secrets or multi-service credentials are requested.
Persistence & Privilege
always is false and the skill does not request elevated platform privileges. The SKILL.md describes creating a session on the backend but does not instruct modifying other skills or system-wide settings. The only potential persistence hint is an optional config path in the frontmatter (see instruction_scope note).
Assessment
This skill appears to do what it says: it needs a NEMO_TOKEN to call the nemovideo.ai APIs to upload, render, and download captioned videos. Before installing or providing a token:
1) Confirm you trust the service domain (https://mega-api-prod.nemovideo.ai) and that NEMO_TOKEN is scoped only to this video-rendering service; anyone holding that token can act as you against the API.
2) Ask the publisher whether the skill will read or write the local config path (~/.config/nemovideo/) or probe the install path (the SKILL.md mentions this but the registry metadata is inconsistent).
3) If you don’t want to supply a permanent token, the skill can fetch a time-limited anonymous token per the instructions, but that gives the skill temporary service access.
4) Because this is instruction-only with no install, risk is mainly what you choose to upload and what token you provide. Avoid giving high-privilege or unrelated credentials.
If you need higher assurance, request the actual code or an official homepage for independent review.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🎬 Clawdis
Env: NEMO_TOKEN
Primary env: NEMO_TOKEN
latest: vk97arnw7z68mgyrp26xy69xbcd85jrq2
44 downloads
0 stars
1 version
Updated 1d ago
v1.0.0
MIT-0

Getting Started

Ready when you are. Drop your YouTube video files here or describe what you want to make.

Try saying:

  • "turn a 10-minute YouTube tutorial video into a captioned 1080p MP4"
  • "generate subtitles in English and add them as burned-in captions"
  • "add subtitles to my YouTube videos"

Getting Connected

Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...".

If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:

  • Generate a UUID as client identifier
  • POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
  • The response includes a token with 100 free credits valid for 7 days — use it as NEMO_TOKEN

Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
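The connection steps above can be sketched as two request descriptions. This is a minimal sketch, not the skill's actual client: the function names and the dict layout are hypothetical, the `Content-Type` header is an assumption, and the field that carries the token in the response is not documented on this page, so response parsing is left out. The attribution headers are omitted here to keep the example short.

```python
import json
import os
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def existing_token(env=None):
    """Return NEMO_TOKEN from the environment, or None if it is not set."""
    env = os.environ if env is None else env
    return env.get("NEMO_TOKEN") or None


def anonymous_token_request(client_id=None):
    """Describe the POST that requests a free starter token."""
    client_id = client_id or str(uuid.uuid4())  # fresh UUID as client identifier
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
    }


def create_session_request(token, language="en"):
    """Describe the POST that opens a session for the given token."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, not stated in the source
        },
        "body": json.dumps({"task_name": "project", "language": language}),
    }
```

Keeping the requests as plain dicts lets any HTTP client send them and makes the flow easy to check without touching the network.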

Tell the user you're ready. Keep the technical details out of the chat.

YouTube Video Subtitle Generator — Generate and Embed Video Subtitles

This tool takes your YouTube video files and runs AI subtitle generation through a cloud rendering pipeline. You upload, describe what you want, and download the result.

Say you have a 10-minute YouTube tutorial video and want to generate subtitles in English and add them as burned-in captions — the backend processes it in about 1-2 minutes and hands you a 1080p MP4.

Tip: shorter video segments produce more accurate subtitle sync.

Matching Input to Actions

User prompts that mention subtitle generation, aspect ratio, text overlays, or audio tracks are routed to the corresponding action by keyword and intent classification.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | |
| "status" / "状态" / "show tracks" | → §3.4 State | |
| "upload" / "上传" / user sends file | → §3.2 Upload | |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | |
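The keyword side of this routing could look like the sketch below. It covers only substring matching on the listed keywords; the intent classification the page mentions is not modelled, and the action names and the `has_file` flag are hypothetical.

```python
# Keyword groups in the order they appear in the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_file=False):
    """Pick an action for a user message; anything unmatched goes to SSE."""
    if has_file:
        return "upload"  # the user sent a file
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"  # generate, edit, add BGM, and so on
```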

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Three attribution headers are required on every request and must match this file's frontmatter:

| Header | Value |
|---|---|
| X-Skill-Source | youtube-video-subtitle-generator |
| X-Skill-Version | frontmatter version |
| X-Skill-Platform | auto-detect: clawhub / cursor / unknown from install path |

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
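A small helper can assemble these headers in one place. In this sketch the substring check on the install path is only a guess at what "auto-detect" means, and `detect_platform`, `build_headers`, and the version string format are assumptions rather than anything the page specifies.

```python
def detect_platform(install_path):
    """Guess the platform from the install path (assumed heuristic)."""
    path = install_path.lower()
    for name in ("clawhub", "cursor"):
        if name in path:
            return name
    return "unknown"


def build_headers(token, version, install_path=""):
    """Return the authorization and attribution headers for one request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "youtube-video-subtitle-generator",
        "X-Skill-Version": version,  # taken from the skill's frontmatter
        "X-Skill-Platform": detect_platform(install_path),
    }
```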

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
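As an illustration of the SSE call, the sketch below builds the documented request body and pulls the payload out of each `data:` line of a stream. The JSON schema of the events is not described on this page, so payloads are returned as raw strings; the function names are hypothetical.

```python
def sse_message_body(session_id, text):
    """Build the JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def iter_sse_data(lines):
    """Yield the payload of each non-empty `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # comments, event names, blank separators
        payload = line[len("data:"):].strip()
        if payload:  # an empty data line is treated as a heartbeat
            yield payload
```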

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media
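The key fields of a session-state response can be pulled out defensively, as in this sketch. It assumes the response is a JSON object with the `data.state` nesting named above; `state_url` and `key_fields` are hypothetical helper names.

```python
API_BASE = "https://mega-api-prod.nemovideo.ai"


def state_url(session_id):
    """URL of the latest state for a session."""
    return f"{API_BASE}/api/state/nemo_agent/me/{session_id}/latest"


def key_fields(response):
    """Extract draft, video_infos and generated_media, tolerating gaps."""
    state = (response.get("data") or {}).get("state") or {}
    return {
        name: state.get(name)
        for name in ("draft", "video_infos", "generated_media")
    }
```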

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
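The export-and-poll step might be sketched as follows. Several details are assumptions: `<ts>` is taken to be a Unix timestamp in seconds, `status` and `output.url` are assumed to sit at the top level of the poll response, failure statuses are not handled because none are documented, and `fetch_status` is a hypothetical callable that performs the GET.

```python
import time


def export_body(session_id, draft, ts=None):
    """Build the body for POST /api/render/proxy/lambda."""
    ts = int(time.time()) if ts is None else ts  # assumption: seconds
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status, render_id, interval=30, max_polls=30,
                sleep=time.sleep):
    """Poll until the job reports completed, then return the download URL."""
    for attempt in range(max_polls):
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval)  # wait 30 s between polls by default
    raise TimeoutError(f"render {render_id} did not complete in time")
```

Passing `sleep` and `fetch_status` in as arguments keeps the loop testable without waiting or calling the real API.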

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
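The fallback for a silent stream can be expressed as a short decision, sketched here. `resolve_turn` and `fetch_state` are hypothetical names; `fetch_state` stands in for whatever performs the session-state GET, and the returned dict layout is an assumption.

```python
def resolve_turn(text_chunks, fetch_state):
    """Use streamed text if any arrived; otherwise fall back to session state."""
    text = "".join(text_chunks).strip()
    if text:
        return {"source": "stream", "text": text}
    # No text came back: check the state so the edit can be verified
    # and summarized for the user.
    return {"source": "state", "state": fetch_state()}
```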

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

  • "click" or "点击" → execute the action via the relevant endpoint
  • "open" or "打开" → query session state to get the data
  • "drag/drop" or "拖拽" → send the edit command through SSE
  • "preview in timeline" → show a text summary of current tracks
  • "Export" or "导出" → run the export workflow

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
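A text summary like this could be derived from the draft's short keys. The sketch below assumes a particular nesting that the page does not confirm: `t` is a list of tracks, each track carries `tt` and a list `sg`, and each segment carries `d` in milliseconds. Track names and volume levels are not covered because their keys are not documented.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft):
    """Render a short text summary of the tracks in a draft (assumed shape)."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {total_ms / 1000:g}s across {len(segments)} segment(s)"
        )
    return "\n".join(lines)
```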

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
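The table above maps naturally onto a lookup. In this sketch the action labels are hypothetical names for the listed behaviours, and the fallback for codes not in the table is an assumption.

```python
# Error code -> next step, mirroring the table above.
ACTIONS = {
    0: "continue",
    1001: "reauth_with_anonymous_token",
    1002: "create_new_session",
    2001: "show_credit_options",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",  # subscription tier issue, not credits
    429: "retry_once_after_30s",
}


def next_action(code):
    """Return the handling step for a code; unknown codes are reported."""
    return ACTIONS.get(code, "report_unknown_error")
```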

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "generate subtitles in English and add them as burned-in captions" — concrete instructions get better results.

Max file size is 500MB. Stick to MP4, MOV, AVI, WebM for the smoothest experience.

Export as MP4 for widest compatibility across YouTube and social platforms.

Common Workflows

Quick edit: Upload → "generate subtitles in English and add them as burned-in captions" → Download MP4. Takes 1-2 minutes for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.
