Youtube Video Caption Download

v1.0.0

Skip the learning curve of professional editing software. Describe what you want — download the captions from this YouTube video as a text file — and get dow...

by @tk8544-b

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for tk8544-b/youtube-video-caption-download.

Install the skill "Youtube Video Caption Download" (tk8544-b/youtube-video-caption-download) from ClawHub.
Skill page: https://clawhub.ai/tk8544-b/youtube-video-caption-download
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: NEMO_TOKEN
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install youtube-video-caption-download

ClawHub CLI

npx clawhub@latest install youtube-video-caption-download

Security Scan

VirusTotal: Suspicious

OpenClaw: Benign (high confidence)

✓ Purpose & Capability

The skill claims to download captions and describes a cloud render/upload pipeline; the single required env var (NEMO_TOKEN) and the documented API endpoints (nemovideo.ai) are consistent with that purpose. The skill also supports uploading video files and working with YouTube URLs, which matches the description.

ℹ Instruction Scope

The SKILL.md instructs the agent to use NEMO_TOKEN (or obtain an anonymous token), create sessions, upload files or URLs, poll render status, and return download URLs. These actions are expected for this service. Note: the instructions tell the agent to upload local files (multipart file upload) and to hide technical details from the user; uploading local files is normal for this use case but is a data-exfiltration vector if users mistakenly upload sensitive files.

✓ Install Mechanism

There is no install spec and no code files (instruction-only), so nothing is written to disk and no external packages are pulled in by the skill itself. This is the lowest-risk install model.

ℹ Credentials

The only required environment credential is NEMO_TOKEN (declared as primary), which is proportionate for an API-backed service. The SKILL.md also documents an anonymous-token fallback it will obtain from the public API if NEMO_TOKEN is not present. One minor inconsistency: the registry summary lists no required config paths, but the skill frontmatter declares a config path (~/.config/nemovideo/). This mismatch should be clarified.

✓ Persistence & Privilege

The skill is not always-enabled and does not request elevated or persistent system privileges. It does not modify other skills or system-wide settings according to the provided instructions.

Assessment

This skill appears to be what it says: a cloud video/caption extraction front-end that calls nemovideo.ai and requires an API token. Before installing or using it: (1) Only provide a NEMO_TOKEN if you trust nemovideo.ai — the token grants access to upload and process your videos. (2) Avoid uploading sensitive files; the agent will accept and send local files to the remote API. (3) Note the SKILL.md can obtain an anonymous token automatically if no NEMO_TOKEN is present (it will call the public anonymous-token endpoint), so lack of a token does not prevent external network calls. (4) Ask the skill author to resolve the small metadata mismatch (registry says no config paths but the SKILL.md frontmatter lists ~/.config/nemovideo/). If those points are acceptable, the skill's behavior is coherent with its stated purpose.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

📝 Clawdis

Env: NEMO_TOKEN

Primary env: NEMO_TOKEN

latest: vk97ckjbtdev20cq9t66tac6pm585n977

40 downloads

0 stars

1 version

Updated 15h ago

MIT-0

Getting Started

Send me your YouTube video URL and I'll extract the captions and prepare them for download. Or just describe what you're after.

Try saying:

"extract a 10-minute YouTube tutorial video into a 1080p MP4"
"download the captions from this YouTube video as a text file"
"download the captions from my YouTube videos"

Getting Connected

Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...".

If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:

1. Generate a UUID as the client identifier.
2. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header.
3. The response includes a token with 100 free credits, valid for 7 days. Use it as NEMO_TOKEN.
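
The token steps above can be sketched as a small request builder. This is a minimal illustration, not the service's official client: the helper name `anonymous_token_request` and the dictionary layout are hypothetical, and nothing is sent over the network. Only the URL and the X-Client-Id header come from the text.

```python
import uuid
from typing import Dict, Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"


def anonymous_token_request(client_id: Optional[str] = None) -> Dict[str, object]:
    """Describe the POST that asks for a starter token (hypothetical helper)."""
    # Step 1: generate a UUID as the client identifier unless one is supplied.
    client_id = client_id or str(uuid.uuid4())
    # Step 2: POST to the anonymous-token endpoint with the X-Client-Id header.
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
    }


request = anonymous_token_request()
print(request["method"], request["url"])
```

The returned description can be handed to any HTTP client. The name of the token field in the response is not documented here, so the sketch stops before parsing it.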

Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
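
A session-creation request might look like the sketch below. The helper names are hypothetical, the Content-Type header is an assumption, and reading `session_id` from the top level of the response is also an assumption (the text only says the response contains it). The attribution headers described later are left out for brevity.

```python
import json
from typing import Dict

API_BASE = "https://mega-api-prod.nemovideo.ai"


def create_session_request(token: str, language: str = "en") -> Dict[str, object]:
    """Describe the session-creation POST (hypothetical helper, nothing is sent)."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        "body": json.dumps({"task_name": "project", "language": language}),
    }


def session_id_from(response: Dict[str, object]) -> str:
    """Pull session_id out of a decoded response (top-level key is an assumption)."""
    return str(response["session_id"])
```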

Tell the user you're ready. Keep the technical details out of the chat.

YouTube Video Caption Download — Extract and Download Video Captions

This tool takes your YouTube video URL and runs caption extraction through a cloud rendering pipeline. You provide the video, describe what you want, and download the result.

Say you have a 10-minute YouTube tutorial video and want its captions as a text file: the backend processes the request in about 10-30 seconds and returns a download link. Video exports come back as a 1080p MP4.

Tip: auto-generated captions may have errors — review before publishing.

Matching Input to Actions

User prompts that mention YouTube video caption download, aspect ratio, text overlays, or audio tracks are routed to the corresponding action by keyword and intent classification.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
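
The routing table can be approximated with a keyword lookup. This is a simplified sketch: it covers only the keyword half of "keyword and intent classification", and the `route` function, the action names, and the substring matching are all assumptions made for illustration.

```python
from typing import Tuple

# (keywords, action, skip_sse) rows copied from the routing table above.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export", True),
    (("credits", "积分", "balance", "余额"), "credits", True),
    (("status", "状态", "show tracks"), "state", True),
    (("upload", "上传"), "upload", True),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message (hypothetical helper)."""
    if has_file:
        # "user sends file" row: an attached file always means upload.
        return ("upload", True)
    text = message.lower()
    for keywords, action, skip_sse in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, skip_sse)
    # Everything else (generate, edit, add BGM...) goes through the SSE endpoint.
    return ("sse", False)
```

Rows are checked in table order, so a message that matches several rows takes the first one. A real classifier would need to resolve such overlaps by intent.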

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Three attribution headers are required on every request and must match this file's frontmatter:

Header	Value
`X-Skill-Source`	`youtube-video-caption-download`
`X-Skill-Version`	frontmatter `version`
`X-Skill-Platform`	auto-detect: `clawhub` / `cursor` / `unknown` from install path

Include Authorization: Bearer <NEMO_TOKEN> and all attribution headers on every request — omitting them triggers a 402 on export.
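
One way to assemble these headers is sketched below. The function names are hypothetical, and the platform detection rule (a case-insensitive substring match on the install path) is an assumption, since the text does not say how auto-detection works.

```python
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Guess the platform from the install path (assumed substring rule)."""
    lowered = install_path.lower()
    for platform in ("clawhub", "cursor"):
        if platform in lowered:
            return platform
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> Dict[str, str]:
    """Combine the bearer token with the three attribution headers."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "youtube-video-caption-download",
        # Must match the `version` field in the skill frontmatter.
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget one of them on an individual request.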

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
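
The message body and a basic reader for the event stream could be sketched as follows. Both function names are hypothetical. The reader is deliberately simplified: it treats every `data:` line as its own payload and does not join multi-line events, and it does not decode the payload because the event schema is not documented here.

```python
import json
from typing import Iterable, Iterator


def run_sse_body(session_id: str, message: str) -> str:
    """Serialize the /run_sse request body described above."""
    return json.dumps({
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    })


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each non-empty `data:` line from a stream."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # comments, event names, and blank separators
        payload = line[len("data:"):].strip()
        if payload:  # an empty `data:` line is treated as a heartbeat
            yield payload
```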

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
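
For the URL variant of the upload, a request description might look like this. The helper name is hypothetical, headers are omitted for brevity, and the multipart file variant is not shown.

```python
import json
from typing import Dict, List

API_BASE = "https://mega-api-prod.nemovideo.ai"


def url_upload_request(session_id: str, urls: List[str]) -> Dict[str, object]:
    """Describe an upload-by-URL POST for a session (nothing is sent here)."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/upload-video/nemo_agent/me/{session_id}",
        "body": json.dumps({"urls": urls, "source_type": "url"}),
    }
```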

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
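
The export-and-poll flow can be sketched with the HTTP call injected as a function, which keeps the loop testable offline. The names `export_body` and `poll_render` are hypothetical, treating `<ts>` as a Unix timestamp in seconds is an assumption, and the poll cap is an arbitrary safeguard that the text does not specify.

```python
import json
import time
from typing import Callable, Dict, Optional


def export_body(session_id: str, draft: Dict, now: Optional[float] = None) -> str:
    """Serialize the export request; `<ts>` is assumed to be Unix seconds."""
    ts = int(now if now is not None else time.time())
    return json.dumps({
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    })


def poll_render(
    fetch_status: Callable[[], Dict],
    interval_s: float = 30.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch_status every interval_s seconds until status is completed.

    fetch_status stands in for GET /api/render/proxy/lambda/<id> and should
    return the decoded JSON. Returns output.url, or None if max_polls runs out.
    """
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    return None
```

Failure statuses are not documented in this section, so the sketch simply gives up after `max_polls` attempts instead of guessing at error values.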

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result	Process internally, don't forward
`heartbeat` / empty `data:`	Keep waiting. Every 2 min: "⏳ Still working..."
Stream closes	Process final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"	Query session state
"drag/drop" / "拖拽"	Send edit via SSE
"preview in timeline"	Show track summary
"Export button" / "导出"	Execute export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
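
A track summary could be derived from the draft roughly as shown below. The nesting is an assumption: the sketch supposes that `t` is a list of tracks, each with a `tt` type code and an `sg` list of segments carrying `d` in milliseconds. Only the field abbreviations and the type codes come from the mapping above; the function name is hypothetical.

```python
from typing import Dict, List

# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: Dict) -> List[str]:
    """Return one line per track with its type and total duration in seconds."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        # Assumption: a track's length is the sum of its segment durations.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return lines


example = {"t": [{"tt": 0, "sg": [{"d": 10000}]}, {"tt": 7, "sg": [{"d": 3000}]}]}
print("\n".join(summarize_draft(example)))
```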

Error Handling

Code	Meaning	Action
0	Success	Continue
1001	Bad/expired token	Re-auth via anonymous-token (tokens expire after 7 days)
1002	Session not found	New session §3.0
2001	No credits	Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up credits in your account"
4001	Unsupported file	Show supported formats
4002	File too large	Suggest compress/trim
400	Missing X-Client-Id	Generate Client-Id and retry (see §1)
402	Free plan export blocked	Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export."
429	Rate limit (1 token/client/7 days)	Retry in 30s once
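
The error table lends itself to a lookup. In this sketch the action labels are made-up identifiers that paraphrase the Action column, and the fallback for unlisted codes is an assumption.

```python
# Error code -> action label, paraphrasing the table above.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate_with_anonymous_token",
    1002: "create_new_session",
    2001: "prompt_for_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Map a response code to an action label (unknown codes are reported)."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 and 2001 as separate entries reflects the table's point that a blocked export is a subscription issue, not a credits issue.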

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "download the captions from this YouTube video as a text file" — concrete instructions get better results.

Max file size is 500MB. Stick to MP4, MOV, AVI, WebM for the smoothest experience.
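
A local pre-check before uploading might look like the sketch below. The function name and return labels are hypothetical, the extension list is the supported-formats list from earlier in this document, and reading "500MB" as 500 × 1024 × 1024 bytes is an assumption.

```python
import os

# Assumption: "500MB" is interpreted as 500 MiB.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024

# Supported formats as listed in this document.
SUPPORTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def check_upload(filename: str, size_bytes: int) -> str:
    """Return "ok", "unsupported_format", or "too_large" for a candidate file."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return "unsupported_format"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "too_large"
    return "ok"
```

Checking locally avoids sending a large file only to have the backend reject it.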

Export as MP4 for widest compatibility.

Common Workflows

Quick edit: Upload → "download the captions from this YouTube video as a text file" → Download MP4. Takes 10-30 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.
