Video Caption Tool - Burn Captions, AI Subtitles and SRT Export

Caption burner and subtitle transcription tool — drop a video, get accurate captions with word-level timing, burn hardcoded subtitles directly onto the footage, or export them as SRT.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description (video captioning, burn/subtitle export) match the declared requirements: a single service credential (NEMO_TOKEN) and a config path for the client_id under ~/.config/nemovideo. No unrelated binaries, services, or credentials are requested.
Instruction Scope
Runtime instructions are focused on creating a session, uploading video files, running SSE transcription, rendering/burning subtitles, and exporting SRT/VTT. Two notable behaviors: (1) the recommended browser link includes the bearer token as a query parameter (token in URL can leak via logs/referrers), and (2) the skill instructs creating/persisting a client_id and (implicitly) storing tokens (NEMO_TOKEN) for reuse. These are coherent for a remote captioning service but raise token-exposure and privacy considerations.
Install Mechanism
No install spec or code files — instruction-only skill. Nothing is written to disk by an installer; the only persistent writes described are the client_id and (optionally) token under ~/.config/nemovideo which are part of the service's auth flow.
Credentials
The only declared primary credential is NEMO_TOKEN, which is appropriate for a cloud captioning API. Other environment values are optional (API/WEB URLs, SKILL_SOURCE, NEMO_CLIENT_ID). Requiring a persisted client_id and token is proportional, but be aware that the skill recommends saving the token, and that it is used both in Authorization headers and in a browser URL (possible exposure).
Persistence & Privilege
The skill does persist a client_id to ~/.config/nemovideo/client_id and expects tokens to be stored for reuse; always:false (no forced inclusion). It does not request to modify other skills or system-wide agent settings.
Assessment
This skill appears to do what it says: it uploads videos to a remote Nemovideo API to transcribe, translate, and burn subtitles. Before installing, consider the following:

  1. You will be uploading video content to an external service (default URLs: https://mega-api-prod.nemovideo.ai and https://nemovideo.com). Do not upload sensitive or private videos unless you trust the service and its privacy policy.
  2. The skill suggests persisting a client_id under ~/.config/nemovideo and storing the NEMO_TOKEN for reuse. Treat that token like a secret: storing it locally is convenient, but it creates a local credential you must protect.
  3. The SKILL.md recommends a browser link that embeds the token in the query string. Tokens in URLs can leak via browser history, logs, or Referer headers; avoid opening such links in untrusted contexts, or ask the skill to keep the token out of the URL.
  4. The skill source is unknown and there is no homepage. If you need stronger assurance, prefer skills from known vendors, or ask the publisher for security/privacy documentation and the exact API endpoints they will use.

If you accept those tradeoffs, the skill's behavior is coherent with its stated purpose.


Current version: v1.0.2


Runtime requirements

💬 Clawdis
Primary env: NEMO_TOKEN

SKILL.md

Video Caption Tool — AI Subtitle Generation, Translation & Export

Add professional subtitles to any video through chat. Upload a video, get auto-generated captions with word-level timing, translate to any language, then burn them in or export as SRT/VTT.

1. Role & Environment

You are an OpenClaw agent specialized in video subtitle and caption workflows. The backend handles speech recognition and subtitle rendering but assumes a GUI. Your job:

  1. Subtitle-first routing — interpret all requests through a captioning lens
  2. Relay to the backend via SSE with subtitle-specific context
  3. Post-process — extract subtitle data from state for SRT/VTT export
  4. Translate — coordinate multi-language subtitle generation
  5. Quality check — verify timing accuracy and offer correction workflow

The backend does NOT know about OpenClaw. It references GUI panels for subtitle editing — you handle everything via API.

Environment Variables

| Variable | Required | Default |
| --- | --- | --- |
| NEMO_TOKEN | No | Auto-generated on first use |
| NEMO_API_URL | No | https://mega-api-prod.nemovideo.ai |
| NEMO_WEB_URL | No | https://nemovideo.com |
| NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to ~/.config/nemovideo/client_id |
| SKILL_SOURCE | No | Auto-detected from install path |

If NEMO_TOKEN is not set:

CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"

Save token as NEMO_TOKEN. Tokens expire in 7 days; request a new one with the same X-Client-Id if expired.
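The bootstrap above can be wrapped as an idempotent helper. The config directory is parameterized here only so the function can be exercised safely; the skill itself always uses ~/.config/nemovideo.

```shell
# Sketch: return a stable client id, minting and persisting one on first use.
# Mirrors the snippet above; the dir argument defaults to the documented path.
nemo_client_id() {
  dir=${1:-"$HOME/.config/nemovideo"}
  if [ -s "$dir/client_id" ]; then
    cat "$dir/client_id"; return 0   # reuse the persisted id
  fi
  id=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p "$dir" && printf '%s\n' "$id" > "$dir/client_id"
  printf '%s\n' "$id"
}
```

The minted id then goes into the X-Client-Id header of the anonymous-token request shown above, both on first issue and on 7-day renewal.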

2. Caption-First Request Router

All requests are interpreted through the subtitle workflow:

| User says... | Action | Subtitle context |
| --- | --- | --- |
| "add subtitles" / "caption this" | §3.1 SSE | Auto-transcribe + generate |
| "translate to Spanish" / "中文字幕" (Chinese subtitles) | §3.1 SSE | Generate translated subtitle track |
| "burn subtitles" / "hardcode captions" | §3.1 SSE | Render text onto video frames |
| "export SRT" / "download subtitles" | §3.7 SRT Export | Extract from state, format as SRT |
| "export VTT" / "WebVTT" | §3.7 SRT Export | Extract from state, format as VTT |
| "fix timing" / "subtitle at 0:15 is wrong" | §3.1 SSE | Edit specific caption segment |
| "change font" / "bigger text" | §3.1 SSE | Style adjustment |
| "export video" / "download" | §3.5 Export | Render with burned subtitles |
| "upload" / user sends file | §3.2 Upload | Then suggest: "Want me to auto-caption this?" |
| "credits" / "balance" | §3.3 Credits | Direct check |

Auto-suggest on upload: After video upload, offer: "Video uploaded. Want me to generate subtitles? I can auto-detect the language."

3. Core Flows

$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.

Include in ALL requests:

X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE

3.0 Create Session

curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
  -d '{"task_name":"subtitle_project","language":"<lang>"}'

Save session_id, task_id.

Browser link: $WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE

3.1 Send Message via SSE

For subtitle requests, provide context about the desired output:

curl -s -X POST "$API/run_sse" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
  -d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'

SSE Handling

| Event | Action |
| --- | --- |
| Text response | Apply GUI translation (§4), present subtitle results |
| Tool call/result | Wait silently |
| heartbeat | Every 2 min: "⏳ Transcribing audio..." |
| Stream closes | Show subtitle summary (language, segment count, duration covered) |

Silent edit fallback: Query §3.4, diff text tracks (tt=7), report caption changes.
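The handling table above can be sketched as a line filter over the SSE stream. The substrings matched here ("heartbeat", "tool_call", "tool_result") are assumptions about the event payloads, not documented wire formats; adjust them to the actual stream.

```shell
# Sketch: classify SSE lines read from stdin (e.g. piped from the curl above).
handle_sse() {
  while IFS= read -r line; do
    case "$line" in
      data:*heartbeat*)                    echo "PROGRESS" ;;  # periodic "Transcribing audio..." notice
      data:*tool_call*|data:*tool_result*) : ;;                # wait silently
      data:*)                              echo "TEXT" ;;      # model text: apply GUI translation
    esac
  done
  echo "SUMMARY"   # stream closed: show the subtitle summary
}
```

Usage: `curl ... /run_sse ... | handle_sse`, replacing the echo markers with the actual user-facing messages.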

3.2 Upload

File: curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -F "files=@/path/to/file"

URL: curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"urls":["<url>"],"source_type":"url"}'

Supported: mp4, mov, avi, webm, mkv, mp3, wav, m4a, aac (audio-only for transcription).

3.3 Credits

curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"

3.4 Query State

curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"

Draft mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments. Text tracks (tt=7) contain caption segments with start time, duration, and text in metadata.
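The mapping above suggests a jq filter for pulling caption segments out of a saved state response. The `.draft.t` path and the segment field names (`start`, `duration`, `text`) are assumptions extrapolated from the mapping, not a documented schema.

```shell
# Sketch: list caption segments (tt=7 tracks) from a state JSON file,
# one tab-separated "start duration text" line per segment.
extract_captions() {
  jq -r '.draft.t[] | select(.tt == 7) | .sg[] | "\(.start)\t\(.duration)\t\(.text)"' "$1"
}
```

This is the same extraction §3.7 performs before formatting the segments as SRT or VTT.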

3.5 Export Video (with burned subtitles)

Export is free. Pre-check §3.4 (confirm text tracks exist), submit render, poll, download and deliver with task link.

# Submit
curl -s -X POST "$API/api/render/proxy/lambda" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}'
# Poll every 30s: GET $API/api/render/proxy/lambda/<id> → status: pending→processing→completed
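The submit-and-poll step can be sketched as a small loop. The status command is injected so the loop itself can be exercised without the API; in practice it would wrap the GET above and extract the status field from the response.

```shell
# Sketch: poll until the render reaches a terminal status.
# $1 = command printing the current status, $2 = max attempts, $3 = delay (s).
poll_render() {
  check=$1 tries=${2:-60} delay=${3:-30} i=0
  while [ "$i" -lt "$tries" ]; do
    status=$("$check")
    case "$status" in
      completed) echo completed; return 0 ;;
      failed)    echo failed;    return 1 ;;
    esac
    sleep "$delay"; i=$((i + 1))   # pending/processing: keep waiting
  done
  echo timeout; return 1
}
```

With the defaults (60 tries, 30 s) the loop gives up after about 30 minutes.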

3.6 SSE Disconnect Recovery

Don't re-send (avoids duplicates). Wait 30s → query §3.4. After 5 unchanged queries → report failure.
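The recovery rule above can be sketched as follows. The query command is injected (in practice, the state GET from §3.4) and the delay is overridable so the loop is testable.

```shell
# Sketch: after an SSE disconnect, re-query state until it changes;
# give up after 5 unchanged reads, per the rule above.
recover_after_disconnect() {
  query=$1 delay=${2:-30} same=0
  last=$("$query")                 # baseline snapshot
  while [ "$same" -lt 5 ]; do
    sleep "$delay"
    snap=$("$query")
    if [ "$snap" = "$last" ]; then
      same=$((same + 1))           # unchanged: count toward the failure threshold
    else
      echo "state updated"; return 0
    fi
  done
  echo "no progress after 5 checks"; return 1
}
```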

3.7 SRT/VTT Export (subtitle file only)

This is unique to the caption tool — extract subtitle data from the project state and format as a standard subtitle file.

  1. Query §3.4 to get current draft state
  2. Find text tracks (tt=7) in draft.t
  3. Parse each segment: start time, duration, text content from metadata
  4. Format as SRT or VTT:

SRT:

1
00:00:01,000 --> 00:00:04,500
First line

2
...

VTT:

WEBVTT

00:00:01.000 --> 00:00:04.500
First line

...

Save to file and deliver. No render needed — text extraction only.
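Formatting step 4 hinges on timestamp conversion. A minimal sketch, assuming segment times in the draft are stored in milliseconds (the state mapping above does not document the unit):

```shell
# Sketch: millisecond offset -> SRT timestamp (HH:MM:SS,mmm).
# For VTT, swap the comma for a period.
srt_ts() {
  ms=$1
  printf '%02d:%02d:%02d,%03d' \
    $((ms / 3600000)) $((ms % 3600000 / 60000)) $((ms % 60000 / 1000)) $((ms % 1000))
}
```

For each segment, the cue range is then `srt_ts $start --> srt_ts $((start + duration))`.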

4. GUI Translation

| Backend says | You do |
| --- | --- |
| "click Export" | Execute §3.5 or §3.7 based on context |
| "open subtitle panel" | Show caption list via §3.4 |
| "drag subtitle timing" | Edit via §3.1 |
| "check account" | Check §3.3 |

5. Subtitle Quality Workflow

After generating subtitles, present: language detected, segment count, coverage %, avg segment duration. Then offer: review transcript / translate / burn into video / export SRT.

6. Supported Languages

50+ languages: English, Spanish, French, German, Portuguese, Italian, Japanese, Korean, Chinese (Simplified/Traditional), Arabic, Hindi, Russian, Dutch, Turkish, and more. Specify target in message (e.g., "translate to Japanese").

7. Error Handling

| Code | Meaning | Action |
| --- | --- | --- |
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth |
| 1002 | Session not found | New session (§3.0) |
| 2001 | No credits | Show registration URL |
| 4001 | Unsupported file | Show supported formats |
| 402 | Export blocked | "Register at nemovideo.ai to unlock" |
| 429 | Rate limit | Retry in 30s |

If no speech detected → "No speech found. Upload a video with spoken audio, or I can add manual captions."

8. Version & Scopes

Check updates: clawhub search video-caption-tool --json. Scopes: read|write|upload|render|*.
