Capcut Subtitle Generator

v1.0.0

Tired of manually typing out every word in your videos just to add subtitles? The capcut-subtitle-generator skill handles the heavy lifting, turning your video's spoken dialogue into CapCut-ready captions.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for dsewell-583h0/capcut-subtitle-generator.

Prompt Preview: Install & Setup
Install the skill "Capcut Subtitle Generator" (dsewell-583h0/capcut-subtitle-generator) from ClawHub.
Skill page: https://clawhub.ai/dsewell-583h0/capcut-subtitle-generator
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: NEMO_TOKEN
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install capcut-subtitle-generator

ClawHub CLI


npx clawhub@latest install capcut-subtitle-generator
Security Scan

  • VirusTotal: Benign
  • OpenClaw: Benign (medium confidence)
Purpose & Capability
The skill claims to generate CapCut-ready subtitles and its instructions require a cloud speech/processing API, an API token (NEMO_TOKEN), session creation, uploads, SSE streaming and render/export endpoints — all coherent with the described purpose. One minor inconsistency: the top-level registry metadata listed no required config paths, but the SKILL.md frontmatter includes a configPaths entry (~/.config/nemovideo/), which suggests the skill may read or expect local NemoVideo config files.
Instruction Scope
Instructions direct the agent to check for NEMO_TOKEN, automatically obtain an anonymous token if missing, create a session, upload user-supplied video files or URLs, stream SSE messages, poll renders, and download outputs. These actions are expected for a cloud captioning flow, but they involve sending user video/audio and transcripts to an external service and reading local install paths (~/.clawhub, ~/.cursor/skills, ~/.config/nemovideo/) to set headers; review whether you are comfortable with automatic network calls and local path reads.
Install Mechanism
This is instruction-only (no install spec, no code files), so nothing is written to disk by an installer. That minimizes install-time risk.
Credentials
The only declared required credential is NEMO_TOKEN, which matches the external API used. The skill will attempt to mint an anonymous token via the described endpoint if NEMO_TOKEN is not present; this is reasonable but means the agent will make outbound auth calls automatically. The SKILL.md implies storing/keeping session_id and treating data.token as NEMO_TOKEN — clarify how/where tokens/session IDs are stored and for how long (env, config path, memory).
Persistence & Privilege
The skill is not marked always:true and does not request system-wide privileges. Autonomous model invocation is enabled (default) which is normal; there is no install-time persistence or modification of other skills described.
Assessment
This skill behaves like a normal cloud subtitle service: it uploads video/audio/transcripts to an external API (mega-api-prod.nemovideo.ai), creates or uses a NEMO_TOKEN for authorization, and stores a session_id for processing. Before installing, consider:

  1. Privacy: uploaded media leaves your machine; don't send sensitive content unless you trust the service and its retention policy.
  2. Token handling: the skill can mint an anonymous token if none is present; ask how tokens and session IDs are stored and how to revoke them.
  3. Costs/limits: anonymous tokens mention free credits and expiry; confirm rate limits or billing if you'll process many files.
  4. Config path mismatch: SKILL.md references ~/.config/nemovideo/ and detects install paths (~/.clawhub, ~/.cursor); confirm whether the skill will read those local paths and whether you're comfortable with that.
  5. Verify the API hostname (mega-api-prod.nemovideo.ai) if you need an official vendor or privacy/SLA assurances.

If any of these points are unacceptable or unclear, seek more documentation from the skill author or avoid enabling the skill.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🎬 Clawdis

  • Env: NEMO_TOKEN
  • Primary env: NEMO_TOKEN
  • Latest: vk9748rsfzx481bqengj1yv4rmd85eqgg
  • 74 downloads · 0 stars · 1 version
  • Updated 3d ago
  • v1.0.0 · MIT-0

Getting Started

Welcome! Let's get your CapCut video subtitled quickly and cleanly — just share your script, transcript, or video dialogue and I'll format it into caption-ready text for CapCut. Ready to save hours? Drop your content below and let's get started!

Try saying:

  • "Format my transcript for CapCut captions"
  • "Break dialogue into short subtitle lines"
  • "Generate captions for my video script"

Quick Start Setup

This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

  • Generate a UUID as client identifier
  • POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with X-Client-Id header
  • Extract data.token from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)

Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
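The setup flow above can be sketched as plain request builders. This is a minimal illustration: the function names `build_token_request` and `build_session_request` are hypothetical, and only the endpoints and fields documented on this page are used.

```python
import json
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"

def build_token_request() -> dict:
    """Request spec for minting an anonymous NEMO_TOKEN
    (100 free credits, 7-day expiry)."""
    client_id = str(uuid.uuid4())  # UUID client identifier
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
    }

def build_session_request(token: str) -> dict:
    """Request spec for creating a processing session; the response's
    session_id should be kept for all later operations."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {"Authorization": f"Bearer {token}"},
        "body": json.dumps({"task_name": "project"}),
    }
```

The actual NEMO_TOKEN is `data.token` in the anonymous-token response; never echo it back to the user.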

Turn Your Words Into Perfectly Timed CapCut Captions

Adding subtitles to your CapCut videos shouldn't mean hours of manual typing, timestamp-hunting, and formatting frustration. The CapCut Subtitle Generator skill transforms your raw transcripts, dialogue scripts, or spoken content into clean, properly structured subtitle text that slots directly into your CapCut editing workflow — no guesswork, no tedious line-by-line syncing.

Whether you're producing short-form content for TikTok, YouTube Shorts, or Instagram Reels, subtitles are no longer optional — they're essential for accessibility, watch time, and engagement. This skill generates caption blocks optimized for CapCut's subtitle editor, giving you a ready-to-use foundation that you can fine-tune in just minutes instead of starting from scratch.

Ideal for solo creators, small video production teams, and social media managers juggling multiple projects, this skill handles the repetitive formatting so you can focus on the creative side. Feed it your content, tell it your preferred style, and get subtitle-ready output that makes your CapCut sessions dramatically faster and more productive.

Routing Caption Requests Intelligently

When you submit a video or transcript, ClawHub parses your subtitle style preferences, language settings, and caption timing parameters to route your request to the optimal CapCut-compatible processing pipeline.

User says → Action
  • "export" / "导出" / "download" / "send me the video" → §3.5 Export
  • "credits" / "积分" / "balance" / "余额" → §3.3 Credits
  • "status" / "状态" / "show tracks" → §3.4 State
  • "upload" / "上传" / user sends a file → §3.2 Upload
  • Everything else (generate, edit, add BGM…) → §3.1 SSE

Auto-Caption API Reference

CapCut Subtitle Generator runs on a cloud-based speech recognition and frame-sync engine that transcribes audio, segments caption blocks, and exports SRT or embedded subtitle files aligned to your video timeline. Processing speed scales with clip length and selected caption style — animated karaoke burns take longer than standard static captions.

Skill attribution — read from this file's YAML frontmatter at runtime:

  • X-Skill-Source: capcut-subtitle-generator
  • X-Skill-Version: from frontmatter version
  • X-Skill-Platform: detect from install path (~/.clawhub → clawhub, ~/.cursor/skills → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
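A minimal sketch of building the required headers, assuming the install-path detection described above (the helper names `detect_platform` and `attribution_headers` are illustrative, not part of the skill):

```python
from pathlib import Path

def detect_platform(install_path: str) -> str:
    """Map the skill's install path to an X-Skill-Platform value."""
    parts = Path(install_path).expanduser().parts
    if ".clawhub" in parts:
        return "clawhub"
    if ".cursor" in parts:
        return "cursor"
    return "unknown"

def attribution_headers(token: str, version: str, install_path: str) -> dict:
    """Every request needs Bearer auth plus the three X-Skill-* headers;
    missing attribution makes export fail with 402."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "capcut-subtitle-generator",
        "X-Skill-Version": version,  # read from SKILL.md frontmatter
        "X-Skill-Platform": detect_platform(install_path),
    }
```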

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
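The two upload modes can be distinguished with a small helper (an illustrative sketch; `build_upload_payload` is a hypothetical name, not part of the skill's API):

```python
def build_upload_payload(source: str) -> dict:
    """Choose the JSON body for remote URLs, or a multipart form
    (field name "files") for local paths."""
    if source.startswith(("http://", "https://")):
        return {"json": {"urls": [source], "source_type": "url"}}
    # Local file: equivalent to curl -F "files=@/path"
    return {"files_field": "files", "path": source}
```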

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
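The 30-second export polling loop might look like this sketch, with the HTTP call injected as a callable so the loop can be exercised without the network (`poll_render` and `fetch_status` are hypothetical names):

```python
import time

def poll_render(fetch_status, render_id: str,
                interval: float = 30.0, max_polls: int = 30) -> str:
    """Poll until status == "completed" and return the download URL.
    `fetch_status` wraps GET /api/render/proxy/lambda/<id> and returns
    the decoded JSON body as a dict."""
    for _ in range(max_polls):
        resp = fetch_status(render_id)
        if resp.get("status") == "completed":
            return resp["output"]["url"]  # download URL at output.url
        time.sleep(interval)
    raise TimeoutError(f"render {render_id} did not complete")
```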

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

Event → Action
  • Text response → apply GUI translation (§4), present to user
  • Tool call/result → process internally, don't forward
  • Heartbeat / empty data: → keep waiting; every 2 min, tell the user "⏳ Still working..."
  • Stream closes → process the final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
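A rough classifier for the event table above, as a sketch only: the exact JSON shape of tool events is an assumption, keyed here on hypothetical `tool_call`/`tool_result` fields.

```python
import json

def classify_sse_event(data_line: str) -> str:
    """Classify one SSE `data:` payload as "heartbeat", "tool", or "text"."""
    payload = data_line.strip()
    if not payload or payload == "heartbeat":
        return "heartbeat"          # keep waiting
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return "text"               # plain text: present to user
    if "tool_call" in event or "tool_result" in event:
        return "tool"               # process internally, don't forward
    return "text"                   # apply GUI translation, present to user
```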

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says → You do
  • "click [button]" ("点击") → execute via API
  • "open [panel]" ("打开") → query session state
  • "drag/drop" ("拖拽") → send edit via SSE
  • "preview in timeline" → show track summary
  • "Export button" ("导出") → execute export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
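The compact draft keys can be expanded with a small recursive mapper, an illustrative sketch based only on the mapping above (`expand_draft` is a hypothetical helper):

```python
KEY_MAP = {"t": "tracks", "tt": "track_type", "sg": "segments",
           "d": "duration_ms", "m": "metadata"}
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}

def expand_draft(node):
    """Recursively rename the draft's compact keys to readable names,
    decoding track-type codes along the way."""
    if isinstance(node, list):
        return [expand_draft(item) for item in node]
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            name = KEY_MAP.get(key, key)
            if name == "track_type":
                value = TRACK_TYPES.get(value, value)
            else:
                value = expand_draft(value)
            out[name] = value
        return out
    return node
```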

Example timeline (3 tracks):
  1. Video: city timelapse (0-10s)
  2. BGM: Lo-fi (0-10s, 35%)
  3. Title: "Urban Dreams" (0-3s)

Error Handling

Code → Meaning → Action
  • 0 → Success → continue
  • 1001 → Bad/expired token → re-authenticate via anonymous-token (tokens expire after 7 days)
  • 1002 → Session not found → create a new session (§3.0)
  • 2001 → No credits → anonymous: show the registration URL with ?bind=<id> (get <id> from the create-session or state response when needed); registered: "Top up credits in your account"
  • 4001 → Unsupported file → show supported formats
  • 4002 → File too large → suggest compressing or trimming
  • 400 → Missing X-Client-Id → generate a Client-Id and retry (see §1)
  • 402 → Free-plan export blocked → a subscription tier issue, NOT credits: "Register or upgrade your plan to unlock export."
  • 429 → Rate limit (1 token/client/7 days) → retry once after 30s
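The error table collapses into a single dispatch helper (a sketch; `next_action` is an illustrative name, and the action strings are shorthand for the fuller guidance above):

```python
def next_action(code: int, anonymous: bool = True) -> str:
    """Map an API error code to the recommended recovery action."""
    actions = {
        0: "continue",
        1001: "re-auth via anonymous-token",
        1002: "create new session",
        4001: "show supported formats",
        4002: "suggest compress/trim",
        400: "generate Client-Id and retry",
        402: "prompt to register or upgrade plan",
        429: "retry once in 30s",
    }
    if code == 2001:  # credits exhausted: recovery depends on account type
        return "show registration URL" if anonymous else "prompt to top up credits"
    return actions.get(code, "surface raw error to user")
```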

Troubleshooting Common Subtitle Issues in CapCut

If your generated subtitles look misaligned or run too long when pasted into CapCut, the most common cause is overly long lines in the source transcript. Try re-submitting with a note to break lines at 5-6 words maximum, and the output will be much easier to sync manually in CapCut's timeline.

If CapCut's auto-sync feature isn't picking up your pasted captions correctly, double-check that you're using CapCut's 'Add Text' or 'Auto Captions' section rather than pasting into a generic text layer. The subtitle generator output is formatted for CapCut's caption workflow specifically.

For videos with heavy background noise, slang, or overlapping speech, your transcript may have gaps or errors before you even submit it. Clean up those sections manually before running them through the skill — accurate input always produces more usable subtitle output. If you're unsure about a section, flag it in your prompt and the skill can suggest placeholder caption text.

Best Practices for CapCut Subtitle Generation

For the cleanest results, provide your transcript or script as plain text with natural sentence breaks already in place. The more accurately your input reflects what's actually spoken in the video, the better your subtitle output will align when you paste it into CapCut's subtitle editor.

Keep individual caption lines short — aim for no more than 6 to 8 words per line where possible. This makes captions easier to read on mobile screens, which is where most CapCut content gets watched. When submitting your content, mention the video's pace (fast-talking, slow narration, interview-style) so the subtitle formatting can be adjusted accordingly.

If your video includes multiple speakers, label each speaker clearly in your input (e.g., 'Speaker 1:' and 'Speaker 2:'). This helps produce subtitle blocks that are easier to manage inside CapCut and reduces the editing time you'll spend after import.
