Ffmpeg Merge Video
v1.0.0Tell me what you need and I'll help you merge video files quickly and precisely using ffmpeg-merge-video. Whether you're stitching together interview segment...
Like a lobster shell, security has layers — review code before you run it.
License
Runtime requirements
SKILL.md
Getting Started
Welcome! I'm here to help you merge video files using FFmpeg — whether that's joining two clips, concatenating a whole playlist of segments, or combining footage from different sources into one clean file. Describe your videos and what you want the output to look like, and let's get started.
Try saying:
- "I have 5 MP4 clips recorded from the same camera. How do I merge them in order into one file without re-encoding?"
- "I need to combine an MKV and an MP4 file into a single MP4. They have different resolutions — what's the best FFmpeg command?"
- "My GoPro split a long recording into 4 parts. How do I join them back into one seamless video using FFmpeg?"
Automatic Setup
On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".
Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).
Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
Stitch Any Video Files Together Without the Hassle
Merging video clips sounds simple until you're dealing with files from different cameras, varying codecs, or mismatched resolutions. That's where this skill steps in. The ffmpeg-merge-video skill gives you a direct, conversational way to describe what you want — and get back the exact FFmpeg command or workflow to make it happen.
Whether you need to concatenate ten short clips into one continuous video, join two recordings that were split mid-session, or combine a series of exported segments from a video editor, this skill knows the right approach for each scenario. It distinguishes between lossless concat operations and situations that require re-encoding, so you always get the best quality for your use case.
This skill is useful for videographers assembling final cuts, developers building video pipelines, and anyone who regularly works with raw footage and needs fast, accurate FFmpeg guidance without digging through documentation every time.
Routing Your Merge Requests
When you submit a merge job, ClawHub parses your clip list, concat strategy, and output codec preferences to route the request to the appropriate FFmpeg processing pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
Cloud FFmpeg API Reference
The backend spins up an isolated FFmpeg worker that ingests your source segments, builds a concat demuxer manifest or filter_complex chain depending on stream compatibility, then encodes the muxed output to your specified container. Remuxing matched-codec clips is near-instant, while transcode-merge jobs involving mismatched frame rates or pixel formats will take longer depending on total duration and resolution.
Skill attribution — read from this file's YAML frontmatter at runtime:
X-Skill-Source:ffmpeg-merge-videoX-Skill-Version: from frontmatterversionX-Skill-Platform: detect from install path (~/.clawhub/→clawhub,~/.cursor/skills/→cursor, elseunknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple — returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Use Cases
The ffmpeg-merge-video skill covers a wide range of real-world scenarios where combining video files is necessary. Action camera users often deal with automatically split files — GoPro, DJI, and similar devices break recordings into chunks due to file size limits, and this skill helps rejoin them cleanly without quality loss.
Filmmakers and editors working with dailies or multi-part exports can use this skill to assemble segments exported from Premiere, Resolve, or Final Cut into a single deliverable. Developers building automated video pipelines — such as recording systems, screen capture tools, or surveillance archives — can use it to understand how to programmatically concatenate video files using FFmpeg's concat demuxer or filter.
Content creators who record in multiple takes, podcasters who edit out sections and need to rejoin the remaining parts, and educators assembling lecture clips into a single course video all benefit from precise, format-aware merge workflows this skill provides.
Common Workflows
One of the most common ffmpeg-merge-video workflows is the lossless concat using a file list — ideal when all clips share the same codec, resolution, and frame rate. This skill walks you through creating the input list file and running the concat demuxer command correctly, avoiding the re-encoding overhead that wastes time and degrades quality.
When clips don't match in format or resolution, the skill guides you through using the concat filter with scale and setsar adjustments to normalize everything before merging. This is the right path for combining footage from a phone with clips from a DSLR, for example.
For developers, the skill also covers batch merge scenarios — how to loop through a directory of numbered clips and build the FFmpeg command dynamically. Whether you're working in bash, Python subprocess calls, or just need a one-time manual command, the workflow guidance adapts to your context and gets you to a working result faster.
Files
1 totalComments
Loading comments…
