Install
openclaw skills install ai-mediakit-videoeditAI 视频智能剪辑 Skill。输入视频文件路径(支持多个)、可选弹幕文件路径、可选字幕文件路径, 结合弹幕和字幕内容理解视频语境,根据用户剪辑诉求(如"截取所有高能时刻"、 "剪出讲解xxx的部分")自动提取对应时间段、拼接并添加转场效果,最终用 FFmpeg 合成输出视频。 当用户提及"视频剪辑"、"根据弹幕...
openclaw skills install ai-mediakit-videoeditThis Skill helps users understand video context by analyzing danmaku and subtitle content, automatically extracts and splices video clips based on editing requests, and uses FFmpeg to complete transition effects and final synthesis.
.mp4, .flv, .mkv, etc..srt / .ass / .json format subtitle files, corresponding to video files in order one-to-one; leave empty for videos without subtitlesNote: Subtitles and danmaku are the only basis for understanding video content. If neither is provided for a video, its content cannot be understood, and only explicit time segment instructions from the user can be executed.
Before performing any operations, verify that the runtime environment meets the requirements.
Verification Commands:
python --version
ffmpeg -version 2>&1 | head -1
ffprobe -version 2>&1 | head -1
node --version
Acceptance Criteria and Fixing Guidelines:
| Dependency | Minimum Requirement | Verification Method | Installation Command When Not Met |
|---|---|---|---|
| Python | 3.9+ | python --version | See instructions below |
| ffmpeg | Any version | ffmpeg -version | macOS: brew install ffmpeg · Linux: sudo apt install ffmpeg · Windows: ffmpeg.org |
| ffprobe | Included with ffmpeg | ffprobe -version | Installed with ffmpeg, no separate operation needed |
| Node.js | 18+ | node --version | macOS: brew install node · or nodejs.org |
Python Installation Instructions (if version is not met):
brew install python@3.11sudo apt install python3.11pyenv install 3.11 && pyenv global 3.11Text Effect Rendering Dependency (Remotion) Installation:
Check if byted-ai-mediakit-videoedit/template/node_modules exists:
ls byted-ai-mediakit-videoedit/template/node_modules/@remotion/renderer 2>/dev/null && echo "已安装" || echo "需要安装"
If not installed, execute:
cd byted-ai-mediakit-videoedit/template && npm install
Installation takes about 1-2 minutes, and node_modules will appear in the directory after completion. This step is skipped in subsequent conversations if node_modules already exists.
Processing Rules:
Before any analysis, confirm individually whether each video has subtitles or danmaku:
| Video | Has Subtitles | Has Danmaku | Understandability |
|---|---|---|---|
| video_A.mp4 | ✅ | — | Intelligent Analysis: Can understand video based on content semantics, supports content requests like "find highlights/find appearance introduction" |
| video_B.mp4 | — | ✅ | Intelligent Analysis (downgraded): Infer content through danmaku, accuracy lower than subtitles |
| video_C.mp4 | — | — | Only Explicit Commands: Cannot understand content, can only cut according to precise time segments provided by user |
Processing Rules for "Only Explicit Commands" Videos:
Run the parsing script to convert danmaku and subtitles into timeline-based text summaries for video content analysis.
Single Video (subtitles + danmaku):
python byted-ai-mediakit-videoedit/scripts/parse_media_info.py \
--video ep1.mp4 \
--danmaku ep1.xml \
--subtitle ep1.srt \
--output /tmp/media_timeline.json
Single Video (subtitles only — omit --danmaku):
python byted-ai-mediakit-videoedit/scripts/parse_media_info.py \
--video ep1.mp4 \
--subtitle ep1.srt \
--output /tmp/media_timeline.json
Multiple Videos (repeat a set of --video/--danmaku/--subtitle for each video):
python byted-ai-mediakit-videoedit/scripts/parse_media_info.py \
--video ep1.mp4 --danmaku ep1.xml --subtitle ep1.srt \
--video ep2.mp4 --danmaku ep2.xml \
--output /tmp/media_timeline.json
Important: Only run this script for videos with subtitles or danmaku. Videos without both subtitles and danmaku are not passed to the script, and their segment times are directly provided by the user.
Optional parameters:
--interval 5: Danmaku summary time interval size (seconds), default 5--top-danmaku 10: Number of most frequent danmaku per interval, default 10--include-danmaku: Force include danmaku summary in output (even with subtitles); only add this parameter when user explicitly requests to reference danmaku (requires a --danmaku file for that video)--video can be omitted if you pass only --danmaku, or only --subtitle (script uses each file’s basename stem as the video identifier); prefer explicit --video with real paths so video_file in the JSON matches clips and downstream steps--danmaku can be omitted when that video has subtitles; each video must have at least one of danmaku or subtitle--subtitle can be omitted when the video has danmaku onlyScript automatically determines what to write based on the following rules (no additional Claude judgment needed):
| Input Situation | Content Written to Timeline | Output Mode |
|---|---|---|
Has subtitles, no --include-danmaku | Only subtitle entries | subtitle_only |
| No subtitles | Only danmaku interval summaries | danmaku_only |
Has subtitles + --include-danmaku | Subtitles + danmaku summaries | subtitle_and_danmaku |
The output file's top-level content_mode field indicates the current mode, and Claude can directly confirm which type of content is included in the timeline based on this.
The script outputs a JSON timeline, danmaku are not written one by one, but summarized by time intervals. There are two types of records in the timeline:
Danmaku interval summary (type = danmaku_summary):
{
"time": 25.0,
"time_start": 25.0,
"time_end": 30.0,
"type": "danmaku_summary",
"total_count": 38,
"density": 456.0,
"top_danmaku": [
{"text": "666", "count": 12},
{"text": "哈哈哈", "count": 8}
],
"video_file": "ep1.mp4"
}
Subtitle entry (type = subtitle, fully preserved):
{
"time": 27.3,
"end_time": 30.1,
"type": "subtitle",
"text": "这一波操作太丝滑了",
"video_file": "ep1.mp4"
}
Top-level fields:
videos: Statistics for each video and their respective highlight peak listsdensity_peaks: Global highlight peaks across all videos (including video_file and top_danmaku fields)summary.peak_times: Brief information for the top 5 highlight intervals (including position, density, representative danmaku)Read /tmp/media_timeline.json, combine with user's editing needs, and infer the list of time segments to be extracted.
Prerequisite Check: Skip Unanalyzable Videos
Before content analysis, only process videos determined as "Intelligent Analysis" in Step 0. For "Only Explicit Commands" videos, skip analysis and directly use user-provided time values for clip start/end.
Analysis Strategy:
First Priority: Determine Content Understanding Method Based on Subtitles
| Situation | Content Understanding Method |
|---|---|
| Has subtitle files | Read only subtitles to understand video content, ignore danmaku data (unless user explicitly requests to reference danmaku) |
| No subtitle files | Read only danmaku summaries (danmaku_summary) to understand video content and highlight moments |
| User explicitly says "reference danmaku" | Read both subtitles and danmaku regardless of whether subtitles exist |
Reason: Subtitles are precise records of actual video audio content, directly expressing video semantics; danmaku are audience reactions, with noise such as spamming and irrelevant content. When there are subtitles, the marginal benefit of danmaku information is low, and it consumes a lot of context.
With Subtitles (Main Path)
Without Subtitles (Downgraded Path)
type=danmaku_summary entries, judge popularity by density (counts per minute)density_peaks has already pre-identified intervals with highest density, directly use as highlight candidatestop_danmaku list"www", "哈哈", "666", "awsl", "绝了", "牛", "真的假的", "名场面""笑死", "可怜") need to be judged in contextMulti-video Analysis Notes
video_file field, must distinguish different video timelines during analysis, do not mix time offsetsstart/end, only find boundaries in subtitles/danmaku corresponding to the record's video_filevideo_file field in the final clips JSON must match the timeline source videoDetermine Clip Boundaries
time of the most recent subtitle before the candidate interval start; if the beginning semantics are incomplete (conjunctions like "而且"/"但是" etc.), continue moving forwardend_time of the most recent subtitle after the candidate interval end, ensure the last sentence is completeSubtitle records include
time(start seconds) andend_time(end seconds), directly used for boundary alignment.
Output format (internal reasoning result, JSON):
{
"clips": [
{
"video_file": "/path/to/video.mp4",
"start": 125.3,
"end": 142.8,
"reason": "Danmaku density peak, lots of '666' and '绝了' danmaku, corresponding subtitles show player completed difficult operation"
}
],
"transition": "fade",
"normalize_audio": true,
"output_path": "/tmp/output_cut.mp4"
}
normalize_audiodefaults tofalse, only executed after explicit user confirmation. For multiple clips, can enable two-pass EBU R128 volume normalization (target -16 LUFS / -1.5 dBTP) to eliminate volume differences between different video sources. Not needed for single clips.
If the user has explicitly specified transition effects, use directly. If not specified, infer the most appropriate effect based on understanding of danmaku and subtitles, following this logic:
| Content Feature | Recommended Transition | Judgment Basis |
|---|---|---|
Danmaku dominated by excited words like 哈哈/666/牛/绝了, fast pace | none | Hard cut better matches the impact of highlight moments |
Game/sports competition, danmaku with lots of 冲/gkd/打 | wipeleft or wiperight | Horizontal wipe has dynamic feeling and directionality |
| Subtitles for knowledge讲解/tutorial/review, rational content | dissolve | Dissolve transition is smooth and non-intrusive, suitable for information-dense content |
Emotional/vlog/life content, danmaku with 好看/感动/治愈 | fade | Fade in/out is soft, has emotional continuity |
| Subtitles for step-by-step tutorials (Step 1/Step 2...) | slideup or slidedown | Vertical slide has a sense of progress |
High-energy/montage/strong editing feeling, danmaku with 燃/热血/混剪 | zoomin | Strong镜头推进感, creates visual impact and rhythm |
| Cinematic/dramatic content, large scene transitions | circleopen | Circular opening has film texture, suitable for dramatic scene transitions |
Tech/digital/game highlights, danmaku with 数字/代码/科技 | pixelize | Pixelized dissolve has cyberpunk texture, suitable for tech themes |
| Events/parties/concerts, quick multi-scene transitions | radial | Radial sweep has stage feeling, suitable for multi-scene transitions |
| Travel/landscape/city exploration, scene panning | smoothleft | Smooth left slide transition is natural and fluid, suitable for spatial displacement feeling |
| Mixed content or difficult to judge | fade | Most universal fallback option |
After inference, inform the user in the reply about the selected transition and its reason, making it easy for the user to confirm or adjust.
After completing analysis and transition selection, must present the complete plan to the user in table form, stop and wait for user's explicit confirmation before continuing execution. Do not call cut_and_merge.py without confirmation.
Plan Presentation Format:
Below is the editing plan organized based on your needs. Please confirm before I start synthesis:
# Video Source Time Segment Duration Content Description 1 xxx.mp4 01:29 → 01:57 28s Appearance opening: first talk about appearance + color scheme rating 2 yyy.mp4 02:46 → 03:03 17s Border craft change introduction 3 zzz.mp4 03:01 → 03:19 18s Three major appearance feel details Transition Effect: dissolve (溶解) · Duration 0.8s · Reason: Knowledge讲解/review content suitable for smooth transition Estimated Total Duration: Approximately 63 seconds
Volume Normalization: Do you want to enable EBU R128 volume normalization? This step requires two-pass loudness analysis for each clip, time-consuming (about 30–60 seconds extra per minute of video), but can eliminate volume differences between different video sources. Please let me know if you want to enable it.
Confirm execution? If you need to adjust the time of a segment, delete or replace a segment, please tell me.
Supported Modification Types (update plan after user feedback, re-present, wait for confirmation again):
Confirmation Signal Recognition: When the user replies with words indicating approval like "好的" (okay), "确认" (confirm), "可以" (can), "执行" (execute), "开始" (start), "没问题" (no problem), it is considered confirmed for the editing plan, but volume normalization requires separate explicit user reply before execution.
Volume Normalization Confirmation Rules:
normalize_audio: true, execute normalizationnormalize_audio: false, skipAfter user confirms the plan, write the final plan to /tmp/clips.json, run the editing script:
python byted-ai-mediakit-videoedit/scripts/cut_and_merge.py \
--clips-json /tmp/clips.json \
--output /path/to/output.mp4
Default transition duration is 1 second, can be overridden in clips JSON with transition_duration field (unit: seconds).
Script execution order: Extract segments → Volume normalization (two-pass loudnorm, only executed when user confirms enable) → Resolution normalization (add black bars) → Transition merging → Output final video.
After video editing is complete, ask the user if they need to add text effects:
Video editing is complete. Do you need to add text effects to the final video? (Such as danmaku burst animation, UP master name card, chapter titles, golden sentence cards, etc.)
If the user chooses not to add, skip to Step 9 to display results. If the user chooses to add, execute the following process:
Based on the subtitle/danmaku data parsed in Step 3, combined with the clips list, infer effects corresponding to each segment's content.
Timeline Conversion Rules (Key):
Effect timestamps are relative to the final video, need to convert from original video time:
transition_duration × (N-1) seconds if there are transitions)clip_N final video start + (T - S)Effect Type Selection Guide:
| Trigger Condition | Recommended Effect | Description |
|---|---|---|
| Danmaku density peak (highlight interval) | danmakuBursts | Use top_danmaku for floating danmaku, fill highlight with most frequent word |
| Danmaku contains "666"/"绝了"/"牛"/"神操作" | keyPhrases (emphasis) | Display the word as text at top/bottom |
| Subtitles contain golden sentences | quotes | Display at bottom or side |
| 0-3s at the beginning of final video or obvious paragraph boundaries | chapterTitles | Display theme, such as "精彩时刻合集" |
| First appearance of UP master/guest | lowerThirds | Display name/identity bar |
Effect Configuration Format (time unit: milliseconds, relative to final video):
{
"theme": "douyin",
"videoInfo": {"width": 1920, "height": 1080},
"chapterTitles": [
{"title": "精彩时刻", "subtitle": "弹幕高能合集", "startMs": 0, "durationMs": 3000}
],
"keyPhrases": [
{"text": "666", "style": "emphasis", "position": "top", "startMs": 8500, "endMs": 10500}
],
"danmakuBursts": [
{"messages": ["666", "哈哈哈", "绝了", "牛啊", "名场面"], "highlight": "名场面", "startMs": 12000, "durationMs": 3000}
],
"lowerThirds": [
{"name": "UP主昵称", "role": "游戏区", "company": "bilibili", "startMs": 500, "durationMs": 5000}
],
"quotes": [
{"text": "这波操作太丝滑了", "author": "— 弹幕金句", "position": "bottom", "startMs": 25000, "durationMs": 4000}
]
}
Optional Themes: douyin (default, suitable for Bilibili/games), notion (knowledge/review), cyberpunk (tech/cyber), aurora (gradient stream), apple (minimalist)
Present all suggested effects in table form, wait for user's explicit confirmation before rendering:
# Type Final Video Time Content Theme 1 Chapter Title 0s – 3s "精彩时刻" douyin 2 Text Effect 8.5s – 10.5s "666" (emphasis) douyin 3 Danmaku Burst 12s – 15s 5 danmaku, highlight="名场面" douyin After confirmation, effects will be rendered and synthesized into the video (takes about 1-3 minutes). Confirm?
ffprobe -v quiet -select_streams v:0 \
-show_entries stream=width,height -of json /path/to/output_cut.mp4
Fill width / height into videoInfo field of effects_config.json, write to /tmp/effects_config.json.
# Note: Must cd to template directory first, render.mjs uses relative path to locate entry file
cd byted-ai-mediakit-videoedit/template && node render.mjs \
--config=/tmp/effects_config.json \
--out-dir=/tmp/effects
After successful rendering, each line prints ✅ xxx-N -> /tmp/effects/xxx-N.mov.
python byted-ai-mediakit-videoedit/scripts/video_effects.py \
/path/to/output_cut.mp4 \
/tmp/effects_config.json \
/tmp/effects \
/path/to/output_final.mp4
After completion:
output_final.mp4 with effects, output_cut.mp4 without effects)<i> root node, <d> danmaku nodes)brew install ffmpeg (macOS) or apt install ffmpeg (Linux)node_modules exists (npm install), confirm Node.js version ≥ 18.mov files exist in /tmp/effects/ directory, confirm effect rendering step was successfully completed--video in order; for each index, provide --danmaku and/or --subtitle in the same order (pad implicitly by omitting trailing args: e.g. two videos can use --subtitle s1 --subtitle s2 with no --danmaku). Each video must have at least one text source.ass format, script automatically extracts plain text content and ignores style information