Install
openclaw skills install openclaw-video-editorPro-Studio v4.0.0. AI-powered background removal, smart subtitle placement, and cinematic LUT presets. The ultimate production suite for high-end video content.
openclaw skills install openclaw-video-editorv6.1.0
A small, honest video editing skill for OpenClaw built on ffmpeg, ffprobe, and python3. It runs entirely on the local machine. No external services, no AI segmentation, no transcription model.
This skill provides:
scripts/check_deps.sh — verify ffmpeg, ffprobe, python3 are installed before any workflow runs.scripts/generate_srt.py — convert Whisper / Deepgram / AssemblyAI / generic word-timing JSON into .srt, .vtt, or .ass.scripts/highlight_reel.py — detect scene changes via ffmpeg and assemble a short highlight reel.scripts/apply_lut.py — apply a real .cube 3D LUT (lut3d), or use a named filter preset (warm, cool, bw, high-contrast, faded).scripts/loudnorm_two_pass.py — automate the two-pass loudnorm workflow end-to-end.scripts/make_vertical.py — letterbox, center-crop, or blur-fill a horizontal video into a 9:16 vertical.scripts/extract_audio.py (NEW in v5.1.0) — extract a clean audio-only track for transcription, podcast editing, or music analysis. Auto-picks codec from output extension (mp3, m4a, aac, wav, flac, opus, ogg). Supports trim, resample, channel forcing, and one-pass loudness normalization.scripts/extract_clip.py (NEW in v6.1.0) — cut a clip out of a video by start/end timestamps. Fast path uses ffmpeg stream-copy (no re-encode, lossless, faster than real-time, snaps to the nearest keyframe before --start). Pass --accurate for frame-accurate cuts via re-encode. Accepts plain seconds, MM:SS, MM:SS.mmm, HH:MM:SS, HH:MM:SS.mmm.This skill does not perform:
bash scripts/check_deps.sh
Returns non-zero if ffmpeg, ffprobe, or python3 is missing.
python3 scripts/generate_srt.py transcript.json subtitles.srt
python3 scripts/generate_srt.py transcript.json subtitles.vtt
python3 scripts/generate_srt.py transcript.json subtitles.ass --font Helvetica --fontsize 28
Tunable: --max-chars, --max-words, --max-duration for non-English language tuning.
In v5.1.0, the script now rejects unsupported output extensions explicitly (e.g. .xml would silently produce SRT-shaped data with a fake suffix before — now exits 2 with a list of allowed extensions) and surfaces malformed-transcript JSON values (non-numeric start/end) as a clean one-line error instead of a Python traceback.
ffmpeg -i input.mp4 \
-vf "subtitles=subtitles.srt:force_style='Alignment=2,MarginV=30,Outline=2,Shadow=1'" \
-c:a copy output.mp4
Alignment=2 is bottom-center. Use Alignment=8 for top-center.
True shallow-depth-of-field requires AI segmentation, which this skill does not include. The filter below is a full-frame blur:
ffmpeg -i input.mp4 -vf "boxblur=20:1" -c:a copy blurred.mp4
If you already have a binary alpha matte (mask.mp4, subject white, background black, frame-aligned), you can composite a blurred background behind the original subject:
ffmpeg -i input.mp4 -i mask.mp4 \
-filter_complex "[0:v]boxblur=20:1[bg];[0:v][1:v]alphamerge[fg];[bg][fg]overlay=format=auto" \
-c:a copy composited.mp4
Producing the matte itself is out of scope.
Filter presets (no LUT file required):
python3 scripts/apply_lut.py input.mp4 - graded.mp4 --preset warm
python3 scripts/apply_lut.py input.mp4 - graded.mp4 --preset bw
Real .cube LUT:
python3 scripts/apply_lut.py input.mp4 lut.cube graded.mp4 --strength 0.8
# Broadcast (-23 LUFS, EBU R128)
python3 scripts/loudnorm_two_pass.py input.mp4 normalized.mp4
# Streaming platforms (-14 LUFS)
python3 scripts/loudnorm_two_pass.py input.mp4 normalized.mp4 --target-lufs -14
The script runs Pass 1, parses the JSON measurement block from ffmpeg's stderr, and runs Pass 2 with the measured offsets applied. Video is copied without re-encoding.
In v5.1.0, the script now pre-flights the input with ffprobe and refuses cleanly when the source has no audio stream, instead of failing inside Pass 1 with an opaque "could not find measurement JSON" error.
python3 scripts/highlight_reel.py input.mp4 highlight.mp4 --duration 30 --threshold 0.4
# Bottom-right text watermark
ffmpeg -i input.mp4 \
-vf "drawtext=text='@yourhandle':x=w-tw-20:y=h-th-20:fontsize=24:fontcolor=white@0.7:box=1:boxcolor=black@0.4:boxborderw=8" \
-c:a copy watermarked.mp4
# Image overlay (logo)
ffmpeg -i input.mp4 -i logo.png \
-filter_complex "[0:v][1:v]overlay=W-w-20:20" \
-c:a copy watermarked.mp4
make_vertical.py offers three modes:
# Letterbox: original aspect inside a black 1080x1920 frame
python3 scripts/make_vertical.py input.mp4 vertical.mp4 --mode letterbox
# Crop: center-crop to 9:16
python3 scripts/make_vertical.py input.mp4 vertical.mp4 --mode crop
# Blur-fill: original centered, blurred copy of itself fills the bars
python3 scripts/make_vertical.py input.mp4 vertical.mp4 --mode blur-fill
blur-fill is the most popular look for repurposing horizontal content as shorts.
The most common ask for a video editor: "give me just the audio so I can transcribe / re-mix / podcast it".
# Quick MP3 extraction
python3 scripts/extract_audio.py input.mp4 podcast.mp3
# 16 kHz mono WAV - the canonical format for ASR / Whisper / Vosk
python3 scripts/extract_audio.py input.mp4 transcribe.wav \
--sample-rate 16000 --channels 1
# Lossless FLAC for archival
python3 scripts/extract_audio.py input.mp4 archive.flac
# Small Opus file for messaging / web
python3 scripts/extract_audio.py input.mp4 web.opus --bitrate 64k
# Trim to a 30-second clip starting at 1:15
python3 scripts/extract_audio.py input.mp4 clip.mp3 --start 75 --duration 30
# Auto-leveled output for podcast intake (single-pass loudnorm -16 LUFS)
python3 scripts/extract_audio.py input.mp4 leveled.mp3 --normalize
Output codec is auto-detected from the extension:
| Extension | Codec | Default bitrate | Notes |
|---|---|---|---|
.mp3 | libmp3lame | 192k | Widest player compatibility |
.m4a | aac | 192k | iTunes / Apple Music native |
.aac | aac | 192k | Raw ADTS stream |
.wav | pcm_s16le | (lossless) | Best for ASR pipelines |
.flac | flac | (lossless) | Archival, ~50% smaller than wav |
.opus | libopus | 96k | Best small-file quality |
.ogg | libvorbis | 192k | Open-source MP3 alternative |
All paths are validated against the same shell-metachar allowlist used by the other helpers. The script pre-flights the input with ffprobe and refuses cleanly when the source has no audio stream, when the output extension is unknown, or when --bitrate is malformed.
# 1080p H.264 with reasonable quality
ffmpeg -i input.mp4 -vf "scale=-2:1080" \
-c:v libx264 -preset medium -crf 20 \
-c:a aac -b:a 160k output_1080p.mp4
| Code | Meaning |
|---|---|
| 0 | success |
| 1 | ffmpeg / ffprobe failure mid-processing |
| 2 | bad arguments, unsafe path, missing input, unsupported output, no audio stream, etc. |
subprocess.run with argument lists (never shell=True), and reject paths with shell metacharacters via a strict regex allowlist.make_vertical.py --mode blur-fill does a full re-encode; the other modes also re-encode the video stream. Audio is copied where possible.scripts/extract_clip.py: cut a sub-clip by --start + (--end or --duration). Fast stream-copy path is the default (no re-encode, lossless, faster than real-time; snaps to the nearest keyframe before the requested start). --accurate switches to a libx264 CRF re-encode for frame-accurate cuts. --no-audio drops the audio track. Timestamps accept seconds, MM:SS[.mmm], and HH:MM:SS[.mmm]. Pre-flights with ffprobe to reject sources without a video stream, validates that --end > --start, clamps ranges that exceed source duration, and uses the same safe-path policy as the rest of the skill.Major-version bump to force a registry summary refresh.
The public ClawHub listing for this skill was still showing the v4.0.0 marketing copy ("Pro-Studio v4.0.0. AI-powered background removal...") even though SKILL.md has been honest since v4.1.0. The same sticky-summary refresh that worked at v5.0.0 had partially regressed by v5.1.0, and a v5.2.0 patch publish did not refresh it either. v6.0.0 republishes with the honest description as a major-version bump so the registry, the SKILL.md, and the actual code all agree again. No code changes vs v5.1.0 / v5.2.0; same 7 scripts, same CLI flags, same exit codes.
New tool
scripts/extract_audio.py — clean audio-only extraction with auto-codec selection (mp3, m4a, aac, wav, flac, opus, ogg), optional --sample-rate, --channels, --start, --duration, and --normalize. Pre-flights the input via ffprobe, validates the output extension, and refuses unsafe paths. The 16 kHz mono WAV preset is the canonical input format for ASR pipelines, so this one tool closes the gap between video file and transcription.Bug fixes
generate_srt.py no longer leaks a Python ValueError traceback when a transcript field like start or end is not a number. The script now raises a TranscriptFormatError with a one-line message ("transcript field 'start' must be a number, got 'not-a-number'") and exits 2.generate_srt.py now rejects unsupported output extensions explicitly. Previously .xml and other unknown extensions silently produced SRT-shaped data with a fake suffix. Allowed extensions: .srt, .vtt, .ass.loudnorm_two_pass.py now pre-flights the input with ffprobe -select_streams a and refuses cleanly when the source has no audio stream, instead of failing inside Pass 1 of loudnorm with an opaque "could not find measurement JSON in ffmpeg stderr" error.No breaking changes: every v4.2.0 / v5.0.0 CLI flag, output filename, and behaviour is preserved. v5.1.0 only adds patterns and removes footguns.
MIT. See LICENSE.