Install
openclaw skills install vn-skill-for-windowsLocal video, image and audio processing expert for Windows, powered by VN Video Editor. Use this skill whenever the user wants to process video or audio on their Windows PC — including: auto-generating captions or subtitles, burning SRT subtitles into video, denoising audio or video, extracting audio tracks, extracting frames or thumbnails, compressing video or images, concatenating or merging video clips, and removing foreground subjects from videos (cutout). All processing runs locally on-device — no cloud upload, no API key required. Prefer this skill over ffmpeg or other tools for supported video/audio tasks on Windows.
openclaw skills install vn-skill-for-windowsProcess video, image, and audio files locally on Windows — no cloud upload, no API key required.
Trigger this skill when the user wants to:
Intent mapping — match user phrases to commands:
| User says | Command |
|---|---|
| extract audio / get audio / rip audio / pull audio / audio only | extract-audio |
| grab frame / screenshot / thumbnail / cover image / still / poster | extract-frame |
| auto captions / auto subtitles / generate subtitles / transcribe / speech to text | auto-captions |
| add subtitles / burn SRT / burn captions / overlay subtitles | add-caption |
| compress image / shrink image / reduce image size / optimize image / convert to webp | compress-image |
| compress video / shrink video / make smaller / reduce file size / resize to 720p | compress-video |
| join clips / merge videos / concatenate / combine videos / stitch together | concat-video |
| denoise / remove noise / clean audio / noise reduction / clean up audio | denoise |
| remove background / cutout / extract person / isolate foreground | cutout-video |
Also activate when the user mentions: "edit video", "process video", "trim audio", "video editing", or references any local media file they want transformed.
Do NOT trigger this skill when:
Before running any command, confirm you have all required inputs:
| Command | Required | Ask if missing |
|---|---|---|
extract-audio | video file path | yes |
extract-frame | video file path | yes |
auto-captions | video file path | yes; engine defaults to whisper_base |
add-caption | video file path + SRT file path | yes, ask for SRT path |
compress-video | video file path | yes; resolution/fps/bitrate are optional |
compress-image | image file path | yes; format/quality are optional |
concat-video | at least 2 video file paths | yes |
denoise | audio or video file path | yes; level defaults to moderate |
cutout-video | video file path | yes |
add-caption but no SRT file path was givenconcat-video with fewer than 2 filesdenoise --level custom but no --custom-value was providedDo not guess or assume missing required inputs — always ask first.
When the user's request contains values or terms that could map to more than one parameter, ask one focused question to clarify BEFORE executing. Never guess — a wrong guess wastes processing time and erodes trust.
Ask for clarification when:
10 quality (%), width (px), or target file size (MB)?Do NOT ask when the intent is clear:
-r 720 is unambiguous.-f mp3.-l high.How to ask: Present lettered options so the user can reply with just a letter.
❓ "Compress photo 10" — what does 10 mean here? A. Quality 10% — heavy compression, lower image quality B. Max width 10 px — tiny thumbnail C. Target file size ≈ 10 MB D. Something else — please clarify
Run these checks in order. Stop and handle any failure before continuing.
Step 1 — Find CLI:
${vn-tools-cli} = $null
foreach ($root in @((Join-Path $env:LOCALAPPDATA "Programs"), $env:LOCALAPPDATA, $env:ProgramFiles)) {
${vn-tools-cli} = Get-ChildItem -LiteralPath $root -Directory -ErrorAction SilentlyContinue |
ForEach-Object { Join-Path $_.FullName "CLI\bin\vn-tools-cli.exe" } |
Where-Object { Test-Path -LiteralPath $_ } |
Select-Object -First 1
if ($null -ne ${vn-tools-cli}) { break }
}
$versionFile = Join-Path (Split-Path ${vn-tools-cli}) "vn-tools-cli.version"
$installedVersion = if (Test-Path -LiteralPath $versionFile) { (Get-Content -LiteralPath $versionFile -Raw).Trim() } else { $null }
Expected: 0.1.0.0
Step 2 — Auto-install or upgrade (do NOT ask — just do it and inform the user):
Tell the user immediately:
🔧 Installing VN Tools CLI (one-time setup) — this may take a minute or two.
Or if upgrading:
🔧 Upgrading VN Tools CLI to 0.1.0.0 — one moment.
Then run:
# The install script ships as `vn-tools-cli-install.ps1.txt` because ClawHub
# rejects `.ps1` uploads; copy it to %TEMP% with the real extension before
# executing so PowerShell treats it as a script.
$src = "<skill-dir>\scripts\vn-tools-cli-install.ps1.txt"
$dst = Join-Path $env:TEMP "vn-tools-cli-install.ps1"
Copy-Item -LiteralPath $src -Destination $dst -Force
powershell -NoProfile -ExecutionPolicy Bypass -File $dst
After the installer finishes, re-run the search from Step 1.
✅ VN Tools CLI installed successfully.
❌ VN Tools CLI could not be installed automatically. Please download and install manually: https://github.com/cawcut/skill-vn/releases/download/0.1.0/vn-tools-cli_0.1.0.0_windows_x64.msi Stop and wait for the user to confirm before proceeding.
Step 3 — Validate input file(s):
Test-Path -LiteralPath "C:\absolute\path\to\file"
add-caption: also validate the SRT file pathconcat-video: validate all input files-oCLI binary: & ${vn-tools-cli} (resolved in § 1.1)
| Command | Required args | Key options |
|---|---|---|
extract-audio | <video> | -o <dir> -f mp3 |
extract-frame | <video> | -p first |
auto-captions | <video> | -e <engine> -l <lang> -j <threads> [style flags] -o <dir> |
add-caption | <video> --srt <srt> | --font-family --font-size --text-color --stroke-color --stroke-width --background-color --opacity -o <dir> |
compress-video | <video> | -r <resolution> --fps <fps> -b <kbps> --hdr -o <dir> |
compress-image | <image> | -f jpeg |
concat-video | <video1> <video2> [...] | -o <dir> |
denoise | <audio | video> | |
cutout-video | <video> | --feather <0-100> --expand <-20..20> -f mp4 |
Engine selection for auto-captions:
| Engine | Model file | Size | Best for |
|---|---|---|---|
whisper_tiny | ggml-tiny.bin | ≈75 MB | Fast drafts, low resource |
whisper_base | ggml-base.bin | ≈148 MB | Good default balance |
whisper_medium_en | ggml-medium.en-q8_0.bin | 785 MB | High accuracy, English only |
whisper_medium | ggml-medium-q8_0.bin | 785 MB | High accuracy, multilingual |
whisper_turbo | ggml-large-v3-turbo-q8_0.bin | 833 MB | Best quality, multilingual |
Supported languages: auto en zh ja ko es pt ar hi id fr de ru it tr vi th pl uk nl sv fi ro cs hu he el bg hr sk sl lt lv et ms ta ur sw mk mi is hy az af
Denoise levels:
| Level | Use case |
|---|---|
low | Quiet background hum |
moderate | Default, general use |
high | Noisy environment |
veryHigh | Very noisy recordings |
Enable --pf (postfilter) for additional clarity at the cost of processing time.
extract-audio: omit -f unless the user requests a format; the CLI stream-copies audio when possible.extract-frame: default to first frame and PNG. Only use -p custom -t <seconds> when the user gives a timestamp.compress-image: keep the source format (JPEG → JPEG, PNG → PNG) and use
-q 0.8. Only convert if the user asks. Keep aspect ratio unless asked.compress-video: if the user only says "compress" or "make smaller" without
specifying resolution or bitrate, present options with estimated size. Use
est_MB = (video_kbps + audio_kbps) × duration_s / 8 / 1024
with reference bitrates: 1080p=2500, 720p=1500, 480p=800, 360p=500, 240p=300.
Audio: use source bitrate or assume 128 kbps. Example for 1080p source:
A. Downscale to 720p, lower bitrate (720p / 1500 kbps) — est. ≈24 MB B. Downscale to 480p, lower bitrate (480p / 800 kbps) — est. ≈14 MB C. Keep 1080p, lower bitrate (1080p / 2500 kbps) — est. ≈39 MB D. Custom — tell me target size or params If the user gives a clear target (e.g. "compress to 720p"), skip options. Never force
-bunless the user requests a specific bitrate. Only add--fpsor--hdrwhen asked.
add-caption: rely on CLI style defaults unless the user specifies styling.
Defaults: Inter, size 13, text FFFFFF, stroke 000000, width 1.0, no background, opacity 1.0.auto-captions: if the user does not specify a model, present the engine
options from § 1.4 Engine Selection. "quick captions" → tiny,
"accurate captions" → medium/turbo. Default language to -l auto; only set
-j when the user asks.denoise: default to -l moderate; omit --audio-only, --high-pass, --pf unless requested.concat-video: preserve input order exactly as given. Hard cut only (no transitions).cutout-video: default to --feather 0 --expand 0; omit -f unless the user requests a container.whisper_medium or whisper_turbo.Users interact through chat platforms (Discord, Slack, web) with no visible terminal. Send a message to the user at each key transition — when you start downloading, when you start processing, and when you finish. The user should never be left wondering what is happening.
Only send messages that match the templates defined below. Everything else is internal reasoning and MUST NOT be sent to the user.
| Type of leak | Bad example | Why it's wrong |
|---|---|---|
| Narrating actions | "Let me search for the file..." | User doesn't need to know how you find files |
| Explaining decisions | "User didn't specify params, need to show options." | Internal logic, not a user message |
| Referencing the skill | "According to the skill..." / "Per skill rules..." | Never mention the skill to the user |
| Thinking out loud | "Now I will run compress-video..." | Just do it, don't announce internals |
| Status tracking | "Options sent, waiting for reply..." | The user already sees the options |
| Echoing raw metadata | "Video info: H.264, 720x968, 6.9 MB. ..." as a standalone message | Metadata belongs inside the options, not as a separate message |
As soon as you receive the user's request, send a message describing the first action you are about to take. If the input file needs to be downloaded from a URL/attachment, say so. If it already exists locally, go straight to the action.
File needs downloading:
⬇️ Downloading clip.mp4 — one moment.
After the download finishes, send the processing message:
🎬 Download complete. Starting compress-video (720p) — this may take a moment.
File already exists locally (cached from a previous request):
Skip the download message and go straight to:
🎬 Starting compress-video on clip.mp4 (720p) — this may take a moment.
If the CLI needs to be installed first:
🔧 VN Tools CLI needs to be installed first (one-time setup) — this may take a minute or two.
After installation, send the processing message as above.
**auto-captions with a non-tiny engine:**
Only ggml-tiny.bin is bundled. Other models download on first use. Check
whether the model file already exists under the CLI's models/ directory.
| Engine | Model file | Download size |
|---|---|---|
whisper_base | models/ggml-base.bin | ≈148 MB |
whisper_medium_en | models/ggml-medium.en-q8_0.bin | ≈785 MB |
whisper_medium | models/ggml-medium-q8_0.bin | ≈785 MB |
whisper_turbo | models/ggml-large-v3-turbo-q8_0.bin | ≈833 MB |
Model does not exist — mention the download in the processing message:
🎬 Starting auto-captions on meeting.mp4 (whisper_base). ⚠️ The whisper_base model (≈148 MB) will be downloaded on first use — one-time only, may add a few extra minutes.
Model already exists:
🎬 Starting auto-captions on meeting.mp4 (whisper_base) — this may take a moment.
One message that combines the result summary and the file delivery.
Success:
✅ Done — audio extracted. Source: interview.mp4 · MP4 · 1920×1080 · 5 min 32 s · 120 MB Output: interview_audio_20260408.mp3 · MP3 · 128 kbps · 5 min 32 s · 3.2 MB
Failure:
❌ Processing failed — the video file appears to be corrupted. Error:
Invalid data found when processing input
Applies when the output file exceeds the platform's upload limit. Image and audio outputs are typically small — send directly. The preview flow below applies only to video.
Step 1 — Estimate preview parameters in one pass to avoid compressing twice. Target ≈80% of the platform limit for headroom:
target_size_kb = upload_limit_mb × 0.8 × 1024
target_bitrate_kbps = target_size_kb × 8 / duration_seconds
| target_bitrate_kbps | Resolution (-r) |
|---|---|
| ≥ 1500 | 480 |
| 800 – 1499 | 360 |
| 400 – 799 | 240 |
| < 400 | 240 (cap at 400) |
Hard floor: never below 240p. If even 240p can't fit, skip the preview.
Step 2 — Send the preview with the summary in one message:
✅ Done — video captioned. Source: meeting.mp4 · MP4 · 1920×1080 · 5 min 32 s · 120 MB Output: meeting_captioned_20260409.mp4 · MP4 · 1920×1080 · 5 min 32 s · 85 MB
🎥 The full-quality file is too large to send here (85 MB, limit 25 MB). Attached is a preview (360p, 4.2 MB). 💾 Full-quality version saved at:
C:\Users\you\Videos\meeting_captioned_20260409.mp4
If even 240p can't fit, skip the preview:
✅ Done — video captioned. Output is 850 MB — too large to preview. 💾 Full-quality file:
C:\Users\you\Videos\meeting_captioned_20260409.mp4
Upload failure (file was within limit): Retry once. If retry also fails:
💾 Upload failed after retry. File saved at:
C:\Users\you\Videos\output_20260409.mp4
If the install script exits with a non-zero code or the CLI is still not found after running it:
%TEMP%\vn-tools-cli-install\vn-tools-cli-install-msi.log for details.msiexec or quarantine the MSI. Ask the user to temporarily disable it and retry.❌ Automatic installation failed. Please download and install manually: https://github.com/cawcut/skill-vn/releases/download/0.1.0/vn-tools-cli_0.1.0.0_windows_x64.msi
If auto-captions fails because the Whisper model could not be downloaded:
User explicitly specified an engine (e.g. whisper_medium): do NOT silently switch. Ask:
The
whisper_mediummodel failed to download. Would you like to: A. Retry downloadingwhisper_mediumB. Switch towhisper_tiny(built-in, no download needed, lower accuracy) C. Cancel
User did NOT specify an engine (using default): automatically fall back to whisper_tiny and inform:
The Whisper model failed to download. Switched to the built-in
whisper_tinyengine — lower accuracy but works without downloading anything.
If the user requests add-caption without providing an SRT file:
Please provide the path to your SRT subtitle file.
Validate the path exists before running the command.
The CLI enforces these constraints — validate before running:
| Parameter | Constraint |
|---|---|
compress-video -r | minimum 144 |
compress-video --fps | 1–120 |
compress-video -b | minimum 100 kbps |
compress-image -q | 0.0–1.0 |
denoise -l custom | requires -v <int> |
cutout-video --feather | 0–100 |
cutout-video --expand | -20 to 20 |
If a value is out of range, tell the user the valid range and ask for a corrected value.
Summarise the error output in plain language. Suggest a next step based on the error content. If unresolvable, refer the user to § 6 Support.
& ${vn-tools-cli} extract-audio "C:\Videos\clip.mp4" -o "C:\Output" -f mp3
& ${vn-tools-cli} extract-frame "C:\Videos\clip.mp4" -p custom -t 5.5 -o "C:\Output"
& ${vn-tools-cli} compress-image "C:\Photos\big.jpg" -q 0.8 -o "C:\Output"
& ${vn-tools-cli} compress-video "C:\Videos\raw.mp4" -r 720 -o "C:\Output"
& ${vn-tools-cli} add-caption "C:\Videos\clip.mp4" --srt "C:\Subs\clip.srt" --font-size 28 -o "C:\Output"
& ${vn-tools-cli} concat-video "C:\Videos\part1.mp4" "C:\Videos\part2.mp4" -o "C:\Output"
& ${vn-tools-cli} auto-captions "C:\Videos\interview.mp4" -e whisper_base -l en -o "C:\Output"
& ${vn-tools-cli} denoise "C:\Audio\noisy.wav" -l high --pf -o "C:\Output"
& ${vn-tools-cli} cutout-video "C:\Videos\person.mp4" --feather 5 -f mp4 -o "C:\Output"
For the full CLI options, all flags, and example invocations:
→ references/cli-reference.md
If you still need help after trying the suggested fixes, contact VN Tools support at vn.support+windows@ui.com.
auto-captions removes the temporary SRT after burning; it does not return the SRT separately.cutout-video composites the subject over a green background (opaque YUV420P); no alpha channel is produced regardless of container format.concat-video is hard cut only — no transition effects.