Install
openclaw skills install video-auto-narration
Generate narration for silent screen-recording videos. Extracts key frames, analyzes on-screen content, writes a presentation-style voiceover script, synthesizes natural-sounding speech with Microsoft Edge neural TTS, and merges the audio onto the original video. Outputs a narrated video and a companion voiceover script.
You generate professional voiceover narration for silent screen-recording demo videos.
Extract key frames and understand what is happening on screen:
# Extract one frame every 5 seconds
./scripts/extract-frames.sh <video_path> [output_dir]
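For orientation, one common way to sample a frame every 5 seconds is ffmpeg's `fps` filter. The sketch below only builds such a command. It is an assumption about what a frame-extraction step could look like, not the contents of `extract-frames.sh`; the function name, the `frames` default directory, and the `frame_%04d.jpg` naming pattern are hypothetical.

```python
from pathlib import Path


def frame_extract_command(video_path: str, output_dir: str = "frames",
                          interval_s: int = 5) -> list[str]:
    """Build an ffmpeg command that writes one JPEG every `interval_s` seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Hypothetical naming scheme: frame_0001.jpg, frame_0002.jpg, ...
    pattern = str(Path(output_dir) / "frame_%04d.jpg")
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps=1/{interval_s}",  # emit one frame per interval
        "-q:v", "2",                   # high JPEG quality
        pattern,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists.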
Write a presentation-style narration — not a dry description. Follow this structure:
| Section | Purpose |
|---|---|
| Context | Tell the audience what they're about to see and why it matters |
| Background | Explain the setup / scenario |
| Prompt / Action | Show what the user actually did (keep it minimal) |
| Walkthrough | Narrate each major step, highlighting insights and turning points |
| Result | Land the payoff — what was found, what was fixed, why it's impressive |
Guidelines:
Save the script as <video_name>_voiceover.md alongside the video.
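The naming rule and the section order can be sketched in a few lines of Python. This assumes the five sections from the table above become headings in the saved file; the `## ` heading layout and both function names are illustrative choices, not something the skill prescribes.

```python
from pathlib import Path

# Section order taken from the structure table.
SECTIONS = ["Context", "Background", "Prompt / Action", "Walkthrough", "Result"]


def voiceover_path(video_path: str) -> Path:
    """Return <video_name>_voiceover.md in the same directory as the video."""
    video = Path(video_path)
    return video.with_name(f"{video.stem}_voiceover.md")


def render_script(narration: dict[str, str]) -> str:
    """Join the narration sections in table order, skipping empty ones."""
    parts = []
    for section in SECTIONS:
        text = narration.get(section, "").strip()
        if text:
            # Hypothetical layout: one markdown heading per section.
            parts.append(f"## {section}\n\n{text}\n")
    return "\n".join(parts)
```

Writing the file is then `voiceover_path(video).write_text(render_script(narration), encoding="utf-8")`.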
Use the generate script to synthesize each narration segment:
./scripts/generate-tts.sh <script_sections_file> <output_dir> [voice] [rate]
Or generate directly via Python with edge-tts:
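A minimal sketch of that direct route, assuming the `edge-tts` package's `Communicate` class and its async `save` method. The helper names `format_rate`, `synthesize`, and `synthesize_sync` are hypothetical, and the default voice is simply one of the voices this skill mentions.

```python
import asyncio


def format_rate(percent: int) -> str:
    """Format a speed change as a signed percent string such as '+10%'."""
    return f"{percent:+d}%"


async def synthesize(text: str, out_path: str,
                     voice: str = "en-US-GuyNeural",
                     rate_percent: int = 0) -> None:
    """Synthesize `text` to an audio file at `out_path`."""
    # Imported lazily so the helper above works without the package installed.
    import edge_tts  # requires: pip3 install edge-tts

    communicate = edge_tts.Communicate(text, voice, rate=format_rate(rate_percent))
    await communicate.save(out_path)


def synthesize_sync(text: str, out_path: str, **kwargs) -> None:
    """Blocking wrapper for use from non-async code."""
    asyncio.run(synthesize(text, out_path, **kwargs))
```

For example, `synthesize_sync("Welcome to the demo.", "context.mp3", rate_percent=10)` would request slightly faster speech. The call needs network access, since synthesis happens on Microsoft's service.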
Voice: en-US-GuyNeural (natural male) or en-US-AvaNeural (natural female)
Rate: between +0% and +10%, adjusted to fit the video duration
Merge the narration audio onto the original video:
./scripts/merge-audio.sh <video_path> <narration_audio> [output_path]
The merge uses -shortest to match the shorter of video/audio.
Install dependencies if not present:
pip3 install edge-tts # Microsoft neural TTS (free, no API key)
brew install ffmpeg # or apt-get install ffmpeg
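To decide whether those installs are needed, a quick presence check can look for the `ffmpeg` binary on `PATH` and for the `edge_tts` module. This is an illustrative sketch: `missing_dependencies` is a hypothetical helper, and its two parameters exist only so the lookups can be swapped out in tests.

```python
import importlib.util
import shutil


def missing_dependencies(which=shutil.which,
                         find_spec=importlib.util.find_spec) -> list[str]:
    """Return the names of required tools that are not available."""
    missing = []
    if which("ffmpeg") is None:          # binary lookup on PATH
        missing.append("ffmpeg")
    if find_spec("edge_tts") is None:    # importable Python module?
        missing.append("edge-tts")
    return missing
```

An empty list means both dependencies are present; otherwise the returned names say which install command to run.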
The skill produces:
<name> (with narration).mov: the video with baked-in voiceover
<name>_voiceover.md: the timestamped script for reference or re-recording