Install
openclaw skills install video-learning-notes
Use this skill when the user provides a video URL and wants a complete Markdown learning note. It downloads the original video, transcribes the audio with qwen-audio/STT, extracts timestamped frames with ffmpeg, reviews the candidate screenshots one by one against the subtitles to pick the key ones, and finally generates an illustrated learning note.
Convert a video URL or local video file into a complete Markdown learning note. Structure the note around the STT subtitle content and include selected key screenshots. Use this skill for requests such as “turn this video into learning notes”, “download this video and transcribe/analyze it”, or similar video-to-learning-note tasks.
Create a self-contained output directory containing:
- transcript.srt, generated by qwen-audio/STT.
- frames/, the timestamped candidate screenshots.
- selected_frames/, the key screenshots kept after review.
- video_learning_notes.md, which uses relative paths for the source video and images and cites screenshot timestamps.
Create a dedicated output directory for each video note. Prefer the current task directory or a stable path such as ./<note-title>/. Keep all generated files inside this directory; do not scatter outputs into shared default folders.
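The following is a minimal Python sketch of setting up that directory. It assumes the folder name is derived from the note title. The `slugify` rule and the `make_output_dir` helper are hypothetical illustrations and are not part of the skill's scripts.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Hypothetical naming rule: keep word characters (including non-ASCII
    letters) and collapse runs of spaces/punctuation into single hyphens."""
    slug = re.sub(r"[^\w]+", "-", title.strip()).strip("-")
    return slug or "video-note"


def make_output_dir(base: Path, title: str) -> Path:
    """Create <base>/<slug>/ with frames/ and selected_frames/ inside it."""
    out = base / slugify(title)
    (out / "frames").mkdir(parents=True, exist_ok=True)
    (out / "selected_frames").mkdir(parents=True, exist_ok=True)
    return out
```

For example, `make_output_dir(Path("."), "Intro to Transformers: Part 1")` would create `./Intro-to-Transformers-Part-1/` with both subfolders.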
If the source is an online video, use the yt-dlp-downloader skill/workflow to download the user-provided video URL. Preserve the original or best available quality when possible, and write the video into the current workspace.
Check dependencies when needed before downloading:
which yt-dlp || echo "yt-dlp not installed. Install with: pip install yt-dlp"
which ffmpeg || echo "ffmpeg not installed. Install with: brew install ffmpeg"
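The same checks can be run from Python with the standard library's `shutil.which`. `missing_tools` and the hint table are hypothetical names used only for this sketch. The ffprobe entry assumes ffprobe ships with the ffmpeg package, which is usually the case.

```python
import shutil

# Tools this workflow shells out to; install hints mirror the commands above.
REQUIRED_TOOLS = {
    "yt-dlp": "pip install yt-dlp",
    "ffmpeg": "brew install ffmpeg",
    "ffprobe": "brew install ffmpeg (ffprobe ships with ffmpeg)",
}


def missing_tools(tools=REQUIRED_TOOLS):
    """Return {tool: install_hint} for every tool not found on PATH."""
    return {name: hint for name, hint in tools.items() if shutil.which(name) is None}
```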
Recommended commands:
# Generic: download best quality into the workspace
yt-dlp -P "/path/to/workspace" -o "%(title)s.%(ext)s" "VIDEO_URL"
# YouTube: use browser cookies by default to reduce 403 errors
yt-dlp -P "/path/to/workspace" --cookies-from-browser chrome -o "%(title)s.%(ext)s" "YOUTUBE_URL"
# Download subtitles when available; still run qwen-audio/STT unless the user only wants official subtitles
yt-dlp -P "/path/to/workspace" --write-subs --sub-langs all -o "%(title)s.%(ext)s" "VIDEO_URL"
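When you drive yt-dlp from a script, passing an argument list avoids shell-quoting problems with `%(title)s` and with a `-f` format selector such as `bestvideo[height<=1080]+bestaudio/best[height<=1080]`. The sketch below assumes only the flags shown in this section plus `-f`. `build_ytdlp_cmd` is a hypothetical helper.

```python
from typing import List, Optional


def build_ytdlp_cmd(
    url: str,
    workspace: str,
    cookies_browser: Optional[str] = None,
    write_subs: bool = False,
    max_height: Optional[int] = None,
) -> List[str]:
    """Assemble a yt-dlp argument list from the options used in this section."""
    cmd = ["yt-dlp", "-P", workspace]
    if cookies_browser:  # e.g. "chrome" for YouTube
        cmd += ["--cookies-from-browser", cookies_browser]
    if write_subs:  # official subtitles supplement STT; they do not replace it
        cmd += ["--write-subs", "--sub-langs", "all"]
    if max_height:  # cap resolution, e.g. 1080
        cmd += ["-f", f"bestvideo[height<={max_height}]+bestaudio/best[height<={max_height}]"]
    cmd += ["-o", "%(title)s.%(ext)s", url]
    return cmd
```

Run the result with `subprocess.run(build_ytdlp_cmd(url, workspace, cookies_browser="chrome"), check=True)`. Because no shell is involved, the brackets and `%` need no escaping.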
Platform handling principles:
- YouTube: use --cookies-from-browser chrome by default. Supported browser cookie sources include chrome, firefox, safari, edge, brave, and opera.
- Resolution: to cap the download at 1080p, pass the format selector -f "bestvideo[height<=1080]+bestaudio/best[height<=1080]".
After downloading, identify the actual video file path (for example .mp4, .mkv, .mov, or .webm). If multiple files are produced, choose the main video as the source for the learning note and keep subtitles, thumbnails, and other files as supporting assets.
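The sketch below shows one way to pick the main video automatically. It assumes the largest file with a video extension is the main one. This heuristic is an assumption, so confirm the choice when several large videos appear. `pick_main_video` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".webm"}


def pick_main_video(workspace: Path) -> Optional[Path]:
    """Assumed heuristic: the largest video-extension file is the main video.
    Subtitles, thumbnails, and other files stay in place as supporting assets."""
    videos = [p for p in workspace.iterdir() if p.is_file() and p.suffix.lower() in VIDEO_EXTS]
    return max(videos, key=lambda p: p.stat().st_size, default=None)
```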
Troubleshooting:
- HTTP 403 Forbidden: retry with --cookies-from-browser chrome, or with another browser where the user is logged in.
- Video unavailable, private, or geo-restricted: ask the user for login access, cookies, or an accessible environment; do not bypass access restrictions.
- Format not available: run yt-dlp -F "VIDEO_URL" to list the available formats, then choose one.
- yt-dlp: command not found: install yt-dlp or ask the user to install it.
If yt-dlp-downloader / yt-dlp is unavailable, or if the video requires login/authentication, stop and ask the user to provide the missing access requirement instead of silently switching to unreliable tools.
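The following is a rough triage helper for these cases. The matched substrings are assumptions about typical yt-dlp (and shell) error output. Wording varies across yt-dlp versions and sites, so fall back to asking the user when nothing matches. `triage` is a hypothetical name.

```python
# Substrings assumed to appear in typical error output; exact wording varies by
# site and yt-dlp version, so this is a first-pass triage only.
RULES = [
    ("HTTP Error 403", "Retry with --cookies-from-browser <browser> where the user is logged in."),
    ("Private video", "Ask the user for login access or cookies; do not bypass restrictions."),
    ("not available in your country", "Ask the user for an accessible environment; do not bypass restrictions."),
    ("Video unavailable", "Ask the user for access; the video may be removed or restricted."),
    ("Requested format is not available", 'Run yt-dlp -F "VIDEO_URL" and choose a listed format.'),
    ("command not found", "Install yt-dlp or ask the user to install it."),  # shell message
]


def triage(stderr: str) -> str:
    """Return the first matching action, or a default 'stop and ask' message."""
    lowered = stderr.lower()
    for needle, action in RULES:
        if needle.lower() in lowered:
            return action
    return "Stop and ask the user for the missing access requirement."
```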
Run qwen-audio/STT on the downloaded video or extracted audio, and save the result as transcript.srt.
For large videos, first use ffmpeg to extract compressed mono audio, then transcribe the smaller audio file:
ffmpeg -y -i input.mp4 -vn -ac 1 -ar 16000 -b:a 32k audio_for_stt.mp3
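The sketch below wraps the same ffmpeg call in Python. The 200 MB cut-off for "large" is a hypothetical value that the skill does not specify. At 32 kbps mono, the extracted audio is roughly 14.4 MB per hour (32,000 bits/s ÷ 8 × 3,600 s).

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical cut-off: above this size, transcribe extracted audio instead of the video.
LARGE_VIDEO_BYTES = 200 * 1024 * 1024


def audio_extract_cmd(video: Path, out: Path) -> List[str]:
    """Same ffmpeg call as above: drop video, mono, 16 kHz, 32 kbps MP3."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn", "-ac", "1", "-ar", "16000", "-b:a", "32k", str(out)]


def stt_input(video: Path, out_dir: Path) -> Path:
    """Return the file to send to STT, extracting audio first for large videos."""
    if video.stat().st_size <= LARGE_VIDEO_BYTES:
        return video
    audio = out_dir / "audio_for_stt.mp3"
    subprocess.run(audio_extract_cmd(video, audio), check=True)
    return audio
```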
Preserve timestamp information as much as possible. Prefer SRT format. If STT only produces plain text, create transcript.txt and clearly note in the final output that exact subtitle timing is unavailable.
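The following is a minimal SRT parser. It assumes the standard layout: an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, one or more text lines, and a blank line between cues. It is a sketch for reading transcript.srt, not the skill's own parser.

```python
import re
from dataclasses import dataclass
from typing import List

# "00:01:02,345 --> 00:01:05,000" (SRT uses a comma before the milliseconds)
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _secs(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> List[Cue]:
    """Parse SRT blocks (index, timing line, text lines) into Cue objects."""
    cues = []
    content = content.replace("\r\n", "\n")
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIME_RE.search(line)
            if m:
                g = m.groups()
                text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
                cues.append(Cue(_secs(*g[:4]), _secs(*g[4:]), text))
                break
    return cues
```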
After confirming the video path, use scripts/prepare_video_learning_assets.py. The script generates timestamped candidate screenshots and a manifest file:
python3 "$SKILL_DIR/scripts/prepare_video_learning_assets.py" \
--video /path/to/video.mp4 \
--out /path/to/workspace \
--scene-threshold 0.3
By default, the script extracts frames only from ffmpeg scene changes; it does not take one screenshot every 30 seconds. Use --interval <seconds> only when regular interval screenshots are explicitly needed.
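The script's internals are not shown here. A common ffmpeg approach to scene-change extraction is to compare the `select` filter's `scene` score against a threshold, and this sketch assumes that approach. It is a swapped-in illustration and does not produce the script's timestamped filenames. `scene_frames_cmd` is a hypothetical helper.

```python
from typing import List


def scene_frames_cmd(video: str, out_pattern: str, threshold: float = 0.3) -> List[str]:
    """Build an ffmpeg command that keeps only frames whose scene-change score
    exceeds `threshold`; showinfo logs each kept frame's pts_time to stderr."""
    return [
        "ffmpeg", "-i", video,
        # The quotes protect the comma inside gt(scene,...) from the filtergraph parser.
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        # Write only the selected frames (ffmpeg 5.1+; older builds use -vsync vfr).
        "-fps_mode", "vfr",
        "-q:v", "2",  # high JPEG quality
        out_pattern,
    ]
```

For example, `subprocess.run(scene_frames_cmd("input.mp4", "frames/cand_%06d.jpg", 0.2), check=True)` would write the candidates. The `cand_%06d.jpg` name is hypothetical. You could then use the `pts_time` values printed by `showinfo` to rename the frames by timestamp.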
For most learning videos, the recommended --scene-threshold range is 0.1–0.3:
- Lower values keep more candidate frames; higher values keep fewer.
- Check the candidate count in frames_manifest.json and adjust the threshold so the number of candidate frames is manageable for manual review.
The script writes:
- frames/frame_000001__HH-MM-SS.jpg (one file per candidate screenshot)
- frames_manifest.json
- video_learning_notes.skeleton.md
If scene detection misses important content, add --interval <seconds> as a supplement. Use 10–15 seconds for slide-heavy or fast-changing instructional videos, and 45–60 seconds for talking-head videos.
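Each filename encodes its timestamp, so you can list candidates chronologically without reading the manifest. This sketch assumes exactly the `frame_000001__HH-MM-SS.jpg` pattern above. The helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Candidate frame naming used by the script: frame_000001__HH-MM-SS.jpg
FRAME_RE = re.compile(r"^frame_(\d+)__(\d{2})-(\d{2})-(\d{2})\.jpg$")


def frame_time(name: str) -> int:
    """Return the timestamp, in seconds, encoded in a candidate frame filename."""
    m = FRAME_RE.match(name)
    if not m:
        raise ValueError(f"not a candidate frame name: {name}")
    _, h, mi, s = m.groups()
    return int(h) * 3600 + int(mi) * 60 + int(s)


def candidates_in_order(frames_dir: Path) -> List[Tuple[int, Path]]:
    """List candidate frames as (seconds, path), earliest first, for one-by-one review."""
    found = [(frame_time(p.name), p) for p in frames_dir.iterdir() if FRAME_RE.match(p.name)]
    return sorted(found, key=lambda item: (item[0], item[1].name))
```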
Use the Read tool's visual analysis capability to inspect extracted screenshots. You must check candidate frames one by one in chronological order, and decide whether each frame is a key frame by combining the image content with nearby STT/SRT text.
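A window lookup over parsed cues is enough to gather the nearby subtitle text for a frame. In this sketch, the cue type and the 20-second default are assumptions; 20 s falls within a 15–30 second window on each side. Any objects with `start`, `end` (seconds), and `text` attributes will work.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def nearby_text(cues: List[Cue], t: float, before: float = 20.0, after: float = 20.0) -> str:
    """Join the text of every cue that overlaps [t - before, t + after]."""
    lo, hi = t - before, t + after
    return " ".join(c.text for c in cues if c.end >= lo and c.start <= hi)
```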
For each candidate image:
- Read the subtitle text around its timestamp in transcript.srt, usually the preceding and following 15–30 seconds.
Prioritize frames containing:
Skip frames that are:
Copy selected key images into selected_frames/, preserving the original timestamped filenames. Keep only enough screenshots to support learning; do not keep every candidate frame. For most videos, 8–30 screenshots are enough. Use more only when the video is highly visual.
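Copying with `shutil.copy2` keeps the original timestamped filename and the file metadata. In this sketch, `copy_selected` is a hypothetical helper, and its warning threshold simply mirrors the 8–30 guideline.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_selected(frames: Iterable[Path], out_dir: Path, soft_max: int = 30) -> List[Path]:
    """Copy chosen candidate frames into selected_frames/, keeping the original
    timestamped filenames. soft_max only triggers a warning, never a failure."""
    dest = out_dir / "selected_frames"
    dest.mkdir(parents=True, exist_ok=True)
    copied = [Path(shutil.copy2(src, dest / src.name)) for src in frames]
    if len(copied) > soft_max:
        print(f"warning: {len(copied)} screenshots kept; check that each one supports learning")
    return copied
```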
Read transcript.srt, the selected screenshot filenames/timestamps, and the visual notes produced while reading images one by one. Create video_learning_notes.md only after the key-frame selection step is complete.
When writing, embed key screenshots into the corresponding time-based sections, and explain why each screenshot matters based on both the STT context and the image content. Use the following structure:
# <Video title or topic>
<video src="relative/path/to/video.mp4" controls ></video>
- Original video: <URL>
- Video file: [<video-title>](relative/path/to/video.mp4)
- Generated date: YYYY-MM-DD
## Core Summary
<Summarize the most important ideas in 5–10 bullet points.>
## Learning Objectives
<List the concepts or operations the learner should understand after watching.>
## Section-by-Section Notes
### 00:00:00–00:03:20 <Section title>
<Convert the subtitles into readable learning notes. Do not dump the raw transcript.>
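
![HH:MM:SS <what the screenshot shows>](selected_frames/frame_000001__HH-MM-SS.jpg)
<Why this screenshot matters, based on the image content and the nearby subtitles.>
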
## Key Concepts / Terms
| Term | Explanation | Timestamp |
|---|---|---|
## Steps or Methodology
<If the video is a tutorial, organize the steps. If it is a course, organize the conceptual framework.>
## Review Checklist
- [ ] <Question or checkpoint>
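When filling in the template, you can generate the headings and image embeds so that timestamps and relative paths stay consistent. This sketch uses hypothetical helper names. It computes each image path relative to the note's own folder with `os.path.relpath` and converts it to forward slashes for Markdown.

```python
import os
from pathlib import Path


def hms(seconds: int) -> str:
    """Format seconds as HH:MM:SS for section headings and screenshot citations."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def section_heading(start: int, end: int, title: str) -> str:
    """Render a time-range heading in the template's ### HH:MM:SS–HH:MM:SS form."""
    return f"### {hms(start)}–{hms(end)} {title}"


def screenshot_embed(note_path: Path, image: Path, seconds: int, caption: str) -> str:
    """Markdown image whose path is relative to the note and whose alt text cites the timestamp."""
    rel = Path(os.path.relpath(image, start=note_path.parent)).as_posix()
    return f"![{hms(seconds)} {caption}]({rel})"
```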
- Embed the source video in the note with <video src="relative/path/to/video.mp4" controls ></video>.
- Use yt-dlp-downloader, ffmpeg, ffprobe, qwen-audio/STT commands, and the necessary file operations.
- Use files_preview when appropriate to show the final Markdown and output directory.