Install
openclaw skills install @gmdeep/video-to-markdownAnalyze any YouTube, Facebook, or Instagram video URL and generate a comprehensive Markdown reference document by combining AI vision analysis of extracted frames with full video transcription. Use this skill when a user shares a video URL and wants a summary, notes, breakdown, or reference document from it. Triggers on "analyze this video", "summarize this video", "break down this video", "create notes from this video", "watch this video and explain it", "video to markdown", "pull notes from this", or any request to extract knowledge from a video. Also triggers automatically when a user pastes a YouTube, Instagram, or Facebook URL and asks what it's about or wants to understand its content — even without explicit keywords. Especially valuable for educational content, trading tutorials, technical demos, or any video where charts, diagrams, and on-screen visuals tell a different story than the narration alone.
openclaw skills install @gmdeep/video-to-markdownExtracts key frames + transcript from a video, sends both to Claude vision, and produces a structured Markdown document that captures everything the video teaches — including what's shown on screen but not fully explained verbally.
python scripts/video_analyzer.py "<URL>" [--output DIR] [--max-frames N] [--cookies FILE] [--whisper]
Output: a .md file in the output directory.
Confirm the video URL from the user. Supported: YouTube, Facebook, Instagram (and most other sites yt-dlp handles).
Run the preflight check:
ffmpeg -version && yt-dlp --version && python3 -c "import anthropic, PIL; print('deps OK')"
If anything fails:
bash scripts/setup.sh
Also confirm ANTHROPIC_API_KEY is set:
echo $ANTHROPIC_API_KEY
If not set:
export ANTHROPIC_API_KEY=your_key_here
Standard (YouTube, captions available):
python scripts/video_analyzer.py "<URL>" --output ./output
Trading / chart-heavy content (more frames, Whisper for accuracy):
python scripts/video_analyzer.py "<URL>" --max-frames 80 --whisper --output ./output
Talking-head / lecture (fewer frames, captions sufficient):
python scripts/video_analyzer.py "<URL>" --max-frames 20 --output ./output
Facebook or Instagram (cookies required):
python scripts/video_analyzer.py "<URL>" --cookies /path/to/cookies.txt --output ./output
Maximum quality (Opus model + large Whisper + more frames):
python scripts/video_analyzer.py "<URL>" \
--model claude-opus-4-20250514 \
--whisper --whisper-model large-v3 \
--max-frames 80 \
--output ./output
The script prints the output file path as its last line. Read it and present the contents to the user:
cat ./output/<filename>.md
| Flag | Default | When to change |
|---|---|---|
--max-frames | 50 | Lower (20–30) for talking-head; higher (60–80) for dense charts |
--whisper | off | Use when no captions exist, or for jargon-heavy content |
--whisper-model | base | large-v3 for highest accuracy (slower, more RAM) |
--cookies | none | Required for Facebook/Instagram; sometimes YouTube |
--model | claude-sonnet-4-20250514 | claude-opus-4-20250514 for complex visual analysis |
--output | current dir | Set to a specific notes folder |
| Video length | Frames | Approx. cost |
|---|---|---|
| 10 min | ~20 | ~$0.08 |
| 30 min | ~50 | ~$0.20 |
| 60 min | ~80 | ~$0.35 |
Use --model claude-haiku-4-5-20251001 for ~5× lower cost when analysis quality is less critical.
See references/platforms.md for full detail on cookie setup for Facebook and Instagram.
Quick summary:
Cookie source: Firefox only (Chrome cookies encrypted since v127). Export from logged-in session on same IP you're running from.
Each run produces a .md file with:
YAML frontmatter:
Document body:
"No frames extracted" → Check ffmpeg is installed and the video downloaded to the temp dir. Try --max-frames 10 on a short public YouTube video first.
"No captions found" (and no Whisper) → Normal for non-captioned videos. Install faster-whisper and add --whisper, or the analysis continues from frames alone.
Facebook "Cannot parse data" → Cookies may be stale or from a different IP. Re-export from Firefox immediately before use, same network.
Instagram fails with cookies → Intermittent. Wait a few minutes and retry. Try a different account if it persists.
Output is too short / missing visuals → Increase --max-frames or upgrade to --model claude-opus-4-20250514.
High cost → Reduce --max-frames to 20–30. Talking-head content rarely needs more than 20 frames.