Install
openclaw skills install youtube-transcribe

Smart YouTube video transcription with automatic fallback:
- Extracts captions first (fast, free).
- Falls back to local Whisper transcription when no captions are available.
Use this skill when the user wants to:
Required:
- yt-dlp: for caption extraction and audio download
- python3

For Whisper fallback (when no captions available):
- ffmpeg: for audio processing
- mlx-whisper: Apple Silicon native, fastest on Mac (pip install mlx-whisper)
- faster-whisper: CTranslate2 backend, fast on CUDA/CPU (pip install faster-whisper)
- openai-whisper: Original Whisper, universal fallback (pip install openai-whisper)

python3 {baseDir}/scripts/transcribe.py "https://www.youtube.com/watch?v=VIDEO_ID"
python3 {baseDir}/scripts/transcribe.py "URL" --language zh
python3 {baseDir}/scripts/transcribe.py "URL" --force-whisper
python3 {baseDir}/scripts/transcribe.py "URL" --format json
python3 {baseDir}/scripts/transcribe.py "URL" --output transcript.txt
| Flag | Default | Description |
|---|---|---|
| `--language` | auto | Preferred subtitle/transcription language (e.g. zh, en, ja) |
| `--format` | text | Output format: text, json, srt, vtt |
| `--output` | stdout | Save transcript to file |
| `--force-whisper` | false | Skip caption extraction, go straight to Whisper |
| `--backend` | auto | Whisper backend: auto, mlx, faster-whisper, whisper |
| `--model` | auto | Whisper model size: auto, large-v3, medium, small, base, tiny |
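
The flag table above maps naturally onto an `argparse` declaration. The following is only a sketch of how such a command line could be declared, not the actual contents of `transcribe.py`; the function name `build_parser` and the positional argument name `url` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flag table; not the script's real code."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument("--language", default="auto",
                        help="Preferred subtitle/transcription language, e.g. zh, en, ja")
    parser.add_argument("--format", default="text",
                        choices=["text", "json", "srt", "vtt"])
    # None stands in for "write to stdout" in this sketch.
    parser.add_argument("--output", default=None, help="Save transcript to file")
    parser.add_argument("--force-whisper", action="store_true",
                        help="Skip caption extraction, go straight to Whisper")
    parser.add_argument("--backend", default="auto",
                        choices=["auto", "mlx", "faster-whisper", "whisper"])
    parser.add_argument("--model", default="auto",
                        choices=["auto", "large-v3", "medium", "small", "base", "tiny"])
    return parser


args = build_parser().parse_args(["URL", "--language", "zh", "--format", "json"])
print(args.language, args.format, args.force_whisper)  # zh json False
```

Declaring the allowed values as `choices` means a misspelled format or backend is rejected before any download starts.
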
| Variable | Description |
|---|---|
| `YT_WHISPER_BACKEND` | Override Whisper backend selection |
| `YT_WHISPER_MODEL` | Override Whisper model size |
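
As an illustration of how these variables might interact with the `--backend` and `--model` flags, here is a minimal sketch. The precedence shown (an explicit flag wins, and the environment variable applies only when the flag is left at `auto`) is an assumption, not documented behavior, and `resolve_setting` is a hypothetical helper.

```python
import os


def resolve_setting(cli_value: str, env_name: str) -> str:
    """Return the CLI value unless it is "auto"; then consult the environment.

    Assumed precedence for illustration only: flag > environment variable > "auto".
    """
    if cli_value != "auto":
        return cli_value
    # An unset or empty variable leaves automatic selection in place.
    return os.environ.get(env_name) or "auto"


os.environ["YT_WHISPER_MODEL"] = "small"
print(resolve_setting("auto", "YT_WHISPER_MODEL"))    # small
print(resolve_setting("medium", "YT_WHISPER_MODEL"))  # medium
```
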
- `import mlx_whisper`. Best for Apple Silicon.
- `import faster_whisper`. Best for CUDA GPU, good on CPU.
- `import whisper`. Universal fallback.

| RAM | Model | VRAM/RAM Usage |
|---|---|---|
| ≥16GB | large-v3 | ~6-10GB |
| ≥8GB | medium | ~5GB |
| ≥4GB | small | ~2.5GB |
| <4GB | base | ~1.5GB |
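
The automatic backend and model selection described above could be sketched roughly as follows. This assumes backends are probed in the order they are listed, that the `--backend` names map to the import names as shown, and that the amount of memory in GB is already known; how the script actually measures memory is not stated here. `detect_backend` and `pick_model` are hypothetical names.

```python
import importlib.util

# (name used by --backend, importable module), in the order listed above.
# The name-to-module mapping is an assumption for illustration.
BACKENDS = [
    ("mlx", "mlx_whisper"),
    ("faster-whisper", "faster_whisper"),
    ("whisper", "whisper"),
]


def detect_backend(is_installed=None):
    """Return the first backend whose module can be found, or None."""
    if is_installed is None:
        # find_spec returns None for a top-level module that is not installed.
        is_installed = lambda module: importlib.util.find_spec(module) is not None
    for name, module in BACKENDS:
        if is_installed(module):
            return name
    return None


def pick_model(ram_gb: float) -> str:
    """Map memory in GB to a model size using the thresholds in the table."""
    if ram_gb >= 16:
        return "large-v3"
    if ram_gb >= 8:
        return "medium"
    if ram_gb >= 4:
        return "small"
    return "base"


print(pick_model(12))  # medium
```

Passing `is_installed` as a parameter keeps the probe testable without having any Whisper package installed.
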
When --language is not specified, captions are searched in this order:
1. zh-Hant, zh-Hans, zh-TW, zh-CN, zh
2. en

text: Plain text transcript, one continuous block.
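
The default caption search order given above (used when `--language` is not specified) can be sketched as a simple priority lookup. This is an illustration only: `pick_caption_language` is a hypothetical helper, and returning `None` to signal "no usable captions, fall back to Whisper" is an assumption about how the result would be used.

```python
# Default search order from the text above.
DEFAULT_PRIORITY = ["zh-Hant", "zh-Hans", "zh-TW", "zh-CN", "zh", "en"]


def pick_caption_language(available, preferred=None):
    """Return the first available caption language by priority, or None."""
    # With an explicit preference, only that language is considered (assumption).
    candidates = [preferred] if preferred else DEFAULT_PRIORITY
    for code in candidates:
        if code in available:
            return code
    return None


print(pick_caption_language(["en", "zh-CN", "ja"]))  # zh-CN
print(pick_caption_language(["ja"]))                 # None
```
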
json:
{
  "video_id": "ZSnYlbIYpjs",
  "title": "Video Title",
  "channel": "Channel Name",
  "duration": 708,
  "language": "zh",
  "method": "captions",
  "transcript": [
    {"start": 0.0, "end": 5.2, "text": "..."},
    ...
  ],
  "full_text": "Complete transcript as single string"
}
srt, vtt: Standard subtitle formats with timestamps.
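
To show how timestamped segments relate to a subtitle file, here is a minimal sketch that renders segments shaped like the JSON `transcript` entries (`start`, `end`, `text`) as SRT cues. It is not the script's own converter; `srt_timestamp` and `segments_to_srt` are hypothetical helpers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp layout SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render a list of {"start", "end", "text"} dicts as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text']}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


print(segments_to_srt([{"start": 0.0, "end": 5.2, "text": "Hello"}]))
```

WebVTT timestamps look the same except that the millisecond separator is a period rather than a comma, and the file starts with a `WEBVTT` header line.
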
brew install yt-dlp
brew install ffmpeg