YouTube Transcribe

v1.0.0

Transcribe YouTube videos with smart fallback: extracts captions first (fast, free), falls back to local Whisper transcription when no captions are available. Au...

by Logan (@iml885203)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for iml885203/youtube-transcribe.

Prompt preview (Install & Setup):
Install the skill "YouTube Transcribe" (iml885203/youtube-transcribe) from ClawHub.
Skill page: https://clawhub.ai/iml885203/youtube-transcribe
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python3, yt-dlp
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install iml885203/youtube-transcribe

ClawHub CLI


npx clawhub@latest install youtube-transcribe
Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the implementation: the code uses yt-dlp to extract captions and to download audio, then attempts local Whisper backends (mlx, faster-whisper, openai-whisper). Declared required binaries (python3, yt-dlp) and optional ffmpeg are appropriate and sufficient for the stated functionality.
Instruction Scope
SKILL.md instructs running the included Python script with YouTube URLs and documents env var overrides. The script only reads system memory info (sysctl or /proc/meminfo), uses temporary directories, invokes yt-dlp/ffmpeg subprocesses, parses VTT files, and imports local Whisper backends. It does not reference unrelated files, external endpoints beyond YouTube/yt-dlp, or request arbitrary environment secrets.
Install Mechanism
Install specs are brew formulas for yt-dlp and ffmpeg (standard package sources). The skill does not auto-download code from an arbitrary URL or include high-risk install steps; Whisper backends are mentioned as pip installs but not automatically fetched by the skill itself.
Credentials
No credentials or sensitive environment variables are required. Two optional env vars (YT_WHISPER_BACKEND, YT_WHISPER_MODEL) are reasonable and documented. The script reads only common system info (RAM) to pick model sizes.
Persistence & Privilege
The skill is not always-enabled, does not modify other skills or global agent settings, and does not request persistent elevated privileges. It runs as a user-level script and uses temporary files.
Assessment
This skill appears internally consistent and implements the advertised behavior. Before installing, consider:

  1. It will download YouTube metadata and audio to a temporary directory (yt-dlp/ffmpeg are used), so only run it with links you trust and where downloading is permitted.
  2. Whisper backends and models (if used) can be large and may need GPU/CPU resources; you must pip-install the backend you want and may incur large model downloads.
  3. The script runs yt-dlp subprocesses (no shell injection vulnerability visible because arguments are passed as lists), but you should still review the included script if you have a high security posture.
  4. There are no requested secrets or outbound endpoints beyond standard YouTube/yt-dlp usage.

If any of the above is unacceptable (downloading media, installing models, or running subprocesses), do not install the skill.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🎬 Clawdis
Bins: python3, yt-dlp

Install

Install yt-dlp (brew)
Bins: yt-dlp
brew install yt-dlp
Install ffmpeg (brew)
Bins: ffmpeg
brew install ffmpeg
latest: vk97a5fq2d8xdrp1777f2tmk4j982gbcd
310 downloads · 0 stars · 1 version · Updated 1mo ago
v1.0.0
MIT-0

YouTube Transcribe

Smart YouTube video transcription with automatic fallback:

  1. Captions first — extracts existing subtitles (manual or auto-generated) via yt-dlp. Fast, free, no compute.
  2. Whisper fallback — when no captions exist, downloads audio and transcribes locally with the best available Whisper backend.
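The two-step fallback above amounts to a small control function. A minimal sketch, where get_captions and run_whisper are hypothetical stand-ins for the script's actual yt-dlp and Whisper steps:

```python
from typing import Callable, Optional

def transcribe(url: str,
               get_captions: Callable[[str], Optional[str]],
               run_whisper: Callable[[str], str],
               force_whisper: bool = False) -> dict:
    """Captions first; fall back to local Whisper when none exist."""
    if not force_whisper:
        captions = get_captions(url)  # e.g. a yt-dlp subtitle fetch
        if captions is not None:
            return {"method": "captions", "text": captions}
    # No captions found (or Whisper was forced): transcribe locally.
    return {"method": "whisper", "text": run_whisper(url)}
```

Injecting the two steps as callables keeps the decision logic trivially testable without touching the network.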

When to Use

Use this skill when the user wants to:

  • Get a transcript or text version of a YouTube video
  • Understand what a YouTube video says without watching it
  • Summarize, analyze, or take notes from a YouTube video
  • Extract subtitles or captions from a video

Triggers

  • "transcribe this YouTube video"
  • "what does this video say"
  • "get the transcript of [YouTube URL]"
  • "summarize this YouTube video" (transcribe first, then process)
  • Any YouTube URL shared with a request to understand its content

Requirements

Required:

  • yt-dlp — for caption extraction and audio download
  • python3

For Whisper fallback (when no captions available):

  • ffmpeg — for audio processing
  • One of these Whisper backends (auto-detected in priority order):
    1. mlx-whisper — Apple Silicon native, fastest on Mac (pip install mlx-whisper)
    2. faster-whisper — CTranslate2 backend, fast on CUDA/CPU (pip install faster-whisper)
    3. openai-whisper — Original Whisper, universal fallback (pip install openai-whisper)

Usage

Basic — transcribe a video

python3 {baseDir}/scripts/transcribe.py "https://www.youtube.com/watch?v=VIDEO_ID"

Specify language for captions

python3 {baseDir}/scripts/transcribe.py "URL" --language zh

Force Whisper (skip caption check)

python3 {baseDir}/scripts/transcribe.py "URL" --force-whisper

JSON output

python3 {baseDir}/scripts/transcribe.py "URL" --format json

Save to file

python3 {baseDir}/scripts/transcribe.py "URL" --output transcript.txt

Options

Flag             Default  Description
--language       auto     Preferred subtitle/transcription language (e.g. zh, en, ja)
--format         text     Output format: text, json, srt, vtt
--output         stdout   Save transcript to file
--force-whisper  false    Skip caption extraction, go straight to Whisper
--backend        auto     Whisper backend: auto, mlx, faster-whisper, whisper
--model          auto     Whisper model size: auto, large-v3, medium, small, base, tiny

Environment Variables

Variable            Description
YT_WHISPER_BACKEND  Override Whisper backend selection
YT_WHISPER_MODEL    Override Whisper model size
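A sketch of how these documented overrides could be read, assuming the CLI's auto defaults apply when the variables are unset (the real script's internals may differ):

```python
import os

def whisper_settings() -> tuple:
    """Resolve backend/model, letting env vars override the defaults."""
    backend = os.environ.get("YT_WHISPER_BACKEND", "auto")
    model = os.environ.get("YT_WHISPER_MODEL", "auto")
    return backend, model
```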

Auto-Detection

Whisper Backend (priority order)

  1. MLX Whisper — detected via import mlx_whisper. Best for Apple Silicon.
  2. faster-whisper — detected via import faster_whisper. Best for CUDA GPU, good on CPU.
  3. OpenAI Whisper — detected via import whisper. Universal fallback.
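The priority order above can be probed without actually loading any backend by checking whether each module is importable. A sketch using importlib (the script itself imports the modules directly):

```python
import importlib.util
from typing import Optional

# Priority order: (CLI backend name, importable module name)
BACKENDS = [
    ("mlx", "mlx_whisper"),
    ("faster-whisper", "faster_whisper"),
    ("whisper", "whisper"),
]

def detect_backend() -> Optional[str]:
    """Return the first installed Whisper backend, or None."""
    for name, module in BACKENDS:
        if importlib.util.find_spec(module) is not None:
            return name
    return None  # no backend installed; captions-only operation
```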

Model Size (based on available RAM)

RAM     Model     VRAM/RAM Usage
≥16GB   large-v3  ~6-10GB
≥8GB    medium    ~5GB
≥4GB    small     ~2.5GB
<4GB    base      ~1.5GB
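The thresholds above reduce to a simple mapping. A sketch (the real script reads total RAM from sysctl or /proc/meminfo; here it is passed in as a number for clarity):

```python
def pick_model(ram_gb: float) -> str:
    """Map available RAM in GB to a Whisper model size."""
    if ram_gb >= 16:
        return "large-v3"
    if ram_gb >= 8:
        return "medium"
    if ram_gb >= 4:
        return "small"
    return "base"
```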

Caption Language Priority

When --language is not specified, captions are searched in this order:

  1. Video's original language
  2. Chinese variants: zh-Hant, zh-Hans, zh-TW, zh-CN, zh
  3. English: en
  4. Any available language
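The search order above can be expressed as a small selection function. A hypothetical sketch, not the script's actual implementation:

```python
from typing import List, Optional

# Fallback order after the video's original language.
PRIORITY = ["zh-Hant", "zh-Hans", "zh-TW", "zh-CN", "zh", "en"]

def pick_caption_language(available: List[str],
                          original: Optional[str] = None) -> Optional[str]:
    """Choose a caption track per the documented priority order."""
    if original and original in available:
        return original
    for lang in PRIORITY:
        if lang in available:
            return lang
    # Last resort: any available language.
    return available[0] if available else None
```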

Output Formats

text (default)

Plain text transcript, one continuous block.

json

{
  "video_id": "ZSnYlbIYpjs",
  "title": "Video Title",
  "channel": "Channel Name",
  "duration": 708,
  "language": "zh",
  "method": "captions",
  "transcript": [
    {"start": 0.0, "end": 5.2, "text": "..."},
    ...
  ],
  "full_text": "Complete transcript as single string"
}
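Downstream tooling can consume the JSON shape shown above. A minimal sketch that renders the segment list as timestamped lines (field names assumed from the sample):

```python
from typing import List

def to_timestamped_lines(payload: dict) -> List[str]:
    """Format each transcript segment as '[start-end] text'."""
    return [f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text']}"
            for seg in payload["transcript"]]
```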

srt / vtt

Standard subtitle formats with timestamps.
