YouTube Transcription Generator

Use VLM Run (vlmrun) to generate transcriptions from YouTube videos. Download a video with yt-dlp, then run vlmrun to transcribe with optional timestamps. VLMRUN_API_KEY must be in .env; follow vlmrun-cli-skill for CLI setup and options.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The stated purpose (download YouTube with yt-dlp, transcribe with vlmrun) is internally consistent. However, the registry metadata lists no required binaries or env vars, while the SKILL.md explicitly requires yt-dlp, vlmrun, Python, and VLMRUN_API_KEY. That mismatch between metadata and runtime instructions is unexpected and should be clarified.
Instruction Scope
SKILL.md instructs checking .env/.env.local for VLMRUN_API_KEY and running local scripts (e.g., scripts/run_transcription.py), installing requirements.txt, and using yt-dlp and vlmrun. The skill bundle contains no code files, no requirements.txt, and no .env_template — the instructions assume files that are not present. The doc also uses an unexplained 'uv' prefix for venv and pip commands. These gaps mean the assistant could give commands or expect artifacts that don't exist, and the SKILL.md has authority to instruct the agent to read .env (sensitive) and to send video data to an external VLM Run service (expected for transcription, but a privacy/exfiltration consideration).
Install Mechanism
There is no install spec (instruction-only), so nothing is written to disk by a supplied installer. This minimizes direct installer risk. However, the instructions ask the user/agent to install vlmrun[cli] and yt-dlp via pip — that's normal for this workflow but not declared in the registry metadata.
Credentials
The SKILL.md requires VLMRUN_API_KEY in .env, but the registry lists no required environment variables or primary credential. Asking to read .env/.env.local is sensitive because these files can contain other secrets; the skill should explicitly declare the env vars it needs. Otherwise the assistant might be instructed to examine or rely on environment config that wasn't disclosed.
Persistence & Privilege
The skill is not forced-always, does not request persistent privileges, and does not claim to modify other skills or agent-wide settings. Autonomous invocation is allowed (platform default) but not combined with other elevated flags.
What to consider before installing
This skill's goal (download a YouTube video with yt-dlp and transcribe it via vlmrun) is plausible, but the package metadata and the included files are inconsistent with the instructions. Before installing or following the steps:

  1. Ask the publisher for the source repository or homepage and for the missing files (requirements.txt, scripts/run_transcription.py, .env_template).
  2. Confirm what the 'uv' command prefix means; it's not a standard system command and could be a wrapper you don't expect.
  3. Be aware that running vlmrun will send your video/audio to an external VLM Run service (privacy/copyright risk).
  4. Do not have the assistant read your .env or other configuration files unless you explicitly confirm which variables it may access; VLMRUN_API_KEY should be the only declared secret if needed.
  5. If you decide to run any provided scripts, open and inspect them locally (or have a trusted reviewer do so) to confirm they only call yt-dlp and vlmrun and do not exfiltrate other data.

Resolving the metadata omissions and providing the missing repository files would increase confidence; until then, treat this skill as suspicious.

Like a lobster shell, security has layers — review code before you run it.

Current version: v0.1.0
latest: vk977qdzxdrgz5az8qb7dbcbd7d80wfwq

SKILL.md

YouTube Transcription Generator (VLM Run)

Generate transcriptions from YouTube videos using vlmrun for speech-to-text and optional timestamps. This skill:

  1. Downloads the YouTube video (or audio) with yt-dlp.
  2. Transcribes the video with vlmrun (Orion visual AI).
  3. Saves the transcript to a file (plain text or with timestamps).

Refer to vlmrun-cli-skill for vlmrun CLI setup, environment variables, and all vlmrun chat options.


How the assistant should use this skill

  • Check .env for API key

    • Ensure .env (or .env.local) contains VLMRUN_API_KEY.
    • If missing, instruct the user to set it before running any vlmrun commands.
  • Use vlmrun for transcription only

    • For transcription (and optional timestamps), use the vlmrun CLI with a video file as input (-i <video>).
    • vlmrun accepts video files (e.g. .mp4). For YouTube, the skill first downloads the video with yt-dlp, then passes the file to vlmrun.
  • Workflow

    • User provides a YouTube URL (and optionally output path).
    • Download the video with yt-dlp (or audio only, for a faster and smaller download).
    • Run: vlmrun chat "Transcribe this video with timestamps for each section. Output the full transcript in a clear, readable format." -i <downloaded_file> -o <output_dir>.
    • Capture vlmrun’s response and save it as the transcript file (e.g. transcript.txt).
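
The workflow above can be sketched in Python as two command builders and a small driver. This is a minimal sketch, not the skill's own script: the function names, the fixed video.mp4 filename, and the choice to save stdout as transcript.txt are assumptions for illustration. The yt-dlp format selector and the vlmrun chat arguments are copied from the commands shown in this document.

```python
import subprocess
from pathlib import Path

# Prompt text taken from the workflow step above.
PROMPT = (
    "Transcribe this video with timestamps for each section. "
    "Output the full transcript in a clear, readable format."
)


def build_download_cmd(url: str, video_path: Path) -> list[str]:
    """Build the yt-dlp command (format selector as used in the Quick Start)."""
    return [
        "yt-dlp",
        "-f", "bv*[ext=mp4]+ba/best[ext=mp4]/best",
        "-o", str(video_path),
        url,
    ]


def build_transcribe_cmd(video_path: Path, output_dir: Path, prompt: str = PROMPT) -> list[str]:
    """Build the vlmrun chat command with the video as input (-i) and an output dir (-o)."""
    return ["vlmrun", "chat", prompt, "-i", str(video_path), "-o", str(output_dir)]


def transcribe_youtube(url: str, output_dir: Path) -> Path:
    """Hypothetical driver: download, transcribe, save vlmrun's stdout as transcript.txt."""
    output_dir.mkdir(parents=True, exist_ok=True)
    video_path = output_dir / "video.mp4"  # assumed filename
    subprocess.run(build_download_cmd(url, video_path), check=True)
    result = subprocess.run(
        build_transcribe_cmd(video_path, output_dir),
        check=True, capture_output=True, text=True,
    )
    transcript_path = output_dir / "transcript.txt"
    transcript_path.write_text(result.stdout, encoding="utf-8")
    return transcript_path
```

Passing the commands as argument lists (rather than one shell string) avoids quoting problems with the prompt and with URLs that contain `&` or `?`.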

Prerequisites

  • Python 3.10+
  • VLMRUN_API_KEY (required for vlmrun)
  • vlmrun CLI (vlmrun[cli])
  • yt-dlp (for downloading YouTube videos)

See vlmrun-cli-skill for detailed vlmrun usage and examples (including video transcription).
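
As a rough illustration, the prerequisites can be checked up front before any download starts. The helper below is a hypothetical sketch (its name and messages are not part of the skill). It inspects only the current process environment, so a key that lives solely in .env will be reported as missing unless it has been loaded into the shell.

```python
import os
import shutil
import sys
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
    environ: Mapping[str, str] = os.environ,
    version: tuple = sys.version_info[:2],
) -> list[str]:
    """Return a human-readable list of unmet prerequisites (empty if all are met)."""
    problems = []
    if tuple(version) < (3, 10):
        problems.append("Python 3.10+ is required")
    # Both command-line tools must be discoverable on PATH.
    for tool in ("vlmrun", "yt-dlp"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Only the presence of the key is checked; its value is never printed.
    if not environ.get("VLMRUN_API_KEY"):
        problems.append("VLMRUN_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The `which`, `environ`, and `version` parameters exist only so the check can be exercised without touching the real system.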


Installation & Setup

From the youtube-transcription-generator directory (the commands below assume the uv package manager is installed):

Windows (PowerShell):

cd path\to\youtube-transcription-generator
uv venv
.venv\Scripts\Activate.ps1
uv pip install -r requirements.txt

macOS/Linux:

cd path/to/youtube-transcription-generator
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

Copy .env_template to .env and set VLMRUN_API_KEY.
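
One way to confirm that the key is configured is to scan .env for the one variable this skill needs and report only whether it is set. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it handles only simple KEY=value lines (optionally prefixed with export), not every dotenv syntax. It returns a boolean and never returns or prints the secret itself.

```python
from pathlib import Path


def env_file_has_key(env_path: Path, key: str = "VLMRUN_API_KEY") -> bool:
    """Return True if env_path defines a non-empty value for key."""
    if not env_path.is_file():
        return False
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        name, _, value = line.partition("=")
        # Strip surrounding quotes so KEY="" counts as empty.
        if name.strip() == key and value.strip().strip("'\""):
            return True
    return False


if __name__ == "__main__":
    if not env_file_has_key(Path(".env")):
        print("Set VLMRUN_API_KEY in .env before running vlmrun.")
```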


Quick Start: Transcribe a YouTube Video

Option A: Run the script (recommended)

# From youtube-transcription-generator directory, with venv activated
python scripts/run_transcription.py "https://www.youtube.com/watch?v=VIDEO_ID" -o ./output

This will:

  1. Download the video with yt-dlp into the output directory.
  2. Run vlmrun to transcribe the video.
  3. Save the transcript as output/transcript.txt (and keep artifacts in output/).
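
The contents of scripts/run_transcription.py are not reproduced in this document, so the following is only a hypothetical sketch of the command-line interface implied by the invocation above: one positional URL and an -o output directory. The --output long form, the default directory, and the helper names are assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the arguments used in the Quick Start invocation."""
    parser = argparse.ArgumentParser(
        description="Download a YouTube video and transcribe it with vlmrun."
    )
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument(
        "-o", "--output",  # long form is an assumption
        type=Path,
        default=Path("output"),  # assumed default
        help="directory for the downloaded video and the transcript",
    )
    return parser.parse_args(argv)


def transcript_path(output_dir: Path) -> Path:
    """Location of the saved transcript, matching output/transcript.txt above."""
    return output_dir / "transcript.txt"


if __name__ == "__main__":
    args = parse_args()
    print(f"Transcript will be written to {transcript_path(args.output)}")
```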

Option B: Manual vlmrun (after downloading the video yourself)

# 1) Download with yt-dlp
yt-dlp -f "bv*[ext=mp4]+ba/best[ext=mp4]/best" -o video.mp4 "https://www.youtube.com/watch?v=VIDEO_ID"

# 2) Transcribe with vlmrun (see vlmrun-cli-skill for options)
vlmrun chat "Transcribe this video with timestamps for each section. Output the full transcript in a clear, readable format." -i video.mp4 -o ./output

Capture the vlmrun stdout and save it as your transcript, or use --json if you need structured output.


Prompt variants for vlmrun

  • With timestamps:
    "Transcribe this video with timestamps for each section. Output the full transcript in a clear, readable format."

  • Plain transcript only:
    "Transcribe everything said in this video. Output only the spoken text, no timestamps."

  • Structured (e.g. JSON):
    Use --json and ask for a structured format in the prompt (e.g. list of { "time": "...", "text": "..." }).
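
The variants above could be kept in a small lookup table so the caller picks one by name. This is a sketch under stated assumptions: the variant keys and function name are hypothetical, the wording of the JSON prompt is illustrative, and placing --json at the end of the command is a guess (check vlmrun-cli-skill for the actual option placement).

```python
PROMPTS = {
    "timestamps": (
        "Transcribe this video with timestamps for each section. "
        "Output the full transcript in a clear, readable format."
    ),
    "plain": (
        "Transcribe everything said in this video. "
        "Output only the spoken text, no timestamps."
    ),
    # Illustrative wording; the document only says to ask for a structured format.
    "json": (
        "Transcribe this video. Return a list of objects of the form "
        '{ "time": "...", "text": "..." }.'
    ),
}


def build_chat_args(variant: str, video: str, output_dir: str) -> list[str]:
    """Build a vlmrun chat command for the chosen prompt variant."""
    if variant not in PROMPTS:
        raise ValueError(f"unknown prompt variant: {variant!r}")
    args = ["vlmrun", "chat", PROMPTS[variant], "-i", video, "-o", output_dir]
    if variant == "json":
        args.append("--json")  # flag position is an assumption
    return args
```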


Workflow checklist

  • Confirm vlmrun is installed and VLMRUN_API_KEY is set (see vlmrun-cli-skill).
  • Install dependencies: uv pip install -r requirements.txt (includes vlmrun[cli] and yt-dlp).
  • Run python scripts/run_transcription.py <youtube_url> -o ./output or download + vlmrun manually.
  • Find transcript in the output directory (e.g. output/transcript.txt).

Troubleshooting

  • vlmrun not found
    Activate the venv and run: uv pip install "vlmrun[cli]". See vlmrun-cli-skill.

  • Authentication errors
    Verify VLMRUN_API_KEY in .env or the current shell.

  • yt-dlp fails
    Update yt-dlp: uv pip install -U yt-dlp. Check the URL is a valid public YouTube video.

  • Large or long videos
    Use audio-only download in the script (e.g. -f bestaudio) to reduce size and speed up transcription.
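
For the large-video case, the switch between the full-video and audio-only download can be expressed as a single flag. This is a minimal sketch with hypothetical function names. The two format selectors come from this document, while the audio.%(ext)s output template is an assumption that lets yt-dlp choose the extension. Whether vlmrun accepts the resulting audio file is not confirmed here.

```python
VIDEO_FORMAT = "bv*[ext=mp4]+ba/best[ext=mp4]/best"
AUDIO_FORMAT = "bestaudio"


def ytdlp_format(audio_only: bool) -> str:
    """Pick the yt-dlp format selector for the requested download mode."""
    return AUDIO_FORMAT if audio_only else VIDEO_FORMAT


def download_cmd(url: str, output_template: str, audio_only: bool = False) -> list[str]:
    """Build the yt-dlp command for either mode."""
    return ["yt-dlp", "-f", ytdlp_format(audio_only), "-o", output_template, url]


if __name__ == "__main__":
    # Audio-only: the extension is left to yt-dlp via the %(ext)s template field.
    print(download_cmd(
        "https://www.youtube.com/watch?v=VIDEO_ID",
        "audio.%(ext)s",
        audio_only=True,
    ))
```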
