transcribe-video
v1.0.0
Extract transcript or subtitles from a local video file. Use this skill whenever the user asks to transcribe a video, extract speech-to-text, get subtitles,...
Transcribe Video
Extract transcript text from a local video file. The skill checks for embedded subtitles first (faster and more accurate), and only falls back to API-based speech recognition if none are found.
Step 1: Identify the video file
Confirm the video file path with the user. Supported formats: mp4, mkv, mov, avi, webm, and any format ffmpeg can handle.
Step 2: Check for embedded subtitles
ffprobe -v quiet -select_streams s -show_entries stream=index,codec_name:stream_tags=language,title -of json "<video_path>"
- If subtitle streams exist → go to Step 3a (extract embedded subtitles)
- If no subtitle streams → go to Step 3b (API transcription)
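The branch above can be made mechanical by parsing the JSON that ffprobe prints. What follows is a minimal sketch, assuming ffprobe is on the PATH. The helper names `parse_subtitle_streams`, `probe_subtitles`, and `next_step` are hypothetical and are not part of the skill.

```python
import json
import subprocess


def parse_subtitle_streams(ffprobe_json: str) -> list[dict]:
    """Return one dict per subtitle stream found in ffprobe's JSON output."""
    # ffprobe may print nothing at all on failure, so treat that as "no streams".
    data = json.loads(ffprobe_json.strip() or "{}")
    streams = []
    for stream in data.get("streams", []):
        tags = stream.get("tags", {})
        streams.append({
            "index": stream.get("index"),
            "codec": stream.get("codec_name"),
            "language": tags.get("language"),
            "title": tags.get("title"),
        })
    return streams


def probe_subtitles(video_path: str) -> list[dict]:
    """Run the ffprobe command from Step 2 and parse its output."""
    cmd = [
        "ffprobe", "-v", "quiet", "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language,title",
        "-of", "json", video_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return parse_subtitle_streams(result.stdout)


def next_step(streams: list[dict]) -> str:
    """Step 3a if any subtitle stream exists, otherwise Step 3b."""
    return "3a" if streams else "3b"
```

Passing the command as a list, without a shell, means paths containing spaces or quotes need no extra escaping.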
Step 3a: Extract embedded subtitles
If multiple subtitle tracks exist, prefer the one matching the video's primary language or ask the user which track to use.
# Extract as SRT (stream index 0 for first subtitle track; adjust if needed)
ffmpeg -i "<video_path>" -map 0:s:0 -c:s srt "<output_path>.srt" -y
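Track selection is easy to get wrong because the `index` reported by ffprobe counts all streams in the file, while the `N` in `-map 0:s:N` counts subtitle streams only. The sketch below assumes its input is the `streams` array from the Step 2 ffprobe call (which used `-select_streams s`), so the list position is the number to pass. The function names are hypothetical.

```python
from typing import Optional


def pick_subtitle_position(streams: list[dict], preferred_language: Optional[str]) -> int:
    """Return the subtitle-relative position (the N in 0:s:N) to extract.

    Falls back to the first track when no language matches.
    """
    if preferred_language:
        for position, stream in enumerate(streams):
            language = stream.get("tags", {}).get("language", "")
            if language.lower() == preferred_language.lower():
                return position
    return 0


def build_extract_command(video_path: str, output_srt: str, position: int) -> list[str]:
    """Build the Step 3a ffmpeg command for the chosen subtitle track."""
    return [
        "ffmpeg", "-i", video_path,
        "-map", f"0:s:{position}",
        "-c:s", "srt",
        output_srt, "-y",
    ]
```

The SRT conversion applies to text-based subtitle codecs. Image-based tracks such as `hdmv_pgs_subtitle` cannot be converted to SRT this way, so a file that has only such tracks would probably need to go through Step 3b.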
After extraction, convert SRT to clean text:
- Remove sequence numbers
- Remove timestamp lines (lines matching \d{2}:\d{2}:\d{2})
- Remove HTML-like tags (<i>, </i>, etc.)
- Join remaining non-empty lines
Save the clean transcript to <video_name>.txt next to the video file, then skip Step 3b and go to Step 4.
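The cleanup rules above fit in a short function. This is a sketch, not the skill's own implementation. It makes two assumptions: the remaining lines are joined with newlines, and a digits-only line counts as a sequence number only when a timestamp line follows it, so spoken numbers such as "42" survive.

```python
import re

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}")
TAG = re.compile(r"<[^>]+>")


def srt_to_text(srt: str) -> str:
    """Strip sequence numbers, timestamp lines and HTML-like tags from SRT text."""
    # Drop a leading byte-order mark, then normalise whitespace per line.
    raw_lines = [line.strip() for line in srt.lstrip("\ufeff").splitlines()]
    kept = []
    for i, line in enumerate(raw_lines):
        if not line:
            continue
        following = raw_lines[i + 1] if i + 1 < len(raw_lines) else ""
        if line.isdigit() and TIMESTAMP.match(following):
            continue  # sequence number
        if TIMESTAMP.match(line):
            continue  # timestamp line such as 00:00:01,000 --> 00:00:02,500
        cleaned = TAG.sub("", line).strip()
        if cleaned:
            kept.append(cleaned)
    return "\n".join(kept)
```

For example, a cue containing `<i>Hello</i> there` comes out as the plain line `Hello there`.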
Step 3b: API-based transcription
Use the bundled transcription script. It reads credentials from ~/.transcribe_video.env.
Prerequisites check
- Verify the env file exists:
  test -f ~/.transcribe_video.env && echo "OK" || echo "MISSING"
- If MISSING, tell the user to create ~/.transcribe_video.env with:
  OPENAI_API_KEY=your-key-here
  # Optional Base URL:
  # OPENAI_API_BASE=https://<base-url>/v1/
  # Optional Model Name:
  # TRANSCRIBE_MODEL=gpt-4o-transcribe
  Wait for the user to confirm before proceeding.
- Verify dependencies:
  python3 -c "from openai import OpenAI; from dotenv import load_dotenv; print('OK')" 2>&1
  If missing:
  pip install openai python-dotenv
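The same prerequisite checks can be run from Python in one pass. This is a rough sketch using only the standard library. The small `parse_env_text` parser is a hypothetical stand-in for what `load_dotenv` does and handles only plain `KEY=value` lines. All function names here are illustrative.

```python
import importlib.util
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_modules(names=("openai", "dotenv")) -> list[str]:
    """Return the importable module names that are not installed."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_prerequisites(env_path: Path = Path.home() / ".transcribe_video.env") -> list[str]:
    """Return a list of human-readable problems; empty means ready to run."""
    problems = []
    if not env_path.is_file():
        problems.append(f"MISSING env file: {env_path}")
    elif not parse_env_text(env_path.read_text(encoding="utf-8")).get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set in the env file")
    for name in missing_modules():
        problems.append(f"missing Python module: {name}")
    return problems
```

The `python-dotenv` package is imported as `dotenv`, which is why the module name checked differs from the name passed to pip.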
Run transcription
python3 <skill_directory>/scripts/transcribe.py "<video_path>"
The script extracts audio (WAV, 16kHz mono), sends it to the API, and saves the transcript to <video_name>.txt next to the video file.
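The bundled script is not shown here, so the following is only a guess at its rough shape. It assumes ffmpeg performs the audio extraction and that the `openai` Python client makes the request. The function names and the choice of `pcm_s16le` are assumptions, and the real `transcribe.py` may differ.

```python
from pathlib import Path
from typing import Optional


def build_audio_command(video_path: str, wav_path: str) -> list[str]:
    """ffmpeg command producing a 16 kHz mono WAV from the video's audio."""
    return [
        "ffmpeg", "-i", video_path,
        "-vn",                 # ignore the video stream
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # 16-bit PCM (assumed sample format)
        wav_path, "-y",
    ]


def transcript_path(video_path: str) -> Path:
    """<video_name>.txt next to the video file."""
    return Path(video_path).with_suffix(".txt")


def transcribe_wav(wav_path: str, api_key: str,
                   model: str = "gpt-4o-transcribe",
                   base_url: Optional[str] = None) -> str:
    """Send the WAV file to the transcription endpoint and return the text."""
    # Imported lazily so the helpers above work without the package installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key, base_url=base_url)
    with open(wav_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

In this sketch, `base_url` and `model` would be filled from `OPENAI_API_BASE` and `TRANSCRIBE_MODEL` when those optional entries are present in the env file.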
Step 4: Report results
Tell the user:
- Where the transcript file was saved
- How many lines / approximate word count
- Whether it came from embedded subtitles or API transcription
- Display the first few lines as a preview
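The report items above can be computed directly from the saved transcript. This is a small sketch with a hypothetical `summarize_transcript` helper. The word count splits on whitespace, so it is only approximate, and it is less meaningful for languages written without spaces.

```python
def summarize_transcript(text: str, source: str, saved_to: str, preview_lines: int = 5) -> str:
    """Build the Step 4 report: location, size, origin and a short preview."""
    lines = [line for line in text.splitlines() if line.strip()]
    word_count = sum(len(line.split()) for line in lines)
    preview = "\n".join(lines[:preview_lines])
    return (
        f"Saved to: {saved_to}\n"
        f"Lines: {len(lines)}, approx. words: {word_count}\n"
        f"Source: {source}\n"
        f"Preview:\n{preview}"
    )
```

Calling `summarize_transcript(text, "embedded subtitles", "clip.txt")` returns a block that can be shown to the user as is.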
