transcribe-video

v1.0.0

Extract transcript or subtitles from a local video file. Use this skill whenever the user asks to transcribe a video, extract speech-to-text, get subtitles,...

by Pengfei Ni (@feiskyer)

Security Scan

Capability signals

Requires sensitive credentials

These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal

Suspicious

OpenClaw

Suspicious

high confidence

Purpose & Capability

The skill claims to transcribe local video files, which matches the included script and instructions. However, the registry metadata declares no required binaries or environment variables, while SKILL.md and the scripts require ffmpeg/ffprobe and an OpenAI API key stored in ~/.transcribe_video.env. Omitting these requirements from the manifest is an inconsistency: a legitimate transcription skill should list ffmpeg/ffprobe and the API key as required.

Instruction Scope

The SKILL.md stays largely within the stated purpose: it checks for embedded subtitles, extracts audio, and calls an external transcription API. It instructs the agent to read ~/.transcribe_video.env for credentials (explicitly stated) and to run ffprobe/ffmpeg and a local Python script. There is no unrelated file collection, but the instructions do send audio off-host to the OpenAI API, which is expected for API-based transcription but is a privacy consideration the user should be aware of.

Install Mechanism

This is an instruction-only skill with no install spec; the included Python script and instructions rely on user-installed tools and Python packages. No downloads from arbitrary URLs or hidden installers are present.

Credentials

The code expects credentials (OPENAI_API_KEY and optional OPENAI_API_BASE / TRANSCRIBE_MODEL / AZURE_API_VERSION) loaded from ~/.transcribe_video.env, but the skill metadata declares no required environment variables or primary credential. Requesting an API key (a sensitive secret) is proportionate to cloud transcription, but the manifest should declare it explicitly. The skill transmits audio to an external service (OpenAI or Azure OpenAI); this is necessary for API transcription but important to surface to users.

Persistence & Privilege

The skill does not request persistent or global privileges (always:false). It does not modify other skills or system-wide configuration. Its runtime behavior is limited to reading a dedicated env file in the user's home, invoking ffmpeg, and calling an external API.

What to consider before installing

This skill appears to do what it says (extract embedded subtitles or send audio to OpenAI for transcription), but the package metadata omits two important requirements: (1) you must have ffmpeg/ffprobe installed on the host, and (2) you must create ~/.transcribe_video.env containing OPENAI_API_KEY (and optional OPENAI_API_BASE/TRANSCRIBE_MODEL). Before installing/using: (a) review the included scripts (scripts/transcribe.py) yourself and ensure you trust the code and the unknown publisher, (b) be aware that audio will be uploaded to an external API (privacy risk), (c) if you don’t want network transcription, use only embedded subtitles or adopt an offline STT tool, and (d) ask the publisher to update the manifest to declare required binaries and the primary credential so the skill's metadata accurately reflects its needs. If you proceed, run it in a controlled environment (or without network access) until you are comfortable.

Like a lobster shell, security has layers — review code before you run it.

latestvk972nh4ng9zaxg9bfcf9cevrbn84xrm7

40 downloads

0 stars

1 version

Updated 4d ago

MIT-0

Transcribe Video

Extract transcript text from a local video file. The skill checks for embedded subtitles first (faster and more accurate), and only falls back to API-based speech recognition if none are found.

Step 1: Identify the video file

Confirm the video file path with the user. Supported formats: mp4, mkv, mov, avi, webm, and any other format ffmpeg can handle.

Step 2: Check for embedded subtitles

ffprobe -v quiet -select_streams s -show_entries stream=index,codec_name:stream_tags=language,title -of json "<video_path>"

If subtitle streams exist → go to Step 3a (extract embedded subtitles)
If no subtitle streams → go to Step 3b (API transcription)
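
The branch above can be sketched in Python as follows. This is a minimal illustration, not the skill's own code: the function names `parse_subtitle_streams`, `probe_subtitles`, and `next_step` are hypothetical, and the sketch assumes ffprobe is on the PATH and emits its usual JSON layout with a top-level `streams` list.

```python
import json
import subprocess


def parse_subtitle_streams(ffprobe_json: str) -> list[dict]:
    """Return the subtitle streams listed in ffprobe's JSON output."""
    # ffprobe may print nothing useful on failure; treat that as "no streams".
    data = json.loads(ffprobe_json.strip() or "{}")
    return data.get("streams", [])


def probe_subtitles(video_path: str) -> list[dict]:
    """Run the ffprobe command from Step 2 and parse its output."""
    cmd = [
        "ffprobe", "-v", "quiet",
        "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language,title",
        "-of", "json",
        video_path,
    ]
    # check=True raises CalledProcessError if ffprobe cannot read the file.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_subtitle_streams(result.stdout)


def next_step(streams: list[dict]) -> str:
    """Pick Step 3a when subtitle streams exist, otherwise Step 3b."""
    return "3a" if streams else "3b"
```

Calling `next_step(probe_subtitles("<video_path>"))` would then select the branch. Note that the `index` field ffprobe reports is the stream's position in the whole file, while `-map 0:s:N` in Step 3a counts subtitle streams only, so the position of a stream in the returned list is the value to use for `N`.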

Step 3a: Extract embedded subtitles

If multiple subtitle tracks exist, prefer the one matching the video's primary language or ask the user which track to use.

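# Extract as SRT (0:s:0 selects the first subtitle track; adjust the last number if needed)
# -y overwrites an existing output file; it is placed before the input so it is not a trailing option
ffmpeg -y -i "<video_path>" -map 0:s:0 -c:s srt "<output_path>.srt"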

After extraction, convert SRT to clean text:

Remove sequence numbers
Remove timestamp lines (lines matching \d{2}:\d{2}:\d{2})
Remove HTML-like tags (<i>, </i>, etc.)
Join remaining non-empty lines
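
The four cleanup rules above could be implemented roughly as follows. This is a sketch under stated assumptions: the name `srt_to_text` is hypothetical, remaining lines are joined with newlines (the text does not say which separator to use), and any cue line consisting only of digits is treated as a sequence number, so a spoken line such as "42" would be dropped too.

```python
import re

# Timestamp lines start with HH:MM:SS, e.g. "00:00:01,000 --> 00:00:02,500".
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}")
# HTML-like tags such as <i>, </i>, or <font color="...">.
TAG = re.compile(r"</?[^>]+>")


def srt_to_text(srt: str) -> str:
    """Convert SRT content to plain transcript text."""
    lines = []
    for raw in srt.splitlines():
        # Drop a possible byte-order mark, then surrounding whitespace.
        line = raw.replace("\ufeff", "").strip()
        if not line:
            continue  # blank separator between cues
        if line.isdigit():
            continue  # sequence number
        if TIMESTAMP.match(line):
            continue  # timestamp line
        line = TAG.sub("", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)
```

For example, `srt_to_text(Path("<output_path>.srt").read_text(encoding="utf-8"))` would yield the text to save, assuming the extracted file is UTF-8.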

Save the clean transcript to <video_name>.txt next to the video file. Done — skip Step 3b.

Step 3b: API-based transcription

Use the bundled transcription script. It reads credentials from ~/.transcribe_video.env.

Prerequisites check

Verify the env file exists:

test -f ~/.transcribe_video.env && echo "OK" || echo "MISSING"

If MISSING, tell the user to create ~/.transcribe_video.env with:

OPENAI_API_KEY=your-key-here
# Optional Base URL:
# OPENAI_API_BASE=https://<base-url>/v1/
# Optional Model Name:
# TRANSCRIBE_MODEL=gpt-4o-transcribe

Wait for the user to confirm before proceeding.
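
A slightly stricter prerequisite check would also confirm that the file actually defines OPENAI_API_KEY, not just that it exists. The sketch below is illustrative only: `parse_env` and `env_status` are hypothetical names, the `NO_KEY` status is an addition not present in the skill, and the parser handles plain KEY=VALUE lines rather than the fuller syntax that python-dotenv accepts.

```python
from pathlib import Path

ENV_FILE = Path.home() / ".transcribe_video.env"
PLACEHOLDER = "your-key-here"  # the template value shown above


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def env_status(path: Path = ENV_FILE) -> str:
    """Return "MISSING", "NO_KEY", or "OK" for the credentials file."""
    if not path.is_file():
        return "MISSING"
    key = parse_env(path.read_text(encoding="utf-8")).get("OPENAI_API_KEY", "")
    return "OK" if key and key != PLACEHOLDER else "NO_KEY"
```

The check never prints the key itself, which keeps the secret out of logs and agent transcripts.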

Verify dependencies:

python3 -c "from openai import OpenAI; from dotenv import load_dotenv; print('OK')" 2>&1

If missing: pip install openai python-dotenv

Run transcription

python3 <skill_directory>/scripts/transcribe.py "<video_path>"

The script extracts audio (WAV, 16 kHz mono), sends it to the API, and saves the transcript to <video_name>.txt next to the video file.
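
For readers who want to see what such a pipeline involves before running the bundled script, here is a rough sketch of the three stages described above. It is not the contents of scripts/transcribe.py: the function names are hypothetical, the ffmpeg flags are one common way to produce 16 kHz mono WAV, the default model name is taken from the optional setting shown earlier, and the API call assumes the `openai` Python package's `audio.transcriptions.create` method.

```python
import subprocess
import tempfile
from pathlib import Path


def extract_cmd(video: Path, wav: Path) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-vn",                # drop the video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # uncompressed 16-bit PCM
        str(wav),
    ]


def transcript_path(video: Path) -> Path:
    """<video_name>.txt next to the video file."""
    return video.with_suffix(".txt")


def transcribe(video: Path, api_key: str, model: str = "gpt-4o-transcribe") -> Path:
    """Extract audio, upload it for transcription, and save the text."""
    # Imported lazily so the helpers above work without the package installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    # A temporary directory avoids clobbering any existing .wav next to the video.
    with tempfile.TemporaryDirectory() as tmp:
        wav = Path(tmp) / "audio.wav"
        subprocess.run(extract_cmd(video, wav), check=True)
        with wav.open("rb") as audio:
            result = client.audio.transcriptions.create(model=model, file=audio)
    out = transcript_path(video)
    out.write_text(result.text, encoding="utf-8")
    return out
```

The upload inside `transcribe` is the point at which audio leaves the host, which is the privacy consideration raised in the security scan.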