Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Video Insight

v1.0.0

Cross-platform video transcript extraction and optional AI summarization for YouTube and Bilibili. GPU auto-detect. Transcript-first with opt-in LLM summary.

Security Scan
Capability signals
Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal
Suspicious
OpenClaw
Suspicious
medium confidence
Purpose & Capability
Code and setup align with the stated purpose: yt-dlp, ffmpeg, faster-whisper are used for download, audio extraction, and transcription, and the CLI wraps YouTube/Bilibili handling. However, the code also invokes yt-dlp's --cookies-from-browser fallback (reads browser cookies via yt-dlp) and supports sending full transcripts to arbitrary LLM endpoints — behaviours that are not called out in the top-level metadata and may be surprising to users.
Instruction Scope
SKILL.md states that the default is transcript-only, but the runtime instructions and code go further: (1) they permanently cache full transcripts to ~/.cache/video-insight, (2) they may extract keyframes, (3) on download failure they retry with yt-dlp --cookies-from-browser (which reads local browser cookies), and (4) when --summarize is used they POST the entire transcript, without truncation, to an external LLM API if the LLM env vars are set. These are scope-expanding actions (reading browser cookies, persistent local storage, network exfiltration of large transcripts) that the metadata does not declare as required.
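The cookie fallback can be made concrete with a small sketch. This is a hypothetical reconstruction, not the skill's actual code: the function name and audio flag are assumptions, though --cookies-from-browser and --extract-audio are real yt-dlp options.

```python
def ytdlp_args(url: str, use_browser_cookies: bool = False) -> list[str]:
    """Build a yt-dlp argument list; the retry path appends the flag
    that makes yt-dlp read the local Chrome cookie store."""
    args = ["yt-dlp", "--extract-audio", url]
    if use_browser_cookies:
        args += ["--cookies-from-browser", "chrome"]
    return args
```

On a failed first attempt, the skill re-invokes with the cookie flag enabled, which is the scope-expanding step flagged above.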
Install Mechanism
No install spec in registry (instruction-only), but a provided setup.sh installs Python deps (yt-dlp, youtube-transcript-api, innertube, requests, faster-whisper) into a venv or system Python. This is a normal approach for such a tool; install uses PyPI (standard). No remote arbitrary binary downloads or obscure URLs in the installer.
Credentials
Registry declares no required env vars, but the code reads many env vars: WHISPER_DEVICE, WHISPER_MODEL, FRAME_TIME_OFFSET, FRAME_INTERVAL, MAX_FRAMES, LLM_API_URL, LLM_API_KEY, LLM_MODEL, OPENCLAW_GATEWAY_TOKEN, etc. In particular, if LLM_API_URL + LLM_API_KEY (or OPENCLAW_GATEWAY_TOKEN) are set, the skill will send full transcripts to that external endpoint — a high-sensitivity action that should be declared and explicitly consented to. Cache storage of transcripts is permanent by default (also not called out as a required configuration item).
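The gating this implies can be sketched in a few lines. The env var names come from the report above; the helper function itself is hypothetical, not the skill's code.

```python
import os

def llm_configured() -> bool:
    """True when the transcript would be POSTed to an external endpoint:
    an LLM URL plus either an API key or a gateway token is present."""
    url = os.environ.get("LLM_API_URL")
    key = os.environ.get("LLM_API_KEY") or os.environ.get("OPENCLAW_GATEWAY_TOKEN")
    return bool(url and key)
```

Auditing for exactly this conjunction (endpoint + credential leading to a network send) is a quick way to confirm the report's claim in the source.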
Persistence & Privilege
always:false and the skill does not modify other skills. However it writes permanent cached transcript files to ~/.cache/video-insight and creates a venv and cache dir during setup.sh. Temp files are managed and cleaned, but caches are intentionally permanent unless manually removed.
What to consider before installing
- Privacy: transcripts are cached permanently under ~/.cache/video-insight. If transcripts may contain sensitive content, clear or relocate the cache and inspect the Cache.put/get behavior.
- External LLMs: summarization is opt-in, but if you set LLM_API_URL + LLM_API_KEY (or OPENCLAW_GATEWAY_TOKEN), the tool will POST the entire transcript, without truncation, to that endpoint. Only set those env vars for endpoints you trust and that have an appropriate privacy policy.
- Browser cookies: on download failure the tool retries with yt-dlp --cookies-from-browser chrome, which makes yt-dlp access your browser cookie store (potentially exposing authenticated content). If you don't want local browser cookies accessed, disallow that fallback or run in an isolated environment.
- Installation: setup.sh installs Python packages from PyPI and may modify your Python environment or create a venv. Review the script before running; consider installing in an isolated virtual environment or container.
- Higher assurance: request the publisher/source, verify package signatures or hashes, or run the tool in an isolated VM/container. If you want the skill but not remote summaries, do not set LLM_API_* or OPENCLAW_GATEWAY_TOKEN and avoid --summarize; review cached files manually and remove them if undesired.

Additional info that would change this assessment: an authoritative source/homepage, explicit metadata declaring the env vars the skill expects (and their privacy implications), or removal/documentation of the yt-dlp cookie fallback. With those clarifications this would likely be classified as benign (coherent) rather than suspicious.
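If you use the skill anyway, the permanent cache can be audited and cleared with a short script. This is a sketch: the default path comes from the skill's documentation, but the helper function is illustrative, not part of the tool.

```python
from pathlib import Path

def clear_transcript_cache(cache_dir: Path) -> list[str]:
    """Delete cached transcript JSON files, returning the removed names."""
    removed = []
    if cache_dir.is_dir():
        for f in sorted(cache_dir.glob("*.json")):
            removed.append(f.name)
            f.unlink()
    return removed

# Default location documented by the skill:
DEFAULT_CACHE = Path.home() / ".cache" / "video-insight"
```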


latest: vk9799pcc3kaeeb85nw734qqmpd84wz4c
40 downloads · 0 stars · 1 version
Updated 4d ago · v1.0.0 · MIT-0

video-insight

Cross-platform video transcript extraction and optional AI summarization for YouTube and Bilibili.

Description

Extract transcripts, metadata, and optional keyframes from YouTube and Bilibili videos. Outputs structured JSON to stdout. By default, no LLM summarization is performed — the agent receives the full transcript and does its own summarization with full context window.

Supports: macOS, Linux, WSL, Windows VM.

Usage

# Single video — transcript only (default, recommended)
video-insight --url "https://www.youtube.com/watch?v=VIDEO_ID"

# Bilibili video
video-insight --url "https://www.bilibili.com/video/BV1xxxxx"

# With LLM summary (opt-in)
video-insight --url "https://..." --summarize

# Channel scan (recent videos)
video-insight --channel "UC_x5XG1OV2P6uZZ5FSM9Ttw" --hours 24

# Quiet mode (no stderr progress)
video-insight --url "https://..." --quiet

# Force refresh (ignore cache)
video-insight --url "https://..." --no-cache

# Extract keyframes too
video-insight --url "https://..." --frames

Triggers

Summarize video, extract transcript, YouTube summary, Bilibili transcript, video transcript, 视频摘要 (video summary), 视频总结 (video recap), B站视频 (Bilibili video), YouTube视频 (YouTube video)

Output Schema

{
  "ok": true,
  "data": {
    "video_id": "dQw4w9WgXcQ",
    "platform": "youtube",
    "title": "Video Title",
    "channel": "Channel Name",
    "duration_seconds": 212,
    "transcript": "Full transcript text without truncation...",
    "transcript_with_timestamps": "[0.0-3.2] First segment\n[3.2-6.5] Second...",
    "frames": [{"file": "/tmp/.../frame_001.jpg", "time_sec": 30}],
    "cached": false
  },
  "error": null
}
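Consumers of this output should honor the ok/error envelope before touching the payload. A minimal parser might look like this (the helper is illustrative, not shipped with the skill):

```python
import json

def parse_insight_output(raw: str) -> dict:
    """Parse the CLI's stdout JSON and return the data payload,
    raising if the envelope signals failure."""
    payload = json.loads(raw)
    if not payload.get("ok"):
        raise RuntimeError(payload.get("error") or "unknown error")
    return payload["data"]
```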

Cache

Transcripts are permanently cached at ~/.cache/video-insight/{platform}_{video_id}.json. The .json format stores metadata + transcript together for richer cache hits (title, channel, duration, timestamps). Use --no-cache to force re-fetch.
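The documented naming scheme makes cache lookups predictable. A small helper (illustrative, not the skill's own code) locates a cached entry:

```python
from pathlib import Path

def cache_path(platform: str, video_id: str) -> Path:
    """Locate the cached transcript for a video, per the
    {platform}_{video_id}.json pattern above."""
    return Path.home() / ".cache" / "video-insight" / f"{platform}_{video_id}.json"
```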

Agent Integration Guide

  1. Default workflow: Call video-insight --url <URL>, receive JSON with full transcript. Use your own LLM context to summarize — you have 128K+ tokens, no need for the script to truncate.

  2. For keyframes: Add --frames flag. Only needed when the user explicitly asks for a visual/image review.

  3. Long videos (2h+): The transcript may be very large. Use map-reduce or chunked summarization.

  4. Bilibili videos require ffmpeg and faster-whisper (installed via setup.sh). YouTube videos typically have captions and are much faster.

  5. Cron/headless: Use --summarize --quiet for automated pipelines.
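Point 3 above (map-reduce over long transcripts) can be sketched as a simple character-budget chunker. The 12,000-character default is an assumption; size chunks to your model's context window.

```python
def chunk_transcript(text: str, max_chars: int = 12000) -> list[str]:
    """Greedily pack whole lines into chunks of at most max_chars,
    so each chunk can be summarized independently (the map step)."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Summarize each chunk, then summarize the summaries (the reduce step).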

Setup

cd ~/.openclaw/skills/video-insight && bash setup.sh

Dependencies

Required: yt-dlp, youtube-transcript-api, innertube, ffmpeg (system)
Optional: faster-whisper (for Bilibili/captionless videos), requests (for --summarize)
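A quick preflight for the system binaries can save a confusing mid-run failure. This check is a sketch of my own; the skill does not ship it:

```python
import shutil

def missing_tools(required=("ffmpeg", "yt-dlp")) -> list[str]:
    """Report which required executables are absent from PATH."""
    return [t for t in required if shutil.which(t) is None]
```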
