Skill flagged — suspicious patterns detected
ClawHub Security flagged this skill as suspicious. Review the scan results before using.
Video Reader
v4.1.1 · Tool-driven video question answering with frame extraction, sub-agent analysis, and audio transcription
by Qianke Meng (@qiankemeng)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan (OpenClaw)
Verdict: Suspicious (medium confidence)
Purpose & Capability
The name and description (video question answering with frame extraction and transcription) match the code and runtime instructions: tools for download, metadata, frame extraction, and audio transcription are present. However, the skill metadata declares no required binaries or environment variables, while the code clearly expects external binaries (ffmpeg, optionally yt-dlp) and reads multiple environment variables (WHISPER_API_KEY, WHISPER_BASE_URL, WHISPER_MODEL, VISION_API_KEY/OPENAI_API_KEY, ANTHROPIC_API_KEY). That mismatch between declared requirements and actual dependencies is unexpected and should be resolved before trusting the skill.
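Because the manifest declares nothing, a pre-flight check is the only way to learn what is actually available. The sketch below (binary names taken from the scan above; this is not part of the skill) reports which expected binaries are missing from PATH:

```python
import shutil

def missing_binaries(names):
    """Return the subset of `names` not found on PATH."""
    return [n for n in names if shutil.which(n) is None]

# Binaries the scan says the code expects: ffmpeg (required), yt-dlp (optional).
missing = missing_binaries(["ffmpeg", "yt-dlp"])
for name in missing:
    print(f"not on PATH: {name}")
```

Running this before installation tells you whether the undeclared system dependencies are already satisfied.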
Instruction Scope
SKILL.md confines runtime actions to video download/inspect/extract/transcribe and to spawning sub-agents that analyze image grids; it instructs the agent to use /tmp/videoarm_memory.json as the single source of truth and to spawn isolated sub-agents via sessions_spawn. It does not instruct reading arbitrary system files or contacting exfiltration endpoints. The memory-file usage and sub-agent dispatch are explicit and scoped to the skill's purpose.
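Since /tmp/videoarm_memory.json is the skill's single source of truth, it is worth inspecting between runs. A minimal sketch (the path comes from SKILL.md; the file's internal structure is not documented in this report, so this just pretty-prints whatever JSON it holds):

```python
import json
from pathlib import Path

def read_memory(path="/tmp/videoarm_memory.json"):
    """Load the VideoARM memory file if present; return None otherwise."""
    p = Path(path)
    if not p.exists():
        return None
    return json.loads(p.read_text())

state = read_memory()
print(json.dumps(state, indent=2) if state is not None else "no memory file yet")
```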
Install Mechanism
There is no install spec in the skill manifest (instruction-only), but the bundle includes a full Python package (pyproject.toml, CLI scripts, requirements). That means the package will not be auto-installed by the platform; manual installation is required to get dependencies (opencv, faster-whisper, ffmpeg, yt-dlp). This is reasonable but increases the chance users will miss required system binaries or optional components. No suspicious remote download URLs or archive extraction were found in the install artifacts.
Credentials
The manifest lists no required environment variables or primary credential, yet the code and docs read or expect multiple credential-like env vars (WHISPER_API_KEY, WHISPER_BASE_URL, VISION_API_KEY/OPENAI_API_KEY, ANTHROPIC_API_KEY, HTTPS_PROXY, VIDEOARM_SESSION_ID). In particular, videoarm_audio.py currently requires WHISPER_API_KEY and returns an error if it is not set, contradicting README statements that local faster-whisper works without API keys. Asking for API keys or base URLs (and implicitly supporting OpenAI/Anthropic/Groq endpoints) is reasonable for optional cloud transcription/vision backends. But the manifest does not declare these needs, and the code will attempt network API calls whenever an API key or base URL is supplied, so do not provide secrets until you confirm which backend (local vs. remote) will be used.
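Before supplying any secrets, you can audit which of the credential-like variables named above are already set in your shell, without ever printing their values (a sketch; the variable list is taken from this scan, not from the skill's manifest):

```python
import os

# Credential-like variables the scan found the code reading.
CREDENTIAL_VARS = [
    "WHISPER_API_KEY", "WHISPER_BASE_URL", "WHISPER_MODEL",
    "VISION_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
    "HTTPS_PROXY", "VIDEOARM_SESSION_ID",
]

def set_vars(env, names=CREDENTIAL_VARS):
    """Return the names (never the values) of variables that are set."""
    return [n for n in names if env.get(n)]

print("set:", set_vars(os.environ) or "none")
```

If any API key appears here, the skill may silently prefer a remote backend for transcription or vision calls.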
Persistence & Privilege
The skill writes logs and cache under ~/.videoarm and creates files under ~/.openclaw/workspace/tmp and /tmp/videoarm_memory.json. The provided cleaning tool (videoarm-clean) can delete files in ~/.openclaw/workspace/tmp and the VideoARM memory file; that may remove other workspace artifacts if run with broad arguments. The skill does not set always:true and does not modify other skills' configs, but its file I/O footprint in user home and OpenClaw workspace is significant and could affect other local agent state if cleaning tools are used carelessly.
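Before running the cleaner, a dry-run listing of the paths this report says the skill touches shows what would be at risk (path names come from the report; this is a sketch, not the cleaner's actual logic):

```python
from pathlib import Path

def cleaner_targets(home=None):
    """Paths the report says the skill writes to and the cleaner may delete."""
    home = Path(home) if home else Path.home()
    return [
        home / ".videoarm",
        home / ".openclaw" / "workspace" / "tmp",
        Path("/tmp/videoarm_memory.json"),
    ]

for p in cleaner_targets():
    if p.exists():
        n = sum(1 for _ in p.rglob("*")) if p.is_dir() else 1
        print(f"{p}: {n} item(s) would be affected")
```

If ~/.openclaw/workspace/tmp holds artifacts from other skills, they would show up in this count before any deletion happens.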
Scan Findings in Context
[pre-scan-injection-signals] expected: No pre-scan injection signals were detected. However, the static absence of findings does not remove the factual contradictions between manifest and code described above (e.g., undeclared env vars and binary expectations).
What to consider before installing
This skill appears to implement a real video QA system, but there are important mismatches and operational risks you should consider before installing or running it:
- Credentials & env vars: The skill bundle and docs mention both local Whisper (faster-whisper) and remote transcription APIs, but videoarm_audio.py currently requires WHISPER_API_KEY (and will call a remote transcription endpoint) unless you run a local whisper server. Do NOT paste your OpenAI/Anthropic/Groq API keys into environment variables for this skill until you confirm whether the skill will use a local model or an external API. Prefer testing first with a non-sensitive account or an isolated environment.
- Missing declared requirements: The manifest declares no required binaries or env vars, yet the code needs ffmpeg (required), optionally yt-dlp (for downloads), and Python packages (opencv, faster-whisper). Install and test in a sandbox or VM and run videoarm-doctor to verify dependencies before giving the skill access to important files.
- Local file writes & cleanup: The skill creates ~/.videoarm and writes logs and cached videos; its cleaner can delete ~/.openclaw/workspace/tmp — that could remove other OpenClaw workspace files. If you care about other workspace data, avoid running the cleaner with broad flags or inspect the cleaner code first.
- Data exposure via sub-agents: The orchestrator spawns sub-agents and writes frame-grid images to workspace tmp for those sub-agents to read. If the video contains sensitive content you do not want shared with remote models, ensure the sub-agent/image tools operate locally and that no remote vision/transcription endpoints are configured.
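One way to guarantee no remote backend is reached during a trial run is to launch the skill's tools with the credential variables stripped from the environment (a sketch under the assumption, stated in this scan, that the code only attempts network API calls when a key or base URL is supplied; the variable names come from the scan):

```python
import os

# Variables that, per the scan, switch the skill to remote API backends.
BLOCKLIST = {
    "WHISPER_API_KEY", "WHISPER_BASE_URL", "VISION_API_KEY",
    "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
}

def sanitized_env(env, blocklist=BLOCKLIST):
    """Return a copy of `env` with credential variables removed."""
    return {k: v for k, v in env.items() if k not in blocklist}

# Usage: pass to a child process, e.g.
#   subprocess.run([...], env=sanitized_env(dict(os.environ)))
clean = sanitized_env(dict(os.environ))
assert not (BLOCKLIST & clean.keys())
```

This forces the local-model code path (if one exists) and makes any unexpected network attempt fail loudly instead of leaking data.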
What to do next:
1. Inspect videoarm_audio.py and videoarm_local_whisper to confirm whether transcription runs locally or requires an API key in your deployment.
2. Run videoarm-doctor in a safe environment to see which dependencies are missing.
3. If you must provide API keys, create scoped/test keys and run in an isolated account.
4. If you want to use only local models, confirm the local server path and leave WHISPER_API_KEY/WHISPER_BASE_URL unset.
5. Consider running the skill inside a disposable container/VM to validate its behavior and filesystem changes before using it on your regular workstation.
