Local Video Understanding
v1.0.2
Local video comprehension skill. Use ffmpeg to extract audio and frames, FunASR for speech recognition, and qwen3-vl for image understanding.
⭐ 1 · 64 · 0 current · 0 all-time
by @tomuiv
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign · medium confidence

Purpose & Capability
The name/description (local video understanding) matches the instructions: ffmpeg for extraction, FunASR for ASR, and qwen3-vl via Ollama for image understanding. Required tools mentioned in the README/SKILL.md are exactly what the workflow needs.
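The extraction step described above can be sketched as follows. This is a minimal illustration, not code from the skill itself: the input name (input.mp4), output locations, and the 1 fps sampling rate are assumptions.

```python
# Sketch of the ffmpeg extraction commands the skill's workflow implies.
# File names and the sampling rate are illustrative assumptions.

def audio_cmd(video: str, wav: str = "audio.wav") -> list[str]:
    # 16 kHz mono PCM is the format most ASR models, FunASR included, expect.
    return ["ffmpeg", "-i", video, "-vn", "-ac", "1", "-ar", "16000", wav]

def frames_cmd(video: str, out_dir: str = "frames", fps: int = 1) -> list[str]:
    # Sample one frame per second into numbered JPEGs.
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", f"{out_dir}/%05d.jpg"]

if __name__ == "__main__":
    # In real use these argv lists would be passed to subprocess.run().
    print(" ".join(audio_cmd("input.mp4")))
    print(" ".join(frames_cmd("input.mp4")))
```

Keeping the commands as argv lists (rather than shell strings) avoids quoting problems with paths that contain spaces.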
Instruction Scope
Instructions are concrete and narrowly scoped to extracting audio/frames, running FunASR in a conda env, and calling Ollama for image understanding. They reference local files and paths (frames, audio.wav) and set a MODELSCOPE_CACHE path inside the Python snippet to a hard-coded Windows location. They also mention optionally using a 'Cloud LLM API' for summaries without specifying which service; if the operator chooses that option, video-derived data could be sent off-device.
Install Mechanism
This is an instruction-only skill with no install spec or downloaded archives. The README notes that models are auto-downloaded on first use (FunASR/ModelScope and pulling qwen3-vl via Ollama), which is expected for local models but requires network access and disk space.
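For the image-understanding step, Ollama exposes an HTTP API whose /api/generate endpoint accepts base64-encoded images for multimodal models. A minimal sketch of building such a request, assuming the qwen3-vl model name from the skill and a locally extracted frame:

```python
import base64
import json

def build_ollama_request(frame_bytes: bytes, prompt: str,
                         model: str = "qwen3-vl") -> str:
    # Ollama's /api/generate endpoint takes images as base64 strings
    # in an "images" array alongside the text prompt.
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(frame_bytes).decode("ascii")],
        "stream": False,
    }
    return json.dumps(payload)

# In real use the result would be POSTed to http://localhost:11434/api/generate
# (Ollama's default local address) with urllib.request or similar.
req = build_ollama_request(b"\xff\xd8fake-jpeg-bytes", "Describe this frame.")
```

Because the endpoint is localhost by default, the frame data stays on the machine unless the operator points the client elsewhere.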
Credentials
No environment variables or credentials are declared. The SKILL.md does set MODELSCOPE_CACHE inside the Python snippet to a specific Windows user path (C:/Users/TOM/.cache/modelscope), which is non-portable but not a secret-exfiltration pattern. The workflow may require internet access for initial model downloads, and the README suggests possible later use of a cloud LLM for summaries; this is the main privacy-related consideration.
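A portable replacement for the hard-coded cache path is to derive it from the current user's home directory instead; a minimal sketch:

```python
import os
from pathlib import Path

# Portable alternative to the hard-coded C:/Users/TOM/.cache/modelscope:
# derive the ModelScope cache location from the current user's home,
# so the snippet works on any account and OS.
cache_dir = Path.home() / ".cache" / "modelscope"
cache_dir.mkdir(parents=True, exist_ok=True)
os.environ["MODELSCOPE_CACHE"] = str(cache_dir)
```

Setting the variable before importing FunASR ensures the model downloads land in the intended directory.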
Persistence & Privilege
The skill does not request always:true or any elevated/persistent platform privileges, nor does it modify other skills' configs. It is user-invocable and relies on local binaries and environments.
Scan Findings in Context
[no-findings] No regex-based findings were returned, which is expected: the skill is instruction-only (no code files) and contains only shell/conda/ollama usage instructions.
Assessment
This skill appears coherent for local video processing, but review these points before installing/using:
1) It runs local commands (ffmpeg, conda python, ollama) and reads/writes files (audio.wav, frames/*); run it only on machines you control.
2) Models are auto-downloaded on first use; ensure you trust the model sources and have the disk and network capacity.
3) Update the MODELSCOPE_CACHE path in the Python snippet to a directory that exists for your user instead of the hard-coded C:/Users/TOM path.
4) The README mentions using a cloud LLM for summaries; avoid sending sensitive video data to cloud services unless you understand and accept the privacy implications.
5) If you need higher assurance, verify the exact FunASR and Ollama model sources and pull them manually before running.
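Before running the skill, the local binaries it depends on can be checked up front. A small stdlib-only sketch (the tool names come from the skill's workflow):

```python
import shutil

def missing_tools(tools: tuple[str, ...] = ("ffmpeg", "conda", "ollama")) -> list[str]:
    # shutil.which returns None when a binary is not found on PATH.
    return [t for t in tools if shutil.which(t) is None]

if missing := missing_tools():
    print("Install these before running the skill:", ", ".join(missing))
```

Running this once saves a partial run that fails midway because, say, ffmpeg produced the audio but ollama was never installed.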
