Video Reader
Security checks across static analysis, malware telemetry, and agentic risk
Overview
Video Reader is mostly purpose-aligned for video analysis, but it under-discloses its use of cloud transcription and an API credential, and it relies on a shared local memory file and a broad cleanup routine that can expose or alter user data.
Review the transcription path before using this skill with private videos. If you install it, prefer a clearly configured local Whisper backend, check and clear the logs under ~/.videoarm/logs and the /tmp/videoarm_memory.json file after sensitive tasks, and avoid running videoarm-clean unless you first inspect what it will delete.
Static analysis
No static analysis findings were reported for this release.
VirusTotal
VirusTotal findings are pending for this skill version.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A video's audio may be sent to a third-party transcription provider under the user's API key, which can expose private speech or media content.
The audio tool uses an environment credential and uploads extracted audio to a cloud transcription endpoint, even though the registry metadata declares no env vars or primary credential.
api_key = os.environ.get("WHISPER_API_KEY", "") ... base_url = os.environ.get("WHISPER_BASE_URL", "https://api.groq.com/openai/v1") ... requests.post(... headers={"Authorization": f"Bearer {api_key}"}, files={"file": ("audio.wav", audio_file, "audio/wav")})
Declare WHISPER_API_KEY and WHISPER_BASE_URL in metadata, ask before cloud transcription, and make local transcription the actual default if advertised.
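The runtime side of this recommendation could look roughly like the sketch below. It assumes a Python CLI like the one quoted and keeps transcription local unless WHISPER_API_KEY is set and the user explicitly approves the upload. The names plan_transcription, TranscriptionPlan, and the confirm callback are hypothetical, not part of VideoARM. Only the two environment variable names and the default endpoint come from the quoted code.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Default endpoint used by the reviewed CLI when WHISPER_BASE_URL is unset.
DEFAULT_CLOUD_URL = "https://api.groq.com/openai/v1"


@dataclass
class TranscriptionPlan:  # hypothetical type, not part of VideoARM
    backend: str             # "local" or "cloud"
    base_url: Optional[str]  # set only when audio will be uploaded


def plan_transcription(env: Mapping[str, str], confirm: Callable[[str], bool]) -> TranscriptionPlan:
    """Default to local transcription; upload only with a key AND explicit consent."""
    api_key = env.get("WHISPER_API_KEY", "")
    if not api_key:
        # No credential: fall back to local instead of returning an error.
        return TranscriptionPlan(backend="local", base_url=None)
    base_url = env.get("WHISPER_BASE_URL", DEFAULT_CLOUD_URL)
    if confirm(f"Upload extracted audio to {base_url} for transcription? [y/N] "):
        return TranscriptionPlan(backend="cloud", base_url=base_url)
    # User declined the upload: keep the audio on this machine.
    return TranscriptionPlan(backend="local", base_url=None)
```

In an interactive CLI, confirm could be lambda msg: input(msg).strip().lower() == "y", and os.environ can be passed as env. Injecting both keeps the consent step easy to test.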
Users may believe transcription stays local and requires no credential when the reviewed CLI path actually expects a cloud transcription key.
This claim is materially at odds with the provided audio CLI, which returns an error when WHISPER_API_KEY is unset and defaults to a cloud API endpoint.
**Audio transcription works out of the box** — `faster-whisper` is included as a default dependency. No API keys needed.
Align documentation with implementation and clearly label when audio is processed locally versus uploaded to a provider.
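A minimal sketch of that labeling follows. It assumes the skill already knows which backend it chose and, for cloud transcription, the endpoint URL. The processing_notice helper and its message wording are hypothetical. The idea is to print one line before any audio is processed so the user sees exactly where it goes.

```python
from typing import Optional
from urllib.parse import urlparse


def processing_notice(backend: str, base_url: Optional[str] = None) -> str:
    """Return a one-line, user-facing label for where audio is transcribed (hypothetical helper)."""
    if backend == "local":
        return "Audio transcription: local (audio stays on this machine)"
    # Show only the host so the notice stays short but still names the provider.
    host = urlparse(base_url or "").hostname or "unknown host"
    return f"Audio transcription: cloud (audio uploaded to {host})"
```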
Other local processes or sessions could read or modify this shared memory file, potentially leaking video details or steering the agent's answer.
A fixed, predictable /tmp memory file stores video paths, questions, transcript snippets, and analysis results, and the agent is instructed to trust it as authoritative.
Read `/tmp/videoarm_memory.json` at the start of each turn ... The memory file is your single source of truth.
Use a per-session private memory file with restrictive permissions, validate its contents, and expire or clean it after each task.
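The sketch below shows one way to meet these three points in Python. tempfile.mkstemp gives each session an unpredictable file name and creates the file readable and writable only by the owner. The loader rejects malformed or stale contents instead of treating them as authoritative. The required keys, the one-hour expiry, and all function names are assumptions for illustration. The actual VideoARM memory schema is not shown in the reviewed artifacts.

```python
import json
import os
import stat
import tempfile
import time
from pathlib import Path

# Hypothetical schema; the real memory file may hold different fields.
REQUIRED_KEYS = {"video_path": str, "question": str, "created_at": float}
MAX_AGE_SECONDS = 3600  # assumption: session memory expires after one hour


def create_session_memory(data: dict) -> Path:
    """Write memory to a fresh, unpredictable file readable only by the owner."""
    fd, name = tempfile.mkstemp(prefix="videoarm_mem_", suffix=".json")
    # mkstemp already uses owner-only permissions on POSIX; chmod makes it explicit.
    os.chmod(name, stat.S_IRUSR | stat.S_IWUSR)
    with os.fdopen(fd, "w") as f:
        json.dump({**data, "created_at": time.time()}, f)
    return Path(name)


def load_session_memory(path: Path) -> dict:
    """Load memory, rejecting malformed or expired contents instead of trusting them."""
    data = json.loads(path.read_text())
    if not isinstance(data, dict):
        raise ValueError("memory file must contain a JSON object")
    for key, typ in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"missing or invalid field: {key}")
    if time.time() - data["created_at"] > MAX_AGE_SECONDS:
        raise ValueError("memory file has expired")
    return data


def discard_session_memory(path: Path) -> None:
    """Delete the memory file once the task is finished."""
    path.unlink(missing_ok=True)
```

Passing the returned path to the agent for one task only, then calling discard_session_memory, avoids the fixed /tmp/videoarm_memory.json name that other sessions could predict.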
Private video metadata or small transcription excerpts may remain on disk after analysis.
Persistent tracing is purpose-aligned for debugging, but the logs can retain video paths, URLs, and short result or transcript snippets.
Every tool call gets logged to: ~/.videoarm/logs/YYYY-MM-DD.jsonl ... ~/.videoarm/logs/YYYY-MM-DD.log
Document log retention clearly and provide an easy way to disable or purge logs for sensitive videos.
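A sketch of both controls, assuming the dated file names quoted above (YYYY-MM-DD.jsonl and YYYY-MM-DD.log under ~/.videoarm/logs). The VIDEOARM_DISABLE_LOGS variable and both function names are hypothetical. The reviewed skill is not known to support an opt-out switch.

```python
import os
from datetime import date, timedelta
from pathlib import Path
from typing import List, Mapping, Optional

# Log location quoted in the review.
LOG_DIR = Path.home() / ".videoarm" / "logs"


def logging_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical opt-out: setting VIDEOARM_DISABLE_LOGS=1 turns tracing off."""
    return env.get("VIDEOARM_DISABLE_LOGS", "") != "1"


def purge_logs(log_dir: Path = LOG_DIR, keep_days: int = 0,
               today: Optional[date] = None) -> List[Path]:
    """Delete dated log files, keeping only the most recent keep_days days.

    keep_days=0 removes every dated log up to and including today.
    """
    today = today or date.today()
    threshold = today - timedelta(days=keep_days)
    removed: List[Path] = []
    for f in sorted(log_dir.glob("*")):
        if not f.is_file() or f.suffix not in (".jsonl", ".log"):
            continue
        try:
            day = date.fromisoformat(f.stem)
        except ValueError:
            continue  # not a YYYY-MM-DD log file; leave it alone
        if day <= threshold:
            f.unlink()
            removed.append(f)
    return removed
```

After analyzing a sensitive video, purge_logs() with the default keep_days=0 clears all dated logs. Skipping files whose names do not parse as dates keeps the purge from touching anything the logger did not create.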
If invoked, cleanup could remove unrelated temporary images used by other tasks or skills.
The cleanup tool deletes all image files in a shared OpenClaw tmp directory without verifying they were created by VideoARM.
workspace_tmp = Path.home() / ".openclaw" / "workspace" / "tmp" ... targets.extend(_collect_files([workspace_tmp], ["*.jpg", "*.jpeg", "*.png"])) ... f.unlink()
Restrict cleanup to files with a VideoARM-owned prefix or manifest, and make dry-run or explicit confirmation the default for broad deletes.
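A minimal sketch of prefix-restricted cleanup with dry-run as the default. The directory and image patterns come from the quoted code. The "videoarm_" prefix is a hypothetical naming convention that VideoARM would need to adopt for the frames it writes, and collect_owned_images and clean are illustrative names.

```python
from pathlib import Path
from typing import List

WORKSPACE_TMP = Path.home() / ".openclaw" / "workspace" / "tmp"
OWNED_PREFIX = "videoarm_"  # hypothetical prefix for frames VideoARM creates
IMAGE_PATTERNS = ["*.jpg", "*.jpeg", "*.png"]


def collect_owned_images(tmp_dir: Path = WORKSPACE_TMP, prefix: str = OWNED_PREFIX) -> List[Path]:
    """Return only image files whose names carry the VideoARM-owned prefix."""
    targets: List[Path] = []
    for pattern in IMAGE_PATTERNS:
        # e.g. "videoarm_*.jpg"; images from other skills never match.
        targets.extend(p for p in tmp_dir.glob(prefix + pattern) if p.is_file())
    return sorted(targets)


def clean(tmp_dir: Path = WORKSPACE_TMP, dry_run: bool = True) -> List[Path]:
    """List what would be deleted; only unlink when dry_run is explicitly False."""
    targets = collect_owned_images(tmp_dir)
    for f in targets:
        if dry_run:
            print(f"would delete {f}")
        else:
            f.unlink()
    return targets
```

A manifest of created files would be stricter than a name prefix, since another tool could reuse the prefix. The prefix approach is simpler and already avoids deleting unrelated images.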
Frame images and question context may be visible to child agents during analysis.
Delegating selected frames and transcript context to sub-agents is central to the design and is disclosed, but it does move user video content across agent contexts.
Spawns sub-agent(s) with: Image path ... Specific question ... Relevant context (transcript excerpt, options)
Use the skill only with videos and transcript content you are comfortable sharing with OpenClaw sub-agent contexts.
Users may need to install code and dependencies outside the registry's declared install mechanism.
The registry lists no install spec, but the documentation expects a manually cloned Python package with dependencies. This is normal for a tool skill, but provenance and dependency versions are not enforced by the registry metadata.
git clone https://github.com/qiankemeng/VideoARM-skill.git ... pip install -e ".[all]"
Publish a complete install spec or lockfile and ensure the registry metadata matches the actual setup requirements.
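Until the registry enforces dependency versions, a user could check a manual install against a published lockfile. The sketch below assumes a requirements-style file of plain name==version lines with no extras, URLs, or hashes. The parse_pins and find_mismatches helpers are hypothetical, and the repository is not known to ship such a lockfile. Installed versions come from importlib.metadata in the standard library.

```python
from importlib import metadata
from typing import Callable, Dict, List


def parse_pins(lock_text: str) -> Dict[str, str]:
    """Parse simple 'name==version' lines, ignoring comments and env markers."""
    pins: Dict[str, str] = {}
    for line in lock_text.splitlines():
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str],
                    installed: Callable[[str], str] = metadata.version) -> List[str]:
    """Report packages that are missing or installed at a version other than the pin."""
    problems: List[str] = []
    for name, wanted in pins.items():
        try:
            have = installed(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if have != wanted:
            problems.append(f"{name}: installed {have}, pinned {wanted}")
    return problems
```

Running find_mismatches(parse_pins(Path("requirements.lock").read_text())) after pip install would list any drift. The lockfile name is an assumption.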
