Video Reader

Security checks across static analysis, malware telemetry, and agentic risk

Overview

Video Reader is mostly purpose-aligned for video analysis, but it under-discloses its use of cloud transcription and an API credential. It also relies on a shared local memory file and a broad cleanup routine that can expose, alter, or delete user data.

Review the transcription path before using this skill with private videos. If you install it, prefer a clearly configured local Whisper backend, inspect and clear the logs under ~/.videoarm/logs and the /tmp/videoarm_memory.json file after sensitive tasks, and avoid running videoarm-clean unless you first inspect what it will delete.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

A video's audio may be sent to a third-party transcription provider under the user's API key, which can expose private speech or media content.

Why it was flagged

The audio tool uses an environment credential and uploads extracted audio to a cloud transcription endpoint, even though the registry metadata declares no env vars or primary credential.

Skill content

api_key = os.environ.get("WHISPER_API_KEY", "") ... base_url = os.environ.get("WHISPER_BASE_URL", "https://api.groq.com/openai/v1") ... requests.post(... headers={"Authorization": f"Bearer {api_key}"}, files={"file": ("audio.wav", audio_file, "audio/wav")})

Recommendation

Declare WHISPER_API_KEY and WHISPER_BASE_URL in metadata, ask before cloud transcription, and make local transcription the actual default if advertised.
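One way to make local transcription the real default is to require an explicit opt-in before any upload. The following is a minimal sketch, not the skill's actual code: the `VIDEOARM_ALLOW_CLOUD_TRANSCRIPTION` variable and the `choose_transcription_backend` helper are hypothetical names introduced for illustration. Only `WHISPER_API_KEY` comes from the reviewed excerpt.

```python
import os

# Hypothetical opt-in flag; it is NOT defined by the skill under review.
CLOUD_OPT_IN_VAR = "VIDEOARM_ALLOW_CLOUD_TRANSCRIPTION"


def choose_transcription_backend(env=None):
    """Return "cloud" only when a key is set AND the user opted in; else "local"."""
    env = os.environ if env is None else env
    api_key = env.get("WHISPER_API_KEY", "")
    opted_in = env.get(CLOUD_OPT_IN_VAR, "").strip().lower() in {"1", "true", "yes"}
    if api_key and opted_in:
        return "cloud"
    # Having a key alone is not treated as consent to upload audio.
    return "local"
```

With this shape, setting an API key alone never triggers an upload; the user has to state the intent separately, which also gives the documentation a concrete switch to describe.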

What this means

Users may believe transcription stays local and requires no credential when the reviewed CLI path actually expects a cloud transcription key.

Why it was flagged

The claim that transcription needs no API key is materially at odds with the provided audio CLI, which returns an error when WHISPER_API_KEY is unset and defaults to a cloud API endpoint.

Skill content

**Audio transcription works out of the box** — `faster-whisper` is included as a default dependency. No API keys needed.

Recommendation

Align documentation with implementation and clearly label when audio is processed locally versus uploaded to a provider.

What this means

Other local processes or sessions could read or modify this shared memory file, potentially leaking video details or steering the agent's answer.

Why it was flagged

A fixed, predictable /tmp memory file stores video paths, questions, transcript snippets, and analysis results, and the agent is instructed to trust it as authoritative.

Skill content

Read `/tmp/videoarm_memory.json` at the start of each turn ... The memory file is your single source of truth.

Recommendation

Use a per-session private memory file with restrictive permissions, validate its contents, and expire or clean it after each task.
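The recommendation above could be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's implementation: the function names, the `created`/`entries` schema, and the one-hour expiry are all hypothetical. It relies on `tempfile.mkstemp`, which creates a file with an unpredictable name that is readable and writable only by the creating user on POSIX systems.

```python
import json
import os
import tempfile
import time

MAX_AGE_SECONDS = 3600  # assumed expiry window, not taken from the skill


def create_session_memory():
    """Create a private memory file with an unpredictable name and return its path."""
    fd, path = tempfile.mkstemp(prefix="videoarm_memory_", suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"created": time.time(), "entries": []}, fh)
    return path


def load_session_memory(path, now=None):
    """Return the memory dict, or None if it is unreadable, malformed, or expired."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        return None
    # Validate the (hypothetical) schema instead of trusting the file blindly.
    if not isinstance(data, dict) or not isinstance(data.get("entries"), list):
        return None
    created = data.get("created")
    if not isinstance(created, (int, float)) or now - created > MAX_AGE_SECONDS:
        return None
    return data
```

A caller would treat a `None` result as "no usable memory" and start fresh rather than acting on stale or tampered content, then delete the file when the task ends.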

What this means

Private video metadata or small transcription excerpts may remain on disk after analysis.

Why it was flagged

Persistent tracing is purpose-aligned for debugging, but the logs can retain video paths, URLs, and short result or transcript snippets.

Skill content

Every tool call gets logged to: ~/.videoarm/logs/YYYY-MM-DD.jsonl ... ~/.videoarm/logs/YYYY-MM-DD.log

Recommendation

Document log retention clearly and provide an easy way to disable or purge logs for sensitive videos.
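A purge helper along these lines would give users a simple way to clear traces after a sensitive task. It is a sketch only: `purge_logs` and its parameters are hypothetical, and the only detail taken from the skill is that logs are `.jsonl` and `.log` files in a single directory.

```python
import time
from pathlib import Path


def purge_logs(log_dir, max_age_days=0, now=None):
    """Delete *.jsonl and *.log files at least max_age_days old; return removed paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return removed
    for path in sorted(log_dir.iterdir()):
        # Only touch log files; leave anything else in the directory alone.
        if path.is_file() and path.suffix in {".jsonl", ".log"}:
            if path.stat().st_mtime <= cutoff:
                path.unlink()
                removed.append(path)
    return removed
```

For example, `purge_logs(Path.home() / ".videoarm" / "logs")` would remove all existing log files in that directory, while `max_age_days=7` would keep the most recent week.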

What this means

If invoked, cleanup could remove unrelated temporary images used by other tasks or skills.

Why it was flagged

The cleanup tool deletes all image files in a shared OpenClaw tmp directory without verifying they were created by VideoARM.

Skill content

workspace_tmp = Path.home() / ".openclaw" / "workspace" / "tmp" ... targets.extend(_collect_files([workspace_tmp], ["*.jpg", "*.jpeg", "*.png"])) ... f.unlink()

Recommendation

Restrict cleanup to files with a VideoARM-owned prefix or manifest, and make dry-run or explicit confirmation the default for broad deletes.
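As a rough illustration of that recommendation, cleanup can be limited to files carrying an owned prefix, with dry-run as the default. The `videoarm_` prefix and the `clean_owned_images` helper are assumptions made for this sketch; the reviewed excerpt does not show the skill naming its files this way.

```python
from pathlib import Path

OWNED_PREFIX = "videoarm_"  # hypothetical naming convention, not from the skill
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def clean_owned_images(directory, dry_run=True):
    """Return owned image files; delete them only when dry_run is False."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    targets = [
        p for p in sorted(directory.iterdir())
        if p.is_file()
        and p.name.startswith(OWNED_PREFIX)
        and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    if not dry_run:
        for p in targets:
            p.unlink()
    return targets
```

Calling the function with no extra arguments only reports what would be removed, so images written to the shared directory by other tasks or skills are never deleted by accident.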

What this means

Frame images and question context may be visible to child agents during analysis.

Why it was flagged

Delegating selected frames and transcript context to sub-agents is central to the design and is disclosed, but it does move user video content across agent contexts.

Skill content

Spawns sub-agent(s) with: Image path ... Specific question ... Relevant context (transcript excerpt, options)

Recommendation

Use the skill only with videos and transcript content you are comfortable sharing with OpenClaw sub-agent contexts.

What this means

Users may need to install code and dependencies outside the registry's declared install mechanism.

Why it was flagged

The registry lists no install spec, but the documentation expects a manually cloned Python package with dependencies. This is normal for a tool skill, but provenance and dependency versions are not enforced by the registry metadata.

Skill content

git clone https://github.com/qiankemeng/VideoARM-skill.git ... pip install -e ".[all]"

Recommendation

Publish a complete install spec or lockfile and ensure the registry metadata matches the actual setup requirements.