Camera YOLO Operator

Security checks across malware telemetry and agentic risk

Overview

This local computer-vision skill reads from a webcam or user-selected video files, saves visual outputs locally, and downloads ML models. The scan found no evidence of hidden exfiltration or destructive behavior.

Install only if you are comfortable with the skill accessing your webcam or selected local videos, downloading ML models from external sources, and saving images or annotated videos under the OpenClaw workspace. Use a virtual environment, review the output directories and save limits before running, and delete retained visual files when done. Avoid the chmod 666 camera-device troubleshooting command unless you fully understand the permission change and reverse it afterward.
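The chmod 666 advice above can be checked after the fact. The following is a minimal sketch, not part of the skill: the helper names are hypothetical, and the device path (for example /dev/video0) and the restored mode 0o660 are assumptions that should be replaced with whatever the system used before the change.

```python
import os
import stat


def is_world_writable(path: str) -> bool:
    """Return True if users outside the owner and group may write to `path`."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)


def restore_mode(path: str, mode: int = 0o660) -> None:
    """Reset permissions on `path`.

    0o660 is an assumed default for camera device nodes; confirm the
    original mode on your system before relying on it.
    """
    os.chmod(path, mode)
```

Calling is_world_writable on the camera device node after troubleshooting shows whether the relaxed permission is still in place; restore_mode needs the same privileges that the original chmod needed.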

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (5)

Description-Behavior Mismatch

Medium

Confidence: 83%
Finding: The documentation exposes broader functionality than the manifest advertises, including generic tracking and local video processing. This creates a transparency and consent problem: users may believe the skill only performs webcam YOLO/depth operations while it also supports broader surveillance-style analysis and persistent output generation.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The script claims to use cached/local models but also calls DepthAnything3.from_pretrained(args.depth_model), which can fetch model artifacts from HuggingFace at runtime. That creates an unannounced network dependency and supply-chain exposure, and it can leak environment or network metadata in contexts where users expect an offline, local-only webcam skill.
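One way a user might guard against the runtime fetch described above is to request offline mode before any model library is imported. This is a sketch under assumptions: HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE are environment variables read by the huggingface_hub and transformers libraries, but whether DepthAnything3.from_pretrained loads through those libraries and honors them is not confirmed by the report. The function name is hypothetical.

```python
import os
from typing import MutableMapping


def force_offline_model_loading(
    env: MutableMapping[str, str] = os.environ,
) -> MutableMapping[str, str]:
    """Ask Hugging Face libraries to use only locally cached files.

    Must run before huggingface_hub or transformers is imported, because
    those libraries read the variables at import time.
    Assumption: the skill's model loader goes through these libraries.
    """
    env["HF_HUB_OFFLINE"] = "1"
    env["TRANSFORMERS_OFFLINE"] = "1"
    return env
```

With offline mode set, a load that would otherwise reach the network is expected to fail instead, which makes the hidden dependency visible rather than silent.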

Missing User Warnings

Medium

Confidence: 92%
Finding: The skill operates a local camera and saves captured or annotated media to disk, yet the description does not clearly warn about privacy-sensitive recording or storage. In this context, insufficient disclosure is dangerous because users may unintentionally record people or environments and leave persistent media artifacts in predictable directories.
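Persistent media artifacts of the kind described above can be cleared once a session ends. The sketch below is illustrative only: purge_media and the suffix list are hypothetical, and the report does not name the skill's actual output directory, so the caller must supply it.

```python
from pathlib import Path

# Assumed set of media extensions; adjust to what the skill actually writes.
MEDIA_SUFFIXES = frozenset({".jpg", ".jpeg", ".png", ".mp4", ".avi"})


def purge_media(out_dir: str, suffixes=MEDIA_SUFFIXES) -> int:
    """Delete media files under `out_dir` recursively; return how many were removed."""
    root = Path(out_dir)
    if not root.is_dir():
        return 0
    removed = 0
    # Materialize the listing first so deletion does not disturb iteration.
    for path in list(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            path.unlink()
            removed += 1
    return removed
```

Non-media files and the directory structure are left in place, so logs or configuration stored alongside the captures are not affected.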

Missing User Warnings

Medium

Confidence: 89%
Finding: The deployment guide instructs users to download models from the network and run webcam capture/detection scripts without any explicit notice that these steps access the camera, create local image/output artifacts, and may fetch external content. In a skill that processes live camera data, lack of disclosure meaningfully increases the risk of users enabling sensitive device access or generating stored captures without informed consent.

Missing User Warnings

Medium

Confidence: 85%
Finding: The skill captures webcam frames and periodically saves annotated images to disk automatically, but provides no explicit privacy warning or affirmative consent mechanism before storage. In a webcam-focused skill this behavior is functionally expected, yet saving images under a workspace path can still expose sensitive visual data if the user does not realize snapshots are being retained.
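The missing consent step and save limit noted above could look roughly like the following. This is a hypothetical sketch, not the skill's code: SnapshotGate, its parameters, and the default limit of 20 files are all assumptions chosen for illustration.

```python
from pathlib import Path


class SnapshotGate:
    """Write snapshots only after explicit consent and below a file cap."""

    def __init__(self, out_dir: str, consent_given: bool = False, max_files: int = 20):
        self.out_dir = Path(out_dir)
        self.consent_given = consent_given
        self.max_files = max_files

    def save(self, name: str, data: bytes) -> bool:
        """Return True if the snapshot was written, False if it was refused."""
        if not self.consent_given:
            return False  # nothing touches the disk without consent
        self.out_dir.mkdir(parents=True, exist_ok=True)
        existing = sum(1 for p in self.out_dir.iterdir() if p.is_file())
        if existing >= self.max_files:
            return False  # cap reached; refuse rather than accumulate captures
        (self.out_dir / name).write_bytes(data)
        return True
```

Defaulting consent_given to False means that a caller who forgets to ask the user ends up storing nothing, which is the safer failure mode for camera data.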

VirusTotal

66/66 vendors flagged this skill as clean.
