Screen Vision

Security checks across malware telemetry and agentic risk

Overview

The skill’s desktop-control purpose is clear, but it has unsafe defaults and weak safeguards for a tool that can view and operate a live computer.

Install only in an isolated desktop or VM, avoid sensitive accounts/screens, change or disable the VNC setup before use, prefer a local or trusted vision provider, disable screenshot logging if not needed, and do not rely on the advertised confirmation safeguards for destructive or financial actions.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain

Findings (20)

os.system() or os exec-family call

High

Category: Dangerous Code Execution
Content:
    cmd = f"bash {script} {output_path}"
    if display_id:
        cmd = f"SV_DISPLAY={display_id} {cmd}"
    ret = os.system(cmd)
    return ret == 0 and os.path.exists(output_path)
Confidence: 97%
Finding: ret = os.system(cmd)
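One way to close this finding, sketched under assumptions: the capture script needs only `output_path` as its argument and reads the display from `SV_DISPLAY`, as the flagged code implies. `run_capture` is a hypothetical name. Passing an argument list to `subprocess.run` (no `shell=True`) means no shell parses `script`, `output_path`, or `display_id`, so metacharacters in them cannot start extra commands.

```python
import os
import subprocess
from typing import Optional


def run_capture(script: str, output_path: str, display_id: Optional[str] = None) -> bool:
    """Run the capture script without a shell; True if it succeeded and wrote output."""
    # Start from the current environment and add SV_DISPLAY only when requested,
    # instead of prefixing "SV_DISPLAY=..." to a shell command string.
    env = dict(os.environ)
    if display_id:
        env["SV_DISPLAY"] = display_id
    # The argument list goes straight to exec; no shell parses any element.
    result = subprocess.run(["bash", script, output_path], env=env, check=False)
    return result.returncode == 0 and os.path.exists(output_path)
```

A path such as `a;b.png` is then treated as a single file name rather than as two shell commands.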

os.system() or os exec-family call

High

Category: Dangerous Code Execution
Content:
    # Save screenshot log
    if safety_config.get("screenshot_log", True):
        log_ss = os.path.join(log_dir, f"step_{action_count:03d}.png")
        os.system(f"cp {ss_path} {log_ss} 2>/dev/null")
    # Wait and take new screenshot
    time.sleep(wait_after)
Confidence: 88%
Finding: os.system(f"cp {ss_path} {log_ss} 2>/dev/null")
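A sketch of the same logging step done in-process, assuming the surrounding loop only needs the copied file's path (`save_screenshot_log` is a hypothetical helper). `shutil.copy2` avoids building a shell command from `ss_path` and `log_ss` at all.

```python
import shutil
from pathlib import Path
from typing import Optional


def save_screenshot_log(ss_path: str, log_dir: str, action_count: int) -> Optional[str]:
    """Copy a screenshot into the log directory without invoking a shell."""
    log_ss = Path(log_dir) / f"step_{action_count:03d}.png"
    try:
        # copy2 copies contents and metadata in-process, so spaces or ';'
        # in either path are never interpreted by a shell.
        shutil.copy2(ss_path, log_ss)
    except OSError:
        # The original silenced errors with 2>/dev/null; here the failure
        # is reported to the caller as None instead.
        return None
    return str(log_ss)
```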

Tainted flow: 'cmd' from os.environ.get (line 33, credential/environment) → os.system (code execution)

Medium

Category: Data Flow
Content:
    cmd = f"bash {script} {output_path}"
    if display_id:
        cmd = f"SV_DISPLAY={display_id} {cmd}"
    ret = os.system(cmd)
    return ret == 0 and os.path.exists(output_path)
Confidence: 95%
Finding: ret = os.system(cmd)
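Because the flagged value originates in an environment variable, one mitigation is to validate it before it reaches any command. The sketch below assumes only local X11 display names such as `:1` or `:0.0` are legitimate; both that policy and `validated_display` are hypothetical. It complements, rather than replaces, avoiding `os.system` altogether.

```python
import re
from typing import Optional

# Hypothetical policy: accept only local X11 display names such as ":1" or ":0.0".
_DISPLAY_RE = re.compile(r":\d{1,3}(\.\d{1,2})?")


def validated_display(raw: Optional[str]) -> Optional[str]:
    """Return raw if it looks like a local X display, None if unset; else raise."""
    if raw is None or raw == "":
        return None
    if not _DISPLAY_RE.fullmatch(raw):
        raise ValueError(f"refusing unexpected display value: {raw!r}")
    return raw
```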

Undeclared Permissions (Lp3)

Medium

Category: MCP Least Privilege
Confidence: 94%
Finding: The skill advertises shell commands, file writes, environment-variable access, and network access, and instructs the agent to use them, yet declares no permissions. In a desktop-control skill these capabilities materially expand what the agent can do, and leaving them undeclared prevents meaningful consent, review, or sandboxing by the host platform.

Intent-Code Divergence

High

Confidence: 99%
Finding: The headless setup configures a VNC server with a hardcoded password of 'screen123' and starts it with '-localhost no', exposing remote desktop access on the network. In the context of a screen-control skill, this creates an easily guessable, externally reachable entry point that can give an attacker full interactive control of the desktop and anything accessible from it.
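A sketch of safer VNC defaults, assuming a TigerVNC-style `vncserver` wrapper that accepts `-localhost yes|no` (the skill's own `-localhost no` suggests this form). `random_vnc_password`, `vnc_server_args`, and the `:1` display are hypothetical. Classic VNC password authentication uses only the first 8 characters, so the password is generated at that length from a cryptographic source instead of being hardcoded. With the server bound to loopback, remote access would go through an SSH tunnel such as `ssh -L 5901:localhost:5901 user@host`.

```python
import secrets
import string
from typing import List

ALPHABET = string.ascii_letters + string.digits


def random_vnc_password(length: int = 8) -> str:
    """Generate a random password; classic VNC auth only uses the first 8 characters."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def vnc_server_args(display: str = ":1") -> List[str]:
    """Build a hypothetical vncserver command line bound to loopback only."""
    # "-localhost yes" replaces the flagged "-localhost no".
    return ["vncserver", display, "-localhost", "yes"]
```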

Missing User Warnings

Medium

Confidence: 88%
Finding: The README prominently advertises screenshot capture, remote viewing, and saved screenshot logs, and also documents use of third-party vision APIs, but it does not clearly warn that screen contents may contain sensitive data and may be transmitted off-host or retained in local logs. For a desktop-control skill, this omission is security-relevant because users may expose credentials, personal data, internal documents, or regulated information without realizing the privacy implications.

Vague Triggers

High

Confidence: 96%
Finding: The trigger scope is extremely broad, covering common requests like opening apps, browsing, taking screenshots, and 'help me operate the computer.' That makes accidental or silent activation likely, especially because this skill can control mouse/keyboard and interact with sensitive on-screen content.

Missing User Warnings

Medium

Confidence: 88%
Finding: The installer accepts an API key on the command line and writes it into config.json on disk without a clear warning, consent prompt, or file-permission hardening. In this screen-control skill, that credential may grant access to a vision/model provider and could be exposed through shell history, local file reads, backups, or multi-user systems, increasing the risk of account misuse and indirect compromise of a highly privileged automation capability.
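A sketch of hardened key handling, assuming the config layout shown in the External Transmission finding (`vision.apiKey`) and a hypothetical `SCREEN_VISION_API_KEY` environment variable. The key is read from the environment instead of the command line, and `config.json` is written with owner-only permissions. `os.fchmod` is POSIX-only.

```python
import json
import os


def write_config(path: str, config: dict) -> None:
    """Write config JSON readable and writable by the owner only (POSIX)."""
    # Mode 0o600 on O_CREAT covers new files; fchmod also tightens an existing
    # file and is not affected by the process umask.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(config, f, indent=2)


def api_key_from_env(var: str = "SCREEN_VISION_API_KEY") -> str:
    """Read the key from a hypothetical environment variable."""
    # Keeps the key out of the installer's argument list, which other local
    # users can see in process listings, and out of the recorded command line.
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key
```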

Missing User Warnings

Medium

Confidence: 93%
Finding: The documentation explicitly shows that screenshots are sent as base64 image data to external vision APIs, but it does not clearly warn users that screen contents may include sensitive data such as credentials, emails, documents, tokens, or personal information. In the context of a desktop-control skill, this omission is more serious because the skill is specifically designed to capture and analyze arbitrary screen contents, increasing the likelihood of unintentional disclosure to third parties.

Missing User Warnings

Medium

Confidence: 94%
Finding: These examples normalize powerful desktop actions such as file copying, terminal command execution, form submission, downloads, and screenshots without any visible consent, confirmation, privacy warning, or boundary guidance. In a screen-control skill, omission of such safeguards is dangerous because users and downstream agents may treat these examples as approved operating patterns, increasing the chance of sensitive data exposure, destructive actions, or unintended system changes.

Missing User Warnings

Medium

Confidence: 91%
Finding: The Linux guide provides concrete commands for screenshot capture and mouse/keyboard injection on a live display without any warning about privacy exposure, unintended interaction with active user sessions, or the risks of operating against the wrong DISPLAY. In the context of a computer-control skill, this omission materially increases the chance of unsafe deployment and misuse because these commands directly enable surveillance and system manipulation.

Missing User Warnings

Medium

Confidence: 93%
Finding: The macOS section documents screen capture and GUI control commands while only mentioning required permissions, not the privacy and integrity risks those permissions create. Because Accessibility and Screen Recording permissions grant powerful access to observe and control the desktop, failing to warn about sensitive-data capture and unintended input injection is dangerous in a skill explicitly designed for autonomous computer control.

Missing User Warnings

Medium

Confidence: 90%
Finding: The Windows examples show pyautogui screenshots and input automation without warning that these actions operate on the active desktop and can interfere with a live user session or expose sensitive screen contents. In a remote-control/desktop-agent skill, such omissions are especially risky because users may treat the examples as safe defaults for production or shared systems.

Missing User Warnings

Medium

Confidence: 94%
Finding: The code enables screenshot logging by default, storing potentially sensitive on-screen data under /tmp without any visible consent, warning, retention control, or access-hardening in this file. In a desktop-control skill, screenshots may contain credentials, personal data, internal documents, and other secrets, making this especially dangerous.
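A sketch of opt-in, private, time-limited screenshot logging. It assumes the same `safety_config` dictionary and `step_NNN.png` naming as the flagged loop; `make_log_dir`, `prune_logs`, and the retention value are hypothetical. `tempfile.mkdtemp` creates a randomly named directory accessible only to the current user, which also avoids a predictable `/tmp` path.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def make_log_dir(safety_config: dict) -> Optional[Path]:
    """Return a private log directory, or None unless logging is opted in."""
    # Opt-in: default False instead of the flagged default of True.
    if not safety_config.get("screenshot_log", False):
        return None
    # mkdtemp creates a uniquely named directory with mode 0o700.
    return Path(tempfile.mkdtemp(prefix="screen-vision-"))


def prune_logs(log_dir: Path, max_age_s: float) -> int:
    """Delete logged screenshots older than max_age_s seconds; return count removed."""
    cutoff = time.time() - max_age_s
    removed = 0
    for p in log_dir.glob("step_*.png"):
        if p.stat().st_mtime < cutoff:
            p.unlink()
            removed += 1
    return removed
```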

Missing User Warnings

Medium

Confidence: 96%
Finding: The runner sends screenshots to an external vision API for analysis without any user-facing disclosure or consent mechanism in this file. Because this skill is specifically designed to observe and control a live desktop, transmitting screen contents off-box can expose highly sensitive information to third-party services.

Missing User Warnings

Medium

Confidence: 95%
Finding: This file is explicitly designed to perform desktop control actions such as clicking, typing, key presses, scrolling, and dragging, yet it contains no local confirmation, policy gate, or user-visible safeguard before mutating the active GUI state. In the context of an agent skill for autonomous computer control, that makes accidental or malicious destructive actions materially more dangerous because the code directly bridges model output to real system interaction.
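A sketch of a default-deny confirmation gate between model output and input injection. The action dictionaries (`{"type": "click", ...}`) and the set of read-only action types are assumptions, not the skill's actual schema; `gate_action` and `ask_user` are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical: action types assumed not to change GUI state.
READ_ONLY = {"screenshot", "wait"}


def gate_action(action: Dict[str, Any], confirm: Callable[[Dict[str, Any]], bool]) -> bool:
    """Return True if the action may run.

    Default-deny: anything not known to be read-only, including action types
    the model invents, needs an explicit yes from confirm().
    """
    if action.get("type") in READ_ONLY:
        return True
    return bool(confirm(action))


def ask_user(action: Dict[str, Any]) -> bool:
    """Console confirmation; only an explicit 'y' approves."""
    answer = input(f"Allow action {action!r}? [y/N] ")
    return answer.strip().lower() == "y"
```

The runner would call `gate_action(action, ask_user)` before dispatching each step and skip or abort when it returns `False`. Treating unknown action types as mutating keeps a model that emits a new action type from bypassing the gate.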

Missing User Warnings

Medium

Confidence: 79%
Finding: This script captures screenshots and writes them to a predictable filesystem path without any user-facing confirmation, consent check, or access control in the script itself. In the context of a desktop-control skill, screenshots can contain highly sensitive information such as credentials, messages, tokens, and personal data, so silent disk persistence increases privacy and data-exposure risk.

Missing User Warnings

Medium

Confidence: 94%
Finding: The function sends a full screenshot plus task and history text to a remote API endpoint defined by configuration, which can expose sensitive on-screen data, credentials, personal information, or internal business data to a third party. In a screen-control skill, this risk is elevated because screenshots may contain highly sensitive transient desktop content, and the code itself provides no built-in disclosure, consent gate, redaction, or destination allowlist.
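A sketch of a destination allowlist checked before any screenshot is sent. The allowed hosts are illustrative (the SiliconFlow host comes from the skill's sample config; the loopback entries stand for a local vision server), and `check_vision_endpoint` is hypothetical. It complements, but does not replace, telling users that screen contents leave the machine.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; list only providers the user has approved.
LOCAL_HOSTS = {"localhost", "127.0.0.1"}
ALLOWED_HOSTS = LOCAL_HOSTS | {"api.siliconflow.cn"}


def check_vision_endpoint(base_url: str) -> str:
    """Return base_url if its host is allowed and it uses HTTPS (HTTP only on loopback)."""
    parsed = urlparse(base_url)
    host = parsed.hostname or ""
    # Exact host match, so look-alikes such as api.siliconflow.cn.evil.com fail.
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"vision endpoint host not allowed: {host!r}")
    if parsed.scheme == "https":
        return base_url
    if parsed.scheme == "http" and host in LOCAL_HOSTS:
        return base_url
    raise ValueError(f"refusing non-HTTPS endpoint: {base_url}")
```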

External Transmission

Medium

Category: Data Exfiltration
Content:
```json
{
  "vision": {
    "baseUrl": "https://api.siliconflow.cn/v1",
    "apiKey": "sk-your-key",
    "model": "Qwen/Qwen3-VL-32B"
  }
}
```
Confidence: 88%
Finding: https://api.siliconflow.cn/

Autonomous Decision Making

Medium

Category: Excessive Agency
Content:
    name: screen-vision
    description: >
      AI screen vision and desktop computer control skill for OpenClaw. Let your AI agent see the screen, understand UI elements, and autonomously perform mouse and keyboard operations (click, type, scroll, drag) via a screenshot-analyze-action loop. Cross-platform: Linux (headless server with XFCE4+noVNC, or desktop), macOS (cliclick), Windows (pyautogui). Supports any OpenAI-compatible vision API (SiliconFlow, OpenAI,
Confidence: 90%
Finding: autonomously perform

VirusTotal

VirusTotal findings are pending for this skill version.
