Screen Vision

Security checks across malware telemetry and agentic risk

Overview

The skill’s desktop-control purpose is clear, but it has unsafe defaults and weak safeguards for a tool that can view and operate a live computer.

Install only in an isolated desktop or VM, avoid sensitive accounts/screens, change or disable the VNC setup before use, prefer a local or trusted vision provider, disable screenshot logging if not needed, and do not rely on the advertised confirmation safeguards for destructive or financial actions.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
Findings (20)

os.system() or os exec-family call

High
Category
Dangerous Code Execution
Content
cmd = f"bash {script} {output_path}"
if display_id:
    cmd = f"SV_DISPLAY={display_id} {cmd}"
ret = os.system(cmd)
return ret == 0 and os.path.exists(output_path)
Confidence
97% confidence
Finding
ret = os.system(cmd)
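
One possible remediation for the finding above, offered as a minimal sketch rather than the skill's actual code: pass the arguments as a list to `subprocess.run` and set `SV_DISPLAY` through the child environment, so nothing is interpreted by a shell. The function name `run_capture_script` is hypothetical; `script`, `output_path`, `display_id`, and `SV_DISPLAY` come from the quoted excerpt.

```python
import os
import subprocess


def run_capture_script(script, output_path, display_id=None):
    """Run the capture script without building a shell command string.

    Hypothetical helper name; the parameters mirror the quoted excerpt.
    """
    env = dict(os.environ)
    if display_id:
        # Passed as an environment variable, not spliced into a command line.
        env["SV_DISPLAY"] = str(display_id)
    # List form: each argument reaches bash verbatim, so characters such as
    # ';' or '$(...)' in a path are not interpreted by a shell.
    result = subprocess.run(["bash", script, output_path], env=env, check=False)
    return result.returncode == 0 and os.path.exists(output_path)
```

With this shape, a value such as `":1; touch /tmp/x"` in `display_id` is only ever an environment string and is never executed.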

os.system() or os exec-family call

High
Category
Dangerous Code Execution
Content
# Save screenshot log
if safety_config.get("screenshot_log", True):
    log_ss = os.path.join(log_dir, f"step_{action_count:03d}.png")
    os.system(f"cp {ss_path} {log_ss} 2>/dev/null")

# Wait and take new screenshot
time.sleep(wait_after)
Confidence
88% confidence
Finding
os.system(f"cp {ss_path} {log_ss} 2>/dev/null")
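
The copy flagged above does not need a shell at all. A minimal sketch, assuming the same `ss_path`, `log_dir`, and `action_count` values as the excerpt (the function name `save_screenshot_log` is hypothetical), uses `shutil.copyfile` and swallows I/O errors to mirror the original `2>/dev/null` behavior:

```python
import os
import shutil


def save_screenshot_log(ss_path, log_dir, action_count):
    """Copy a screenshot into the log directory without invoking a shell.

    Hypothetical helper; returns the log path, or None if the copy failed.
    """
    log_ss = os.path.join(log_dir, f"step_{action_count:03d}.png")
    try:
        shutil.copyfile(ss_path, log_ss)
    except OSError:
        # Mirrors the original's silent failure (stderr sent to /dev/null).
        return None
    return log_ss
```

Returning `None` on failure keeps the logging step best-effort while still letting a caller notice that nothing was written.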

Tainted flow: 'cmd' from os.environ.get (line 33, credential/environment) → os.system (code execution)

Medium
Category
Data Flow
Content
cmd = f"bash {script} {output_path}"
if display_id:
    cmd = f"SV_DISPLAY={display_id} {cmd}"
ret = os.system(cmd)
return ret == 0 and os.path.exists(output_path)
Confidence
95% confidence
Finding
ret = os.system(cmd)
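
Independently of how the command is launched, the environment-derived value can be validated before use. The sketch below assumes the value is an X11 display string such as `:1` or `localhost:10.0`; that assumption, the pattern, and the name `validate_display` are illustrative and not taken from the skill.

```python
import re

# Assumed shape of an X11 display string: optional host, ':', display number,
# optional '.screen'. This pattern is an illustration, not the skill's rule.
_DISPLAY_RE = re.compile(r"[A-Za-z0-9.-]*:\d+(?:\.\d+)?")


def validate_display(value):
    """Return the display string if it looks well formed, else raise ValueError."""
    if value is None or value == "":
        return None
    if not _DISPLAY_RE.fullmatch(value):
        raise ValueError(f"unexpected display value: {value!r}")
    return value
```

Rejecting anything outside the expected shape breaks the taint path even if the value later reaches a command string.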

Lp3

Medium
Category
MCP Least Privilege
Confidence
94% confidence
Finding
The skill advertises and instructs use of shell commands, file writes, environment variables, and network access, yet does not declare permissions. In a desktop-control skill, these capabilities materially expand what the agent can do and prevent meaningful consent, review, or sandboxing by the host platform.

Intent-Code Divergence

High
Confidence
99% confidence
Finding
The headless setup configures a VNC server with a hardcoded password of 'screen123' and starts it with '-localhost no', exposing remote desktop access on the network. In the context of a screen-control skill, this creates an easily guessable, externally reachable entry point that can give an attacker full interactive control of the desktop and anything accessible from it.
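
A safer default for the setup described above would generate a per-install password and keep the server bound to loopback. The following is a sketch under stated assumptions: the `vncserver` binary name and the `:1` display are guesses about the setup script, and `-localhost yes` is simply the counterpart of the `-localhost no` option quoted in the finding. It only builds the values and does not start anything.

```python
import secrets


def build_vnc_settings(display=":1"):
    """Return (password, argv) for a loopback-only VNC server.

    Hypothetical helper: the binary name and display are assumptions.
    """
    # Random per-install secret instead of a fixed, published password.
    # Note: classic VNC authentication only uses the first 8 characters.
    password = secrets.token_urlsafe(12)
    # Counterpart of the '-localhost no' option cited in the finding.
    argv = ["vncserver", display, "-localhost", "yes"]
    return password, argv
```

Remote viewing can then go through an SSH tunnel to the loopback port rather than a network-reachable listener.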

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The README prominently advertises screenshot capture, remote viewing, and saved screenshot logs, and also documents use of third-party vision APIs, but it does not clearly warn that screen contents may contain sensitive data and may be transmitted off-host or retained in local logs. For a desktop-control skill, this omission is security-relevant because users may expose credentials, personal data, internal documents, or regulated information without realizing the privacy implications.

Vague Triggers

High
Confidence
96% confidence
Finding
The trigger scope is extremely broad, covering common requests like opening apps, browsing, taking screenshots, and 'help me operate the computer.' That makes accidental or silent activation likely, especially because this skill can control mouse/keyboard and interact with sensitive on-screen content.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The installer accepts an API key on the command line and writes it into config.json on disk without a clear warning, consent prompt, or file-permission hardening. In this screen-control skill, that credential may grant access to a vision/model provider and could be exposed through shell history, local file reads, backups, or multi-user systems, increasing the risk of account misuse and indirect compromise of a highly privileged automation capability.
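
File-permission hardening for the stored key could look like the sketch below, which creates `config.json` with owner-only permissions. The `vision.apiKey` layout follows the configuration excerpt quoted later in this report; the function name `write_config` is hypothetical, and the mode bits are only meaningful on POSIX systems.

```python
import json
import os


def write_config(path, api_key):
    """Write the API key to a config file readable only by its owner."""
    config = {"vision": {"apiKey": api_key}}
    # Create the file with mode 0600 from the start, so the key is never
    # briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(config, f, indent=2)
    # Also tighten permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
```

Reading the key from an interactive prompt or an environment variable instead of a command-line argument would additionally keep it out of shell history.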

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The documentation explicitly shows that screenshots are sent as base64 image data to external vision APIs, but it does not clearly warn users that screen contents may include sensitive data such as credentials, emails, documents, tokens, or personal information. In the context of a desktop-control skill, this omission is more serious because the skill is specifically designed to capture and analyze arbitrary screen contents, increasing the likelihood of unintentional disclosure to third parties.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
These examples normalize powerful desktop actions such as file copying, terminal command execution, form submission, downloads, and screenshots without any visible consent, confirmation, privacy warning, or boundary guidance. In a screen-control skill, omission of such safeguards is dangerous because users and downstream agents may treat these examples as approved operating patterns, increasing the chance of sensitive data exposure, destructive actions, or unintended system changes.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The Linux guide provides concrete commands for screenshot capture and mouse/keyboard injection on a live display without any warning about privacy exposure, unintended interaction with active user sessions, or the risks of operating against the wrong DISPLAY. In the context of a computer-control skill, this omission materially increases the chance of unsafe deployment and misuse because these commands directly enable surveillance and system manipulation.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The macOS section documents screen capture and GUI control commands while only mentioning required permissions, not the privacy and integrity risks those permissions create. Because Accessibility and Screen Recording permissions grant powerful access to observe and control the desktop, failing to warn about sensitive-data capture and unintended input injection is dangerous in a skill explicitly designed for autonomous computer control.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The Windows examples show pyautogui screenshots and input automation without warning that these actions operate on the active desktop and can interfere with a live user session or expose sensitive screen contents. In a remote-control/desktop-agent skill, such omissions are especially risky because users may treat the examples as safe defaults for production or shared systems.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The code enables screenshot logging by default, storing potentially sensitive on-screen data under /tmp without any visible consent, warning, retention control, or access-hardening in this file. In a desktop-control skill, screenshots may contain credentials, personal data, internal documents, and other secrets, making this especially dangerous.
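
A more conservative default, sketched here with a hypothetical helper name, treats screenshot logging as opt-in and places logs in a directory created by `tempfile.mkdtemp`, which is accessible only to the creating user. The `screenshot_log` key comes from the quoted code; the directory prefix is an assumption.

```python
import tempfile


def prepare_log_dir(safety_config):
    """Return a private log directory, or None when logging is not enabled."""
    # Opt-in: a missing key means no screenshots are persisted.
    if not safety_config.get("screenshot_log", False):
        return None
    # mkdtemp creates the directory with owner-only access.
    return tempfile.mkdtemp(prefix="screen-vision-")
```

Retention still needs to be handled separately, for example by deleting the directory when the run ends.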

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The runner sends screenshots to an external vision API for analysis without any user-facing disclosure or consent mechanism in this file. Because this skill is specifically designed to observe and control a live desktop, transmitting screen contents off-box can expose highly sensitive information to third-party services.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
This file is explicitly designed to perform desktop control actions such as clicking, typing, key presses, scrolling, and dragging, yet it contains no local confirmation, policy gate, or user-visible safeguard before mutating the active GUI state. In the context of an agent skill for autonomous computer control, that makes accidental or malicious destructive actions materially more dangerous because the code directly bridges model output to real system interaction.
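
A local policy gate of the kind this finding says is missing could be as small as the sketch below. The action dictionary shape, the `READ_ONLY_ACTIONS` set, and the function name are all assumptions for illustration; the idea is that anything not explicitly known to be read-only needs an explicit "yes" before it runs.

```python
# Hypothetical action types assumed not to change GUI state.
READ_ONLY_ACTIONS = {"screenshot", "wait"}


def confirm_action(action, ask=input):
    """Return True only if the action is read-only or the user approves it.

    `action` is assumed to be a dict with a "type" key; `ask` is any callable
    that shows a prompt and returns the user's reply.
    """
    if action.get("type") in READ_ONLY_ACTIONS:
        return True
    # Fail closed: unknown or mutating actions need an explicit yes.
    answer = ask(f"Allow {action!r}? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Injecting `ask` keeps the gate testable and lets a host platform substitute its own approval UI.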

Missing User Warnings

Medium
Confidence
79% confidence
Finding
This script captures screenshots and writes them to a predictable filesystem path without any user-facing confirmation, consent check, or access control in the script itself. In the context of a desktop-control skill, screenshots can contain highly sensitive information such as credentials, messages, tokens, and personal data, so silent disk persistence increases privacy and data-exposure risk.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The function sends a full screenshot plus task and history text to a remote API endpoint defined by configuration, which can expose sensitive on-screen data, credentials, personal information, or internal business data to a third party. In a screen-control skill, this risk is elevated because screenshots may contain highly sensitive transient desktop content, and the code itself provides no built-in disclosure, consent gate, redaction, or destination allowlist.
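
The destination allowlist mentioned above can be sketched as a check on the configured base URL before any screenshot is sent. The default allowlist (loopback only) and the name `check_endpoint` are assumptions; a deployment would add the hosts it has actually reviewed.

```python
from urllib.parse import urlsplit

# Assumed default: only a local vision endpoint is trusted.
LOOPBACK_HOSTS = frozenset({"localhost", "127.0.0.1"})


def check_endpoint(base_url, allowed_hosts=LOOPBACK_HOSTS):
    """Return base_url if its host is allowlisted, else raise ValueError."""
    parts = urlsplit(base_url)
    host = parts.hostname
    if host is None or host not in allowed_hosts:
        raise ValueError(f"vision endpoint not allowlisted: {base_url!r}")
    # Screenshots leaving the machine should travel over TLS.
    if parts.scheme != "https" and host not in LOOPBACK_HOSTS:
        raise ValueError(f"non-loopback endpoint must use https: {base_url!r}")
    return base_url
```

For example, `check_endpoint("https://api.siliconflow.cn/v1")` raises with the default allowlist and passes once `"api.siliconflow.cn"` is added explicitly.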

External Transmission

Medium
Category
Data Exfiltration
Content
```json
{
  "vision": {
    "baseUrl": "https://api.siliconflow.cn/v1",
    "apiKey": "sk-your-key",
    "model": "Qwen/Qwen3-VL-32B"
  }
}
```
Confidence
88% confidence
Finding
https://api.siliconflow.cn/

Autonomous Decision Making

Medium
Category
Excessive Agency
Content
name: screen-vision
description: >
  AI screen vision and desktop computer control skill for OpenClaw.
  Let your AI agent see the screen, understand UI elements, and autonomously perform
  mouse and keyboard operations (click, type, scroll, drag) via a screenshot-analyze-action loop.
  Cross-platform: Linux (headless server with XFCE4+noVNC, or desktop), macOS (cliclick),
  Windows (pyautogui). Supports any OpenAI-compatible vision API (SiliconFlow, OpenAI,
Confidence
90% confidence
Finding
autonomously perform

VirusTotal

VirusTotal findings are pending for this skill version.
