Desktop Vision Automation

Security checks across malware telemetry and agentic risk

Overview

This is a real desktop automation skill, but it grants broad screen capture, global input recording, replay, and window-control powers without enough scoping or runtime safeguards.

Install only if you intentionally want a high-privilege desktop automation tool. Use it in a controlled environment, avoid running it while passwords, private messages, financial, medical, or work-confidential data are visible, review where it writes captures and scripts, and require explicit confirmation before recording, replaying actions, typing, hotkeys, or closing windows.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (10)

Dynamic attribute access via getattr()

Low

Category: Dangerous Code Execution
Content:
    task_params = task.get("params", {})
    if hasattr(self, func_name):
        func = getattr(self, func_name)
        result = func(task_params)
        results.append({
            "task": func_name,
Confidence: 88%
Finding: func = getattr(self, func_name)
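One way to remove this finding is to dispatch through an explicit allowlist rather than through any attribute the object happens to have. The sketch below is hypothetical: the `TaskRunner` class, the `"action"` key, and the `screenshot`/`click` handlers are illustrative stand-ins, not the skill's actual code. With `hasattr`/`getattr`, a task naming `run` or `__init__` would resolve to a callable; with a dictionary lookup, only registered names can execute.

```python
from typing import Any, Callable, Dict, List


class TaskRunner:
    """Hypothetical runner that dispatches only to explicitly registered handlers."""

    def __init__(self) -> None:
        # Explicit allowlist: task name -> bound method. Anything not listed is rejected.
        self._handlers: Dict[str, Callable[[Dict[str, Any]], Any]] = {
            "screenshot": self.screenshot,
            "click": self.click,
        }

    def screenshot(self, params: Dict[str, Any]) -> str:
        return f"screenshot region={params.get('region')}"

    def click(self, params: Dict[str, Any]) -> str:
        return f"click at ({params.get('x')}, {params.get('y')})"

    def run(self, tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        results: List[Dict[str, Any]] = []
        for task in tasks:
            func_name = task.get("action", "")
            task_params = task.get("params", {})
            # Dictionary lookup instead of getattr(): internal methods are unreachable.
            handler = self._handlers.get(func_name)
            if handler is None:
                results.append({"task": func_name, "error": "action not allowed"})
                continue
            results.append({"task": func_name, "result": handler(task_params)})
        return results
```

Unknown or internal names such as `run` come back as an error entry instead of being invoked.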

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The skill implements global keyboard and mouse event capture plus automated replay, which is surveillance and macro-execution capability with strong dual-use abuse potential. In a generic desktop automation skill with no explicit consent, scope limits, or sensitive-field protections, this can capture secrets and reproduce privileged actions.
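Replay risk can be narrowed by validating a recording before any of it runs and by giving the user a kill switch. This minimal sketch assumes recorded actions are dictionaries with a `"type"` field and that `perform` is whatever function actually injects input; both, along with `bounded_replay`, its default limit, and the allowed action types, are hypothetical choices rather than the skill's API.

```python
import threading
from typing import Any, Callable, Dict, FrozenSet, List

Action = Dict[str, Any]


def bounded_replay(
    actions: List[Action],
    perform: Callable[[Action], None],
    stop: threading.Event,
    max_actions: int = 50,
    allowed_types: FrozenSet[str] = frozenset({"click", "scroll"}),
) -> int:
    """Replay recorded actions under hard limits; return how many were performed."""
    # Validate the whole recording first so nothing runs if any step is disallowed.
    if len(actions) > max_actions:
        raise ValueError(f"refusing to replay {len(actions)} actions (limit {max_actions})")
    for action in actions:
        if action.get("type") not in allowed_types:
            raise PermissionError(f"action type {action.get('type')!r} is not allowed in replay")

    done = 0
    for action in actions:
        # Kill switch: another thread (for example a hotkey listener) can set this at any time.
        if stop.is_set():
            break
        perform(action)
        done += 1
    return done
```

The default allowlist deliberately excludes keyboard events, so a replay cannot retype text that was captured during recording.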

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The skill documents extensive desktop-control capabilities including mouse, keyboard, window management, recording, playback, OCR, and image matching, but provides no bounded purpose, authorization model, or use restrictions. In an agent setting, this creates a powerful generic operator that could be used to manipulate applications, capture sensitive information, or perform unintended actions across the user desktop.
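A bounded purpose and authorization model can be as simple as a per-task grant that every primitive checks before acting. In this sketch the `Capability` members mirror the features named in the finding (mouse, keyboard, window management, recording, playback, OCR, screenshots), while `Policy` and `require` are hypothetical names, not part of the skill.

```python
from enum import Enum, auto
from typing import FrozenSet


class Capability(Enum):
    SCREENSHOT = auto()
    OCR = auto()
    MOUSE = auto()
    KEYBOARD = auto()
    WINDOW_CONTROL = auto()
    RECORD = auto()
    REPLAY = auto()


class Policy:
    """A stated purpose plus the only capabilities that purpose may use."""

    def __init__(self, purpose: str, granted: FrozenSet[Capability]) -> None:
        self.purpose = purpose
        self.granted = granted

    def require(self, cap: Capability) -> None:
        # Every primitive calls this first; anything not granted fails closed.
        if cap not in self.granted:
            raise PermissionError(f"{cap.name} is not granted for purpose {self.purpose!r}")
```

A read-only task such as reading numbers off a dashboard would be granted only `SCREENSHOT` and `OCR`, so a stray keyboard or window-control call raises instead of acting.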

Vague Triggers

Medium

Confidence: 95%
Finding: The trigger list includes broad, everyday terms such as '点击' ("click"), '移动' ("move"), and especially '自动化' ("automation"), which are likely to appear in normal conversation and can cause unintended invocation of a highly privileged desktop-control skill. In this context, accidental activation is particularly dangerous because the skill can capture the screen, record audio/video, manipulate windows, and simulate keyboard/mouse input on a real Windows desktop.
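A safer trigger design invokes the skill only on an explicit command and treats the broad keywords as a reason to ask the user, not to act. The `/desktop` prefix, the three-way `TriggerDecision`, and `classify_trigger` are hypothetical; only the three keywords come from the finding.

```python
from enum import Enum


class TriggerDecision(Enum):
    INVOKE = "invoke"  # explicit command: run the skill
    ASK = "ask"        # broad keyword only: ask before doing anything
    IGNORE = "ignore"  # unrelated message


# Keywords named in the finding; too generic to activate the skill on their own.
BROAD_KEYWORDS = ("点击", "移动", "自动化")


def classify_trigger(message: str, prefix: str = "/desktop") -> TriggerDecision:
    text = message.strip()
    if text.lower().startswith(prefix):
        return TriggerDecision.INVOKE
    if any(kw in text for kw in BROAD_KEYWORDS):
        return TriggerDecision.ASK
    return TriggerDecision.IGNORE
```

A message that merely discusses automation (自动化) then yields a clarifying question instead of live control of the desktop.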

Missing User Warnings

Medium

Confidence: 92%
Finding: The skill description emphasizes powerful screen capture, recording, OCR, and automation capabilities but does not present prominent up-front warnings about privacy, consent, or the risks of controlling a live desktop. Because this skill can collect sensitive on-screen information and perform impactful actions, weak warning posture increases the chance of unsafe use, covert capture, or user misunderstanding about the consequences.

Missing User Warnings

Medium

Confidence: 93%
Finding: The screenshot features save full desktop or region captures directly to disk, which can include passwords, personal data, documents, and other sensitive on-screen content. Without disclosure, consent prompts, or output-path controls, the skill facilitates quiet collection and persistence of sensitive visual data.
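Output-path controls can start with refusing any capture filename that resolves outside a designated directory. This sketch assumes captures are written under a single base directory and saved as PNG or JPEG; `safe_capture_path` and `ALLOWED_SUFFIXES` are hypothetical. `Path.is_relative_to` requires Python 3.9 or later, and because `resolve()` follows symlinks, an existing link that points out of the directory is rejected as well.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg"}


def safe_capture_path(base_dir: str, filename: str) -> Path:
    """Return a resolved path inside base_dir, or raise ValueError."""
    base = Path(base_dir).resolve()
    # Joining an absolute filename discards base, and ".." segments climb out of it;
    # resolving first and then checking containment catches both cases.
    target = (base / filename).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"{filename!r} resolves outside the capture directory {base}")
    if target.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected capture file type: {target.suffix!r}")
    return target
```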

Missing User Warnings

High

Confidence: 99%
Finding: The recording logic captures global keyboard presses and mouse activity and writes the collected actions to disk, which can include credentials, private messages, and behavioral data. This creates clear keylogger-like risk, especially because capture is not constrained to a specific application or secret-safe context.
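If recording must stay, one mitigation is to avoid persisting what was typed. This sketch assumes a hypothetical event format of dictionaries with `"type"` and `"key"` fields; `redact_events` keeps mouse events and named keys such as `enter` but replaces single-character key presses before anything is written to disk.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def redact_events(events: List[Event]) -> List[Event]:
    """Return a copy of events with printable key presses replaced by a placeholder."""
    redacted: List[Event] = []
    for ev in events:
        # Single characters are what spell out passwords and messages; named keys
        # (enter, tab, ctrl) are kept so the recording still shows the workflow shape.
        if ev.get("type") == "key" and len(str(ev.get("key", ""))) == 1:
            redacted.append({**ev, "key": "<redacted>"})
        else:
            redacted.append(ev)
    return redacted
```

The cost is that recorded text entry can no longer be replayed verbatim, which is the intended effect for credentials and private messages.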

Missing User Warnings

Medium

Confidence: 83%
Finding: The window_state API exposes a close action that can terminate user applications without confirmation, risking data loss or workflow disruption. In an automation skill that can also activate and enumerate windows, this can be used to interfere with user activity or force application shutdowns.
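A close action can be gated behind an explicit confirmation callback so the agent cannot terminate an application on its own. `close_window_with_confirmation` and both callbacks are hypothetical: `close` stands in for whatever the window_state close action does, and `confirm` for a user-facing prompt.

```python
from typing import Callable


def close_window_with_confirmation(
    title: str,
    close: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Close the window only if the user explicitly agrees; return whether it was closed."""
    prompt = f"Close window {title!r}? Unsaved work may be lost."
    if not confirm(prompt):
        return False
    close(title)
    return True
```

In a terminal, `confirm` could be `lambda p: input(p + " [y/N] ").strip().lower() == "y"`, so anything other than an explicit yes leaves the window open.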

Missing User Warnings

Medium

Confidence: 95%
Finding: The documentation exposes full-screen, region, window, timed screenshot, and screen recording features without any privacy notice, consent requirement, or safeguards around sensitive on-screen data. These capabilities can capture passwords, personal data, confidential documents, tokens, and messages, making the skill dangerous if invoked without explicit user awareness and approval.
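Consent to capture can be made explicit and short-lived, so a single approval does not silently cover every later screenshot or recording. `CaptureConsent`, its 60-second default, and the injectable clock are hypothetical design choices; the clock parameter exists only to make expiry testable.

```python
import time
from typing import Callable, Optional


class CaptureConsent:
    """Consent that must be granted explicitly and lapses after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock
        self._granted_at: Optional[float] = None

    def grant(self) -> None:
        # Call only in response to an explicit user action, never from the agent itself.
        self._granted_at = self._clock()

    def revoke(self) -> None:
        self._granted_at = None

    def is_active(self) -> bool:
        if self._granted_at is None:
            return False
        return self._clock() - self._granted_at < self.ttl

    def check(self) -> None:
        # Capture functions call this first and refuse to run without fresh consent.
        if not self.is_active():
            raise PermissionError("screen capture requires fresh user consent")
```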

Missing User Warnings

Medium

Confidence: 94%
Finding: The documented automation features include clicking, typing, hotkeys, dragging, scrolling, window activation/state changes, and replay of recorded actions, yet there is no warning that they can alter system state or trigger destructive operations. In practice, these primitives can submit forms, close windows, change settings, send messages, approve prompts, or execute harmful workflows at machine speed.
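To keep these primitives from running at machine speed without oversight, an agent could present the whole action plan for approval and then pace execution. `PlannedAction`, `describe_plan`, `execute_plan`, and the 0.5-second interval are hypothetical; `perform` stands in for the real input-injection call and `approve` for a user prompt.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PlannedAction:
    kind: str    # e.g. "click", "type", "hotkey" (hypothetical schema)
    detail: str


def describe_plan(plan: List[PlannedAction]) -> str:
    return "\n".join(f"{i}. {a.kind}: {a.detail}" for i, a in enumerate(plan, 1))


def execute_plan(
    plan: List[PlannedAction],
    approve: Callable[[str], bool],
    perform: Callable[[PlannedAction], None],
    min_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Show the full plan, run it only if approved, and pace the steps; return steps run."""
    if not plan or not approve(describe_plan(plan)):
        return 0
    for i, action in enumerate(plan):
        if i:
            sleep(min_interval)  # slow enough for a watching user to notice and intervene
        perform(action)
    return len(plan)
```

Approving the plan as a whole shows the user every click and keystroke up front, rather than one opaque action at a time.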

VirusTotal

3/46 vendors flagged this skill as malicious, and 43/46 flagged it as clean.
