Desktop Agent

Security checks across malware telemetry and agentic risk

Overview

This desktop automation skill is transparent about its purpose, but it gives an agent broad, unsandboxed control over the user’s screen, mouse, and keyboard, along with screenshot capture and saved-task replay, without adequate safeguards.

Install only if you intentionally want an agent to see and control your live desktop. Run it only under supervision, close sensitive apps first, avoid teaching or replaying workflows that involve passwords, tokens, payments, private messages, or admin settings, and review saved task JSON before running it.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (11)

Missing User Warnings

Medium

Confidence: 91%
Finding: The screen capture and OCR routines can read arbitrary visible desktop content, including passwords, messages, tokens, documents, and other sensitive data, without any consent prompt, scoping, or disclosure. In an agent skill that automates the desktop, this materially increases surveillance and data-exfiltration risk because the code can observe whatever is on screen, not just app-owned content.

Missing User Warnings

Medium

Confidence: 93%
Finding: Saving screenshots to an arbitrary file path writes potentially sensitive visual data to disk without warning, increasing the chance of persistent data exposure beyond the live desktop session. If the saved file lands in an insecure or synced location, sensitive screen contents may be retained, shared, or later accessed by other users or processes.
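A minimal sketch of a safer save path, assuming the skill already has the screenshot as raw bytes and a target path. The function name `save_screenshot_private` is hypothetical. It creates the file with owner-only permissions on POSIX systems, refuses to overwrite or follow anything already at that path, and emits the warning the finding says is missing.

```python
import os
import sys


def save_screenshot_private(data: bytes, path: str) -> str:
    """Hypothetical helper: write screenshot bytes to a new owner-only file."""
    # O_CREAT | O_EXCL fails if anything (including a symlink) already exists at `path`.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    # O_BINARY exists only on Windows; it prevents newline translation there.
    flags |= getattr(os, "O_BINARY", 0)
    # 0o600 = read/write for the owner only (the umask can only remove bits).
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    # Tell the user that sensitive on-screen content now persists on disk.
    print(
        f"warning: screenshot saved to {path}; it may contain passwords, "
        "messages, or other sensitive on-screen data",
        file=sys.stderr,
    )
    return path
```

Permission bits do not help if the chosen directory is itself synced or shared, so the caller should still avoid such locations.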

Missing User Warnings

Medium

Confidence: 95%
Finding: The mouse and keyboard automation methods can perform arbitrary UI actions such as clicking dialogs, typing commands, invoking shortcuts, and interacting with privileged applications without any authorization check or confirmation step. In a desktop agent context, this can directly trigger destructive actions, credential submission, application reconfiguration, or command execution through the GUI.

Missing User Warnings

High

Confidence: 98%
Finding: The task replay feature loads action sequences from disk and executes them as clicks, typing, hotkeys, waits, screenshots, and text-driven interactions without validating integrity, authenticity, safety, or requiring user approval. This is especially dangerous because persisted task files become a local instruction channel: anyone who can modify those JSON files can cause the agent to carry out arbitrary GUI-driven actions and capture data on the user's system.
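One way to address the integrity gap is to authenticate task files before replay. The sketch below assumes a hypothetical task-file layout, `{"steps": [...], "signature": "<hex>"}`, and a secret key the skill holds; none of these names come from the skill itself. It signs a canonical JSON encoding of the steps with HMAC-SHA256 and refuses to return steps whose signature does not match.

```python
import hashlib
import hmac
import json


def _canonical(steps: list) -> bytes:
    # Stable serialization so identical steps always produce the same MAC.
    return json.dumps(steps, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_task(steps: list, key: bytes) -> dict:
    """Hypothetical: build a task document whose steps are covered by an HMAC."""
    mac = hmac.new(key, _canonical(steps), hashlib.sha256).hexdigest()
    return {"steps": steps, "signature": mac}


def load_verified_steps(raw: str, key: bytes) -> list:
    """Hypothetical: parse a task file and return its steps only if the MAC matches."""
    doc = json.loads(raw)
    expected = hmac.new(key, _canonical(doc["steps"]), hashlib.sha256).hexdigest()
    supplied = str(doc.get("signature", ""))
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8")):
        raise ValueError("task file signature mismatch; refusing to replay")
    return doc["steps"]
```

This only helps if the key is stored somewhere an attacker who can edit the task JSON cannot read; a key saved next to the task files would provide no protection.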

Missing User Warnings

Medium

Confidence: 96%
Finding: The CLI directly triggers real desktop automation actions such as clicks, keystrokes, and key presses without any confirmation prompt, safety banner, dry-run mode, or indication of which window will receive input. In a desktop agent context, these actions can affect arbitrary applications and sensitive sessions, making accidental or misuse-driven interaction with the user's system plausible.

Missing User Warnings

Medium

Confidence: 95%
Finding: Executing a learned task by name can replay stored desktop automation steps without warning the user that the task may perform clicks, typing, or other real GUI interactions. Because only the task name is visible at run time, not the steps it contains, users may invoke automation with side effects they do not fully understand, including actions in privileged or sensitive windows.
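A minimal sketch of a pre-replay confirmation step, assuming each stored step is a dict with an `action` key (a hypothetical format). `summarize_step` and `confirm_replay` are illustrative names, not part of the skill. The summary lists every action before anything runs, reports typed text only by length, and treats anything other than an explicit "yes" as a cancel.

```python
from typing import Callable


def summarize_step(step: dict) -> str:
    """Describe one stored step without revealing typed text verbatim."""
    action = step.get("action", "?")
    if action == "type":
        return f"type {len(str(step.get('text', '')))} characters"
    details = ", ".join(f"{k}={v}" for k, v in step.items() if k != "action")
    return f"{action} ({details})" if details else action


def confirm_replay(name: str, steps: list, ask: Callable[[str], str] = input) -> bool:
    """Show what a task will do and require an explicit 'yes' before replaying it."""
    print(f"Task '{name}' will perform {len(steps)} real desktop action(s):")
    for i, step in enumerate(steps, 1):
        print(f"  {i}. {summarize_step(step)}")
    # Anything other than an explicit "yes" cancels the replay.
    answer = ask("Type 'yes' to run these actions on your live desktop: ")
    return answer.strip().lower() == "yes"
```

The `ask` parameter defaults to `input` but can be swapped for a GUI prompt or a test stub.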

Missing User Warnings

Low

Confidence: 88%
Finding: The teaching flow allows recording and saving automation steps without clearly disclosing that the resulting workflow will persist and may later replay real desktop actions. This is less immediately dangerous than direct execution, but it can create stored automations whose future behavior or risk is not obvious to the user.

Vague Triggers

Medium

Confidence: 94%
Finding: The skill’s activation guidance is very broad and would allow it to be invoked for generic desktop control, learning, and automation requests without strong guardrails or confirmation requirements. Because the skill grants screenshot capture plus mouse and keyboard control over arbitrary applications, overly permissive triggering materially increases the chance of unintended use against sensitive apps, credentials, or user data.

Missing User Warnings

Medium

Confidence: 96%
Finding: The screenshot capability can capture sensitive on-screen information including credentials, personal messages, documents, tokens, and other private data, yet the skill does not clearly warn about these privacy and data-handling risks. In a desktop-control skill, this omission is especially dangerous because screenshots are a direct data collection mechanism and may be stored to disk, reused, or exposed beyond the user’s expectations.

Missing User Warnings

Medium

Confidence: 97%
Finding: Teaching mode and task execution enable recording and replaying clicks, typing, hotkeys, and waits across arbitrary applications, which can modify files, send messages, submit forms, or trigger destructive actions. The lack of explicit warnings and constraints is dangerous because reusable task definitions can automate harmful sequences at scale or replay actions in the wrong application or state, causing data loss, unauthorized transactions, or leakage.
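A minimal sketch of constraining replayable task definitions, assuming hypothetical action names (`click`, `type`, `hotkey`, `wait`, `screenshot`) and step fields (`keys`, `seconds`); the skill's real schema may differ. It rejects unknown actions, blocks a small illustrative set of shortcuts regardless of key order, and caps wait durations, returning a list of problems instead of running anything.

```python
ALLOWED_ACTIONS = {"click", "type", "hotkey", "wait", "screenshot"}
# Hypothetical examples of shortcuts that should never run unattended.
BLOCKED_HOTKEYS = {
    frozenset({"alt", "f4"}),
    frozenset({"ctrl", "alt", "delete"}),
    frozenset({"win", "r"}),
}


def validate_steps(steps: list, max_wait: float = 30.0) -> list:
    """Return human-readable problems; an empty list means no rule was violated."""
    problems = []
    for i, step in enumerate(steps, 1):
        action = step.get("action")
        if action not in ALLOWED_ACTIONS:
            problems.append(f"step {i}: unknown action {action!r}")
        elif action == "hotkey":
            # A frozenset makes the check independent of key order.
            keys = frozenset(str(k).lower() for k in step.get("keys", []))
            if keys in BLOCKED_HOTKEYS:
                problems.append(f"step {i}: blocked hotkey {'+'.join(sorted(keys))}")
        elif action == "wait":
            seconds = float(step.get("seconds", 0))
            if not 0 <= seconds <= max_wait:
                problems.append(f"step {i}: wait of {seconds}s outside 0..{max_wait}s")
    return problems
```

An allowlist like this limits what a tampered or mistaken task can do, but it cannot tell whether a permitted click lands in the intended window, so it complements rather than replaces user confirmation.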

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill records arbitrary user-typed input and immediately prints it back to the console in cleartext. In a task-teaching workflow, users may demonstrate login flows, API keys, personal data, or other secrets, so echoing captured input can expose sensitive information to logs, terminal history, screen recordings, or bystanders.
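A minimal sketch of recording a typed step without echoing it, assuming the teaching flow collects text from a terminal. It uses the standard library's `getpass.getpass`, which reads input without displaying it; `record_typed_step` and the `{"action": "type", "text": ...}` step shape are hypothetical. The confirmation message reports only the length.

```python
import getpass
from typing import Callable


def record_typed_step(read: Callable[[str], str] = getpass.getpass) -> dict:
    """Hypothetical: capture text for a 'type' step without printing it back."""
    # getpass reads from the terminal without echoing the characters.
    text = read("Text to type (hidden): ")
    # Confirm the capture by length only, so the secret never reaches the console.
    print(f"Recorded a 'type' step with {len(text)} characters (content not shown).")
    return {"action": "type", "text": text}
```

Hiding the echo does not stop the secret from being written into the saved task JSON, so the file-level concerns in the other findings still apply.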

VirusTotal

All 67 of 67 vendors reported this skill as clean.
