pipx-desktop-agent

Security checks across malware telemetry and agentic risk

Overview

This skill is not clearly malicious, but it gives an agent broad desktop-control and screen-reading power and has an unsafe Windows app-launch path that warrants review before installation.

Install only if you intentionally want an agent to control your desktop, type/click in apps, take screenshots, read on-screen text, list windows, and launch or focus applications. Avoid using it on sessions with sensitive data visible, do not pass untrusted app names or arguments to app open, and prefer a sandboxed desktop or test account until the shell/AppleScript injection issues are fixed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (15)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: if current_platform == "windows": if args_list: subprocess.Popen( f'start "" "{name}" {" ".join(args_list)}', shell=True, )
Confidence: 99%
Finding: subprocess.Popen( f'start "" "{name}" {" ".join(args_list)}', shell=True, )

subprocess module call

Medium

Category: Dangerous Code Execution
Content: shell=True, ) else: subprocess.Popen(f'start "" "{name}"', shell=True) elif current_platform == "macos": cmd = ["open", "-a", name]
Confidence: 99%
Finding: subprocess.Popen(f'start "" "{name}"', shell=True)
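One possible mitigation for the two Windows findings above is sketched below. It assumes the app name and arguments can be restricted to a conservative character allowlist. The helper name `build_windows_launch_argv` and the allowlist pattern are hypothetical and are not part of the skill. Because `start` is a cmd.exe built-in, cmd still parses the command, so the allowlist (not the list form alone) is what keeps metacharacters out.

```python
import re

# Hypothetical allowlist: letters, digits, space, dot, underscore, hyphen.
_SAFE_TOKEN = re.compile(r"[A-Za-z0-9 ._-]+")


def build_windows_launch_argv(name, args_list=None):
    """Return an argv list for launching an app without shell=True.

    Raises ValueError if the name or any argument contains a character
    outside the allowlist (for example a quote, &, | or a newline).
    """
    tokens = [name, *(args_list or [])]
    for token in tokens:
        if not _SAFE_TOKEN.fullmatch(token):
            raise ValueError(f"unsafe launch token: {token!r}")
    # The empty string is the window title that "start" expects first.
    return ["cmd", "/c", "start", "", *tokens]
```

The returned list could then be passed to `subprocess.Popen(...)` without `shell=True`. The allowlist is deliberately strict and would reject legitimate paths containing characters such as `:` or `\`, so it would need adjusting for real use.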

subprocess module call

Medium

Category: Dangerous Code Execution
Content: activate end tell ''' result = subprocess.run( ["osascript", "-e", script], capture_output=True, text=True,
Confidence: 94%
Finding: result = subprocess.run( ["osascript", "-e", script], capture_output=True, text=True, )
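If the AppleScript source is built by interpolating the app name into the script string (the excerpt above does not show this, so it is an assumption), one way to avoid injection is to pass the name as an argument to an `on run argv` handler so that it is treated as data rather than code. The sketch below only builds the argv list, and the helper name `build_activate_argv` is hypothetical. It has not been verified against the skill's code or run on macOS.

```python
def build_activate_argv(app_name):
    """Build an osascript argv that receives app_name as data, not code."""
    if not app_name or "\n" in app_name:
        raise ValueError("app name must be a non-empty single line")
    if app_name.startswith("-"):
        # Avoid the name being parsed as an osascript option.
        raise ValueError("app name must not start with '-'")
    return [
        "osascript",
        "-e", "on run argv",
        "-e", "tell application (item 1 of argv) to activate",
        "-e", "end run",
        app_name,  # delivered to the run handler as item 1 of argv
    ]
```

The result could be handed to `subprocess.run(...)` with `capture_output=True, text=True` as in the finding. The script text stays constant regardless of the name supplied.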

subprocess module call

Medium

Category: Dangerous Code Execution
Content: """Install project dependencies using uv.""" print("\n📦 Installing dependencies...") try: result = subprocess.run( ["uv", "sync"], check=True, capture_output=True,
Confidence: 86%
Finding: result = subprocess.run( ["uv", "sync"], check=True, capture_output=True, text=True, )

subprocess module call

Medium

Category: Dangerous Code Execution
Content: """Verify the installation by running help command.""" print("\n🔍 Verifying installation...") try: result = subprocess.run( ["python", "main.py", "--help"], capture_output=True, text=True,
Confidence: 84%
Finding: result = subprocess.run( ["python", "main.py", "--help"], capture_output=True, text=True, check=True, )

Lp3

Medium

Category: MCP Least Privilege
Confidence: 89%
Finding: The skill exposes shell-executable capability via `uvx desktop-agent ...` but declares no corresponding permissions or trust boundaries. That omission can cause an agent or reviewer to underestimate that the skill can drive desktop actions, launch apps, capture screenshots, and perform OCR through external commands.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The declared purpose says desktop automation for mouse, keyboard, and screen control, but the documented behavior materially expands into application launching, window enumeration/focus, dialog prompting, and OCR-based text extraction. That mismatch is dangerous because agents may grant or invoke the skill under a narrower mental model while it can also read on-screen data and manipulate other applications.

Description-Behavior Mismatch

Medium

Confidence: 87%
Finding: The README frames the skill as limited to mouse, keyboard, and screen automation, but later documents additional capabilities to launch/focus applications and show dialogs including password prompts. This capability mismatch is dangerous because it can mislead users or agents about the real privilege and interaction surface of the tool, increasing the chance of unsafe installation or use in sensitive environments.

Context-Inappropriate Capability

Medium

Confidence: 79%
Finding: A password prompt feature in a desktop automation skill expands the tool from generic UI control into credential collection territory without clear justification or safety boundaries. In an agent context, this is especially risky because a skill that can request secrets from users may be used for phishing-style collection, credential capture, or exfiltration workflows under the guise of automation.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The manifest and early documentation omit that the skill can open applications, focus windows, and enumerate visible windows. In a desktop-control context, these are materially sensitive powers because they enable interaction with arbitrary apps beyond simple mouse and keyboard primitives, increasing the chance of unexpected cross-application actions.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The skill documentation does not clearly disclose OCR and screen text extraction in the manifest even though the skill can read all text from windows and locate text coordinates. This is particularly dangerous because OCR turns a UI automation tool into a data-exfiltration-capable sensor that can capture sensitive on-screen information such as emails, tokens, chats, or passwords shown in dialogs.

Description-Behavior Mismatch

Low

Confidence: 76%
Finding: The `list` command enumerates all visible windows and returns their titles, which can reveal sensitive information such as document names, websites, chats, or internal system context. In an automation agent, this expands the skill from app control into desktop reconnaissance, increasing privacy and targeting risk.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The file adds OCR-driven text search and extraction commands that go beyond basic mouse/keyboard/screen control and materially expand the skill into screen-content inspection. In a desktop automation context, OCR can capture sensitive data visible on screen such as passwords, emails, tokens, chats, or internal documents, increasing exfiltration and surveillance risk beyond the declared scope.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The read-all-text command captures a screenshot and extracts all visible text from the screen or an active window, which directly enables bulk collection of potentially sensitive information. In this skill context, that creates a concrete exfiltration primitive that is more dangerous than ordinary screenshot control because it converts visual data into structured, easy-to-export text.

External Script Fetching

Low

Category: Supply Chain
Content: """Provide instructions to install uv.""" print("\n📦 To install uv, run:") print(" Windows (PowerShell): irm https://astral.sh/uv/install.ps1 | iex") print(" macOS/Linux: curl -LsSf https://astral.sh/uv/install.sh | sh") return False
Confidence: 94%
Finding: curl -LsSf https://astral.sh/uv/install.sh | sh
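A common alternative to piping a remote script straight into a shell is to download it first, compare its SHA-256 digest against a pinned value obtained from a trusted source, and only then run it. The sketch below shows just the comparison step. The function name `installer_matches_digest` is hypothetical, and nothing in this report says where a trusted digest for the uv installer would come from.

```python
import hashlib
import hmac


def installer_matches_digest(script_bytes, expected_sha256_hex):
    """Return True only if the downloaded bytes hash to the pinned digest."""
    actual = hashlib.sha256(script_bytes).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_sha256_hex.strip().lower())
```

A caller would read the downloaded file as bytes, call this function with the pinned digest, and refuse to execute the script when it returns False.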

VirusTotal

66/66 vendors flagged this skill as clean.
