pipx-desktop-agent

Security checks across malware telemetry and agentic risk

Overview

This skill is not clearly malicious, but it gives an agent broad desktop-control and screen-reading power and has an unsafe Windows app-launch path that warrants review before installation.

Install only if you intentionally want an agent to control your desktop, type/click in apps, take screenshots, read on-screen text, list windows, and launch or focus applications. Avoid using it in sessions where sensitive data is visible, do not pass untrusted app names or arguments to the `app open` command, and prefer a sandboxed desktop or test account until the shell/AppleScript injection issues are fixed.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (15)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
if current_platform == "windows":
            if args_list:
                subprocess.Popen(
                    f'start "" "{name}" {" ".join(args_list)}',
                    shell=True,
                )
Confidence
99% confidence
Finding
subprocess.Popen( f'start "" "{name}" {" ".join(args_list)}', shell=True, )

subprocess module call

Medium
Category
Dangerous Code Execution
Content
shell=True,
                )
            else:
                subprocess.Popen(f'start "" "{name}"', shell=True)

        elif current_platform == "macos":
            cmd = ["open", "-a", name]
Confidence
99% confidence
Finding
subprocess.Popen(f'start "" "{name}"', shell=True)
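
Both Windows findings share one root cause: `name` and `args_list` are interpolated into a command string that cmd.exe then parses. The following is a minimal sketch of one possible mitigation, not the skill's actual fix. It assumes the caller can launch an executable directly instead of going through `start`; the helper names (`validate_launch_token`, `build_launch_argv`, `launch`) and the deny-list are hypothetical.

```python
import subprocess

# Hypothetical deny-list: characters cmd.exe treats specially, plus newlines.
_CMD_METACHARS = frozenset('&|<>^%"\r\n')


def validate_launch_token(token: str) -> str:
    """Return the token unchanged, or raise if it is empty or unsafe."""
    if not token or any(ch in _CMD_METACHARS for ch in token):
        raise ValueError(f"unsafe launch token: {token!r}")
    return token


def build_launch_argv(name, args_list=None):
    """Build an argument list (not a command string) for subprocess."""
    argv = [validate_launch_token(name)]
    argv.extend(validate_launch_token(arg) for arg in (args_list or []))
    return argv


def launch(name, args_list=None):
    # A list argument with the default shell=False means no shell parses
    # the values. Unlike `start`, this only resolves real executables.
    return subprocess.Popen(build_launch_argv(name, args_list))
```

The trade-off is that dropping `start` loses its ability to resolve registered app names and file associations, so this sketch fits only cases where an executable path is known.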

subprocess module call

Medium
Category
Dangerous Code Execution
Content
activate
            end tell
            '''
            result = subprocess.run(
                ["osascript", "-e", script],
                capture_output=True,
                text=True,
Confidence
94% confidence
Finding
result = subprocess.run( ["osascript", "-e", script], capture_output=True, text=True, )
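
The excerpt does not show how `script` is assembled. If the app name is interpolated into the AppleScript source, a crafted name could alter the script. The sketch below shows one way to avoid that, assuming the goal is only to activate an app: the script text stays constant and the name is passed as a separate argument to the `run` handler. The names `ACTIVATE_SCRIPT_LINES`, `build_osascript_argv`, and `focus_app` are hypothetical.

```python
import subprocess

# Constant AppleScript source; the app name never becomes part of it.
ACTIVATE_SCRIPT_LINES = (
    "on run argv",
    "tell application (item 1 of argv) to activate",
    "end run",
)


def build_osascript_argv(app_name: str) -> list[str]:
    """Build an osascript command that receives app_name as a run-handler argument."""
    # Reject empty names and names that osascript could read as an option.
    if not app_name or app_name.startswith("-"):
        raise ValueError(f"unsafe app name: {app_name!r}")
    argv = ["osascript"]
    for line in ACTIVATE_SCRIPT_LINES:
        argv += ["-e", line]
    argv.append(app_name)  # trailing arguments are passed to `on run argv`
    return argv


def focus_app(app_name: str) -> subprocess.CompletedProcess:
    # List form, no shell: the name is delivered to osascript as one argument.
    return subprocess.run(
        build_osascript_argv(app_name), capture_output=True, text=True
    )
```

With this shape, a name such as `Safari" to quit --` reaches AppleScript as plain string data rather than as script text.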

subprocess module call

Medium
Category
Dangerous Code Execution
Content
"""Install project dependencies using uv."""
    print("\n📦 Installing dependencies...")
    try:
        result = subprocess.run(
            ["uv", "sync"],
            check=True,
            capture_output=True,
Confidence
86% confidence
Finding
result = subprocess.run( ["uv", "sync"], check=True, capture_output=True, text=True, )

subprocess module call

Medium
Category
Dangerous Code Execution
Content
"""Verify the installation by running help command."""
    print("\n🔍 Verifying installation...")
    try:
        result = subprocess.run(
            ["python", "main.py", "--help"],
            capture_output=True,
            text=True,
Confidence
84% confidence
Finding
result = subprocess.run( ["python", "main.py", "--help"], capture_output=True, text=True, check=True, )

Lp3

Medium
Category
MCP Least Privilege
Confidence
89% confidence
Finding
The skill exposes shell-executable capability via `uvx desktop-agent ...` but declares no corresponding permissions or trust boundaries. That omission can cause an agent or reviewer to underestimate that the skill can drive desktop actions, launch apps, capture screenshots, and perform OCR through external commands.

Tp4

High
Category
MCP Tool Poisoning
Confidence
95% confidence
Finding
The declared purpose says desktop automation for mouse, keyboard, and screen control, but the documented behavior materially expands into application launching, window enumeration/focus, dialog prompting, and OCR-based text extraction. That mismatch is dangerous because agents may grant or invoke the skill under a narrower mental model while it can also read on-screen data and manipulate other applications.

Description-Behavior Mismatch

Medium
Confidence
87% confidence
Finding
The README frames the skill as limited to mouse, keyboard, and screen automation, but later documents additional capabilities to launch/focus applications and show dialogs including password prompts. This capability mismatch is dangerous because it can mislead users or agents about the real privilege and interaction surface of the tool, increasing the chance of unsafe installation or use in sensitive environments.

Context-Inappropriate Capability

Medium
Confidence
79% confidence
Finding
A password prompt feature in a desktop automation skill expands the tool from generic UI control into credential collection territory without clear justification or safety boundaries. In an agent context, this is especially risky because a skill that can request secrets from users may be used for phishing-style collection, credential capture, or exfiltration workflows under the guise of automation.

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The manifest and early documentation omit that the skill can open applications, focus windows, and enumerate visible windows. In a desktop-control context, these are materially sensitive powers because they enable interaction with arbitrary apps beyond simple mouse and keyboard primitives, increasing the chance of unexpected cross-application actions.

Description-Behavior Mismatch

Medium
Confidence
96% confidence
Finding
The skill's manifest does not clearly disclose OCR and screen-text extraction, even though the skill can read all text from windows and locate text coordinates. This is particularly dangerous because OCR turns a UI automation tool into a data-exfiltration-capable sensor that can capture sensitive on-screen information such as emails, tokens, chats, or passwords shown in dialogs.

Description-Behavior Mismatch

Low
Confidence
76% confidence
Finding
The `list` command enumerates all visible windows and returns their titles, which can reveal sensitive information such as document names, websites, chats, or internal system context. In an automation agent, this expands the skill from app control into desktop reconnaissance, increasing privacy and targeting risk.

Description-Behavior Mismatch

Medium
Confidence
90% confidence
Finding
The file adds OCR-driven text search and extraction commands that go beyond basic mouse/keyboard/screen control and materially expand the skill into screen-content inspection. In a desktop automation context, OCR can capture sensitive data visible on screen such as passwords, emails, tokens, chats, or internal documents, increasing exfiltration and surveillance risk beyond the declared scope.

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The read-all-text command captures a screenshot and extracts all visible text from the screen or an active window, which directly enables bulk collection of potentially sensitive information. In this skill context, that creates a concrete exfiltration primitive that is more dangerous than ordinary screenshot control because it converts visual data into structured, easy-to-export text.

External Script Fetching

Low
Category
Supply Chain
Content
"""Provide instructions to install uv."""
    print("\n📦 To install uv, run:")
    print("   Windows (PowerShell): irm https://astral.sh/uv/install.ps1 | iex")
    print("   macOS/Linux: curl -LsSf https://astral.sh/uv/install.sh | sh")
    return False
Confidence
94% confidence
Finding
curl -LsSf https://astral.sh/uv/install.sh | sh
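
Piping a downloaded script straight into a shell executes whatever the server returns at that moment. A common hedge is to download first, compare the bytes against a known digest, and run the script only on a match. The sketch below illustrates that pattern, assuming a trusted SHA-256 digest has been obtained out of band; whether the installer's publisher provides one is not stated in this report. The function names are hypothetical.

```python
import hashlib
import hmac
import urllib.request


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 of data with an expected hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids early-exit comparison; input is normalized first.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


def download_verified(url: str, expected_hex: str, timeout: float = 30.0) -> bytes:
    """Download url and return its bytes only if the digest matches."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    if not sha256_matches(data, expected_hex):
        raise ValueError("installer digest mismatch; refusing to continue")
    return data
```

The verified bytes can then be written to a file and inspected or executed deliberately, instead of being streamed into `sh` or `iex`.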

VirusTotal

66/66 vendors flagged this skill as clean.
