Security audit

PyAutoGUI Controller

Security checks across malware telemetry and agentic risk

Overview

This is a real desktop/browser automation skill, but it can launch local programs, control live windows, reuse browser sessions, and persist screenshots/logs with limited user-facing safeguards.

Install only if you are comfortable giving the skill control of your Windows desktop and browser. Use it in a separate browser profile or test environment, avoid running it while sensitive apps or accounts are open, do not set custom bridge/CDP environment variables unless you trust the endpoints they point to, and periodically clear the runtime screenshots, failure logs, and browser profile/state directories.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (34)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
        "text": text,
    }
    try:
        out = subprocess.check_output(command + [json.dumps(payload, ensure_ascii=False)], stderr=subprocess.STDOUT, timeout=20)
        data = json.loads(out.decode("utf-8", errors="replace"))
        return DOMLocateResult(bool(data.get("success")), detail=str(data.get("detail", "")), selector=data.get("selector"), extra=data)
    except subprocess.CalledProcessError as exc:
Confidence: 86%
Finding: out = subprocess.check_output(command + [json.dumps(payload, ensure_ascii=False)], stderr=subprocess.STDOUT, timeout=20)

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The resolver can launch executables, batch files, command scripts, desktop shortcuts, and any explicit path that exists on disk, with very weak constraints. Because resolution includes desktop items and filesystem paths, a user-controlled app name can trigger execution of unintended local code, including malicious shortcuts or scripts placed on the desktop. This materially expands the skill into arbitrary code execution.
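One way to narrow this, sketched under the assumption that the skill can work from a fixed list of approved programs, is to resolve names only through an explicit allowlist and never fall back to desktop items or raw paths. `ALLOWED_APPS`, `BLOCKED_SUFFIXES`, and `resolve_app` are hypothetical names, not part of the audited skill.

```python
from pathlib import PureWindowsPath
from typing import Mapping, Optional

# Hypothetical allowlist: friendly app name -> absolute executable path.
# Nothing outside this mapping can be launched.
ALLOWED_APPS: Mapping[str, str] = {
    "notepad": r"C:\Windows\System32\notepad.exe",
    "calculator": r"C:\Windows\System32\calc.exe",
}

# Script and shortcut types are refused even if someone adds them to the allowlist.
BLOCKED_SUFFIXES = {".bat", ".cmd", ".lnk", ".ps1", ".vbs", ".js"}


def resolve_app(name: str, allowed: Mapping[str, str] = ALLOWED_APPS) -> Optional[str]:
    """Return the allowlisted executable for `name`, or None if it is not permitted.

    Free-form paths and desktop items are never searched, so a raw path or
    shortcut name simply fails to resolve.
    """
    target = allowed.get(name.strip().lower())
    if target is None:
        return None
    # PureWindowsPath parses backslash paths correctly on any OS.
    if PureWindowsPath(target).suffix.lower() in BLOCKED_SUFFIXES:
        return None
    return target
```

A `None` result should stop the launch outright rather than trigger a broader search.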

Context-Inappropriate Capability

Medium

Confidence: 76%
Finding: This file is presented as a mouse-control module, but it also includes keyboard-driven window management such as minimize, maximize, and close. Broadening a UI automation skill beyond its stated purpose increases the chance of misuse, especially because these functions can redirect or disrupt user workflows and make subsequent automated clicks/keystrokes act on unintended windows.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The function advertises text/OCR-based button discovery but actually returns a guessed hard-coded coordinate. In an automation context, this can cause unintended clicks on the wrong UI element, including destructive actions such as confirmation, consent, or purchase buttons, because callers may trust the API contract and skip additional validation.
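A fail-closed alternative is sketched below: the locator returns `None` when it finds zero or several matches, and the click helper refuses to click on `None`. It assumes the caller already has OCR word boxes (for example, from an OCR engine's word-level output); `Match`, `find_button`, and `click_button` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Match:
    x: int  # centre of the word box, in screen pixels
    y: int
    text: str


def find_button(label: str, ocr_words: Sequence[Match]) -> Optional[Match]:
    """Return the single OCR word equal to `label`, or None.

    No guessed fallback coordinate: missing and ambiguous results both yield None.
    """
    wanted = label.strip().lower()
    hits = [w for w in ocr_words if w.text.strip().lower() == wanted]
    return hits[0] if len(hits) == 1 else None


def click_button(label: str, ocr_words: Sequence[Match],
                 click: Callable[[int, int], None]) -> bool:
    """Click only when the label was actually found; report whether a click happened."""
    match = find_button(label, ocr_words)
    if match is None:
        return False  # do not click anything
    click(match.x, match.y)
    return True
```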

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The captcha detection method claims to detect captcha challenges but always returns False. In an agent automation skill, this can lead downstream logic to continue interacting with sites under the false assumption that no human-verification gate exists, potentially bypassing safeguards or causing repeated unintended actions.
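A stub that always answers "no captcha" can be replaced by a tri-state result in which "unknown" stops automation just as "present" does. The keyword markers below are a crude illustrative heuristic, not a reliable captcha detector, and `CaptchaState`, `detect_captcha`, and `may_continue` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class CaptchaState(Enum):
    ABSENT = "absent"
    PRESENT = "present"
    UNKNOWN = "unknown"


# Illustrative markers only; real detection needs much stronger signals.
MARKERS = ("captcha", "i'm not a robot", "verify you are human")


def detect_captcha(page_text: Optional[str]) -> CaptchaState:
    """Classify page text; return UNKNOWN when there is no evidence to inspect."""
    if page_text is None:
        return CaptchaState.UNKNOWN
    lowered = page_text.lower()
    if any(marker in lowered for marker in MARKERS):
        return CaptchaState.PRESENT
    return CaptchaState.ABSENT


def may_continue(state: CaptchaState) -> bool:
    # Only an affirmative ABSENT lets automation proceed.
    return state is CaptchaState.ABSENT
```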

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The checkbox finder returns a fixed coordinate instead of detecting a checkbox. In GUI automation, this can trigger clicks on arbitrary screen content near that location, including consent boxes, security prompts, or unrelated controls, creating integrity and safety risks.

Intent-Code Divergence

Low

Confidence: 88%
Finding: The input-field locator mostly guesses from preset regions and a single-pixel color check, which is too weak to safely identify an editable field. This can cause text entry into the wrong control or non-input area, potentially leaking sensitive data or corrupting workflow state.

Missing User Warnings

Medium

Confidence: 91%
Finding: The README explicitly demonstrates browser navigation, clicking, and text entry, which are high-impact GUI automation actions, but it does not warn users that the skill can control their desktop/browser and potentially modify local or web data. In the context of a PyAutoGUI/controller skill, this omission increases the chance of unsafe use, accidental data modification, or over-trust by operators.

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill explicitly enables local desktop and browser control, screenshots, clicking, and text entry on the user's machine, but it does not warn about side effects on active applications, sensitive windows, or unintended system interactions. In an agent setting, this can cause actions in the wrong window, exposure of sensitive on-screen data, or accidental modification/submission of content if invoked without strong user confirmation and safety boundaries.

Missing User Warnings

Medium

Confidence: 93%
Finding: The code immediately executes the resolved target without any user-visible confirmation, safety prompt, or secondary validation. In a skill that can start local programs and scripts, lack of confirmation increases the chance of accidental or socially engineered execution of dangerous programs.
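A minimal confirmation gate, assuming an interactive console is available, shows the fully resolved target and launches only on an explicit "yes". `launch_with_confirmation` is a hypothetical helper; in a real skill, `launch` might wrap `subprocess.Popen`, and `ask` could be swapped for a GUI dialog.

```python
from typing import Callable


def launch_with_confirmation(
    target: str,
    launch: Callable[[str], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Display the resolved target and launch it only after an explicit 'yes'."""
    answer = ask(f"About to launch: {target!r}. Type 'yes' to continue: ")
    if answer.strip().lower() != "yes":
        return False  # anything other than 'yes' is treated as a refusal
    launch(target)
    return True
```

Showing the resolved path, rather than the name the user typed, makes a substituted shortcut or script visible before anything runs.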

Missing User Warnings

Medium

Confidence: 94%
Finding: The code reads the current clipboard, replaces it with arbitrary text, pastes it, and then restores the prior value without any user notice or consent. Even if intended to support non-ASCII input, this can expose sensitive clipboard contents in process memory and create privacy and integrity risks, because clipboard data is accessed and modified as a side effect of typing.

Missing User Warnings

Medium

Confidence: 83%
Finding: This code performs automated mouse movement and clicking with no built-in user consent, visibility, or policy checks before triggering real UI actions. In an agent skill context, synthetic clicks can activate destructive or sensitive operations in other applications, making unintended or abusive interaction possible if upstream inputs are wrong or compromised.

Missing User Warnings

Medium

Confidence: 88%
Finding: The code captures a screenshot and writes it to disk without any visible consent, notice, minimization, or retention controls in this component. Screenshots can contain sensitive information such as credentials, personal data, messages, or other application content, and persisting them to files increases exposure if the files are later accessed, logged, or left behind.
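A simple retention control deletes screenshots older than a fixed age. The directory, the `*.png` pattern, and the age limit are assumptions, since the audit does not say where the skill stores its images; `prune_old_files` is a hypothetical helper.

```python
import time
from pathlib import Path


def prune_old_files(directory: Path, max_age_days: float, pattern: str = "*.png") -> list[Path]:
    """Delete files matching `pattern` whose modification time is older than `max_age_days`."""
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    removed: list[Path] = []
    for path in directory.glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Running this at startup, or after each session, bounds how long captured screen content stays on disk.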

Missing User Warnings

Medium

Confidence: 90%
Finding: This function performs OCR on a screenshot region and returns extracted on-screen text without any visible disclosure, scope restriction beyond the provided region, or safeguards around sensitive text handling. OCR can harvest secrets or private content from the screen, and converting images into text can make downstream exfiltration, logging, or misuse significantly easier.
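One mitigation is to redact OCR output before it is returned or logged. The regular expressions below are illustrative only and would miss many secret formats; `REDACTION_PATTERNS` and `redact` are hypothetical names.

```python
import re

# Illustrative patterns only; production use needs a broader, tested ruleset.
REDACTION_PATTERNS = [
    re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),         # email addresses
    re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),                                   # 13-16 digit card-like runs
    re.compile(r"(?i)\b(?:password|passwd|token|api[_-]?key)\s*[:=]\s*\S+"),  # key=value secrets
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every substring matching a pattern before OCR text leaves the function."""
    for pattern in REDACTION_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```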

Missing User Warnings

Medium

Confidence: 85%
Finding: The code can programmatically bring an arbitrary matching window to the foreground via `candidate.activate()` without any user confirmation, notice, or restriction on target applications. This can unexpectedly steal focus and cause subsequent keystrokes or actions to be sent to the wrong application, which is especially risky in an agent context where window selection is driven by keyword matching and may target the wrong window.
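Keyword matching can be made stricter by refusing to activate anything unless exactly one window title matches. In a real skill, the titles might come from `pygetwindow.getAllTitles()`, with the chosen title then passed to `getWindowsWithTitle(...)` before calling `activate()`; `pick_window` itself is a hypothetical helper.

```python
from typing import Optional, Sequence


def pick_window(titles: Sequence[str], keyword: str) -> Optional[str]:
    """Return the single title containing `keyword`, or None if zero or several match."""
    needle = keyword.strip().lower()
    if not needle:
        return None  # an empty keyword would match every window
    matches = [t for t in titles if needle in t.lower()]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` on ambiguity pushes the decision back to the user instead of letting the first match steal focus.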

Missing User Warnings

Medium

Confidence: 88%
Finding: The code persistently stores learned site configuration data under the user's home directory without notice, consent, retention controls, or validation of what is being saved. In an automation skill that adapts to websites and records UI element locations, this can create privacy and operational risk by leaving behind behavioral artifacts and potentially sensitive browsing-related metadata on disk.
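Validating what gets saved can start with a key allowlist, so only layout hints reach disk and anything resembling typed content or page data is dropped. The schema in `ALLOWED_KEYS` and the `save_site_config` helper are assumptions, not the skill's actual format.

```python
import json
from pathlib import Path
from typing import Any, Mapping

# Hypothetical schema: persist layout hints only, never page content or typed text.
ALLOWED_KEYS = {"domain", "search_box_selector", "submit_selector", "updated_at"}


def save_site_config(path: Path, config: Mapping[str, Any]) -> dict:
    """Write only allowlisted keys to `path` as JSON and return what was written."""
    filtered = {k: v for k, v in config.items() if k in ALLOWED_KEYS}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(filtered, indent=2, sort_keys=True), encoding="utf-8")
    return filtered
```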

Missing User Warnings

Medium

Confidence: 86%
Finding: The replay function replays recorded clicks and keystrokes with no user confirmation, safety interlock, target-window validation, or visible disclosure. In an automation skill, this can cause unintended input into the wrong application, trigger destructive actions, or silently submit sensitive text if a hostile or stale action file is loaded.
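Before replaying, a recording can be checked against an action vocabulary and the expected foreground window. The action format (`type`, `x`, `y`) and `ALLOWED_ACTIONS` are assumptions for illustration; `validate_recording` is a hypothetical helper that returns problems instead of raising, so the caller can show them to the user.

```python
from typing import Any, Mapping, Sequence

ALLOWED_ACTIONS = {"click", "type", "hotkey"}  # hypothetical action vocabulary


def validate_recording(actions: Sequence[Mapping[str, Any]],
                       expected_window: str, current_window: str) -> list[str]:
    """Return a list of problems; replay should proceed only if it is empty."""
    problems: list[str] = []
    if current_window != expected_window:
        problems.append(f"foreground window is {current_window!r}, expected {expected_window!r}")
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind not in ALLOWED_ACTIONS:
            problems.append(f"step {i}: unsupported action {kind!r}")
        if kind == "click" and not all(isinstance(action.get(k), int) for k in ("x", "y")):
            problems.append(f"step {i}: click needs integer x and y")
    return problems
```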

Missing User Warnings

Medium

Confidence: 92%
Finding: The bridge will automatically navigate to a supplied URL and perform DOM interactions such as click and type without any user confirmation, origin allowlisting, or visibility into what site is being controlled. In an agent-skill context, this can drive actions on sensitive sites, trigger unintended transactions, or submit data if upstream inputs are attacker-controlled or ambiguous.
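Origin allowlisting can be as small as comparing the parsed scheme and host against a fixed set before navigation. `ALLOWED_ORIGINS` and `is_allowed_url` are hypothetical; exact host comparison is used so that look-alike domains are rejected.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of (scheme, host) pairs the bridge may open.
ALLOWED_ORIGINS = {("https", "example.com"), ("https", "docs.example.com")}


def is_allowed_url(url: str) -> bool:
    """Return True only if the URL's scheme and host exactly match an allowlist entry.

    Exact comparison rejects look-alikes such as example.com.evil.test.
    This sketch ignores the port.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return (parts.scheme.lower(), host) in ALLOWED_ORIGINS
```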

Missing User Warnings

Medium

Confidence: 89%
Finding: The code launches a persistent browser context using a profile directory, which reuses cookies, sessions, and other stored browser state without any disclosure or consent mechanism. In practice, this means the skill may operate with an already-authenticated identity and access private account data or perform authenticated actions unexpectedly.

Missing User Warnings

Medium

Confidence: 95%
Finding: When a CDP URL is provided, the bridge attaches to an existing Chromium session, inheriting access to open tabs, cookies, local storage, and other active browser data without any warning or trust boundary checks. This is especially dangerous in an agent setting because it can silently act inside a user's live authenticated browsing session and expose or manipulate highly sensitive data.
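A trust-boundary check can at least refuse CDP endpoints that are not on the local machine before anything like Playwright's `connect_over_cdp` is called. This does not address the core concern that attaching to a live session inherits its cookies; it only narrows which endpoints the bridge accepts. `is_local_cdp_url` is a hypothetical helper.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_cdp_url(url: str) -> bool:
    """Accept only http/ws CDP endpoints on a loopback address."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "ws"}:
        return False
    host = parts.hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a non-IP hostname other than localhost is rejected
```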

Missing User Warnings

Medium

Confidence: 97%
Finding: Failure recording stores the raw command, serialized intent/step/result, runtime state, and recent history to persistent storage. In an automation system that can type into apps and browse pages, these records can capture sensitive user data, internal app names, URLs, and action context without minimization or clear disclosure.
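Minimization could keep a few diagnostic fields, drop the raw command and runtime state, and reduce history to action kinds. The field names below mirror the finding's description but are assumptions about the record layout; `minimize_failure_record` is a hypothetical helper. Note that fields such as `intent` may still need redaction.

```python
from collections.abc import Mapping
from typing import Any

KEEP_FIELDS = ("intent", "step", "error")  # hypothetical minimal schema
MAX_HISTORY = 5


def minimize_failure_record(record: Mapping[str, Any]) -> dict:
    """Keep an allowlist of fields, drop command text and state, and truncate history."""
    slim = {k: record[k] for k in KEEP_FIELDS if k in record}
    history = list(record.get("history", []))[-MAX_HISTORY:]
    # Keep only the action kind from each history entry, not its arguments.
    slim["history"] = [h.get("type", "?") for h in history if isinstance(h, Mapping)]
    return slim
```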

Missing User Warnings

Medium

Confidence: 99%
Finding: The typed-input verification path OCRs screen content after typing and returns the recovered text in execution evidence via the readback field. This can expose secrets the agent just entered, such as passwords, tokens, personal data, or confidential business text, to logs, upstream callers, or later failure records.
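Verification evidence does not need the recovered text itself; a boolean comparison is enough for the caller to decide whether typing succeeded. This sketch assumes the OCR readback is available as a string; `verify_typed` is a hypothetical helper.

```python
def _normalize(text: str) -> str:
    # OCR often adds or drops spaces, so compare on collapsed whitespace.
    return " ".join(text.split())


def verify_typed(expected: str, readback: str) -> dict:
    """Report whether the OCR readback matches, without returning either string."""
    return {"matched": _normalize(expected) == _normalize(readback)}
```

Because neither string is returned, nothing typed can flow into logs or failure records through this path.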

Missing User Warnings

Medium

Confidence: 91%
Finding: The method reads the current clipboard, replaces it with caller-controlled text, triggers a paste into whatever application is focused, and then restores the prior clipboard contents. This can expose sensitive clipboard data to the skill and modify another application without explicit user awareness or consent, which is risky in an automation skill because focus may be on a password field, terminal, chat app, or admin console.

Missing User Warnings

Medium

Confidence: 89%
Finding: Automatically issuing Ctrl+V sends data to the currently focused application with no confirmation, which can cause unintended commands, data entry, or disclosure in the wrong context. In GUI automation, active-window ambiguity makes this more dangerous because a small focus change can redirect input to a sensitive target.

Missing User Warnings

Medium

Confidence: 90%
Finding: This clipboard helper also reads existing clipboard contents, replaces them, performs an automated paste, and restores the prior value without any user-facing disclosure. That creates privacy risk from accessing sensitive clipboard contents and integrity risk from injecting text into an unintended active application.

VirusTotal

66/66 vendors reported this skill as clean.


Static analysis

No suspicious patterns detected.