visual-rpa-skill
SuspiciousAudited by ClawScan on May 10, 2026.
Overview
This skill is a coherent desktop-automation tool, but it gives the agent broad screen-control authority, sends full-screen screenshots to an external vision provider, and explicitly runs multi-step actions without per-step confirmation.
Only install this if you are comfortable granting the agent broad visual control over your desktop. Before using it, close sensitive windows, avoid tasks involving money/account changes/deletions unless you supervise closely, require confirmation before sending messages or submitting forms, and periodically delete the ./rpa_logs/ folder.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A mistaken instruction, model misread, or unintended task decomposition could click the wrong UI element, type into the wrong app, send messages, or change local/application state before the user can review each step.
The skill directs the agent to perform multi-step desktop actions without per-step confirmation, even though the supported actions can mutate apps and accounts by clicking, typing, pressing hotkeys, and sending messages.
> Auto-execute all steps without waiting for user confirmation between steps.
Use only for clearly specified, low-risk desktop tasks; require explicit confirmation before sending messages, submitting forms, deleting/modifying data, or pressing impactful hotkeys.
Sensitive on-screen information such as chats, documents, account details, or notifications may be transmitted to the external vision provider during automation.
The script sends base64-encoded screenshots to the DashScope/Qwen vision API for analysis. Because screenshots are full-screen captures, they may include unrelated private information visible on the desktop.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1" ... {"type": "image_url", "image_url": {"url": f"data:{media_type};base64,{img_b64}"}}Close or hide sensitive windows before use, and prefer a version that scopes captures to a selected window or asks before sending screenshots externally.
Private screen contents may remain on disk after the automation finishes and could be viewed later by other processes or users with access to the workspace.
The skill persists screenshots and logs locally. The artifacts do not describe retention limits, cleanup, redaction, or access controls for those captured desktop images.
Logs and screenshots saved in `./rpa_logs/` directory for debugging
Inspect and delete ./rpa_logs/ after use, avoid running it while sensitive information is visible, and add retention/redaction controls if maintaining the skill.
Installing users may not realize the skill needs a DashScope API key and will use that account for image-analysis calls.
The skill needs a provider API key for the vision model, but registry metadata lists no required environment variables or primary credential. This appears purpose-aligned but under-declared.
Requires `DASHSCOPE_API_KEY` environment variable to be set.
Declare DASHSCOPE_API_KEY in metadata and use a limited, dedicated API key where possible.
Users may install different package versions or packages from the public index without reproducible pinning.
The script documents manual installation of unpinned Python packages while the registry provides no install spec. This is not malicious by itself, but it leaves dependency provenance/versioning to the user environment.
依赖安装:
pip install mss pyautogui openai pillowProvide a pinned requirements file or install specification and document supported versions.
