visual-rpa-skill

Security checks across malware telemetry and agentic risk

Overview

This is a real desktop automation skill, but it gives broad no-confirmation control over the user's screen, keyboard, messages, screenshots, and logs.

Install only if you are comfortable supervising a tool that can operate your desktop directly. Avoid using it on screens with passwords, private chats, financial data, customer records, terminals, or admin consoles; do not use --no-verify for real tasks; review and delete ./rpa_logs/ after use; and use a dedicated DashScope API key if possible.
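The log cleanup step can be scripted once the directory has been reviewed. The following is a minimal sketch, assuming the skill keeps its logs and screenshots under ./rpa_logs/ as this report states; the helper name purge_rpa_logs is hypothetical and not part of the skill.

```python
import shutil
from pathlib import Path


def purge_rpa_logs(log_dir: str = "./rpa_logs") -> int:
    """Delete log_dir and everything in it; return the number of files removed.

    Hypothetical helper, not part of the skill. Review the directory
    first if anything in it needs to be kept.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return 0  # nothing to clean up
    # Count regular files before deleting so the caller can report the result.
    removed = sum(1 for p in root.rglob("*") if p.is_file())
    shutil.rmtree(root)
    return removed
```

Calling purge_rpa_logs() from the directory where the skill was run removes the whole tree, including any screenshot subfolders.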

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Tool Misuse: Tool Parameter Abuse, Chaining Abuse, Unsafe Defaults
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (15)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 89%
Finding: The skill invokes shell execution and reads environment variables, but the documentation does not declare permissions or boundaries for those capabilities. This creates a transparency and governance gap: operators may authorize or trigger a skill without understanding that it can execute commands and access sensitive environment-backed secrets such as API keys.

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: The typing path copies arbitrary user-provided text into the system clipboard before pasting it, which can expose secrets to other applications, clipboard history tools, remote desktop sync, or later user actions. In an RPA tool that may handle passwords, messages, or private business data, this creates a clear unintended disclosure channel.
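One way to narrow the clipboard exposure described in this finding is to restore the previous clipboard contents immediately after the paste. The sketch below is illustrative only: the source does not say which clipboard library the skill uses, so the getter and setter are passed in as parameters (with pyperclip, for example, these would be pyperclip.paste and pyperclip.copy). The name temporary_clipboard is hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_clipboard(
    text: str,
    get_clipboard: Callable[[], str],
    set_clipboard: Callable[[str], None],
) -> Iterator[None]:
    """Place text on the clipboard for one paste, then restore the old value."""
    previous = get_clipboard()
    set_clipboard(text)
    try:
        yield  # the caller triggers the paste keystroke here
    finally:
        # Restore even if the paste raised, so the text does not linger.
        set_clipboard(previous)
```

This shortens the window in which the text sits on the clipboard, but it does not stop clipboard history tools or remote desktop sync from capturing the value while it is there.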

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The skill sends screenshots and task instructions to a remote model API, which can contain sensitive on-screen information such as chats, documents, credentials, customer data, or internal systems. Because the skill description emphasizes desktop automation but does not clearly disclose remote data transmission, users may expose sensitive information without informed consent.

Vague Triggers

Medium

Confidence: 84%
Finding: The description is broadly framed to operate desktop apps, click icons, type text, and send messages across arbitrary applications, which makes accidental or overly permissive invocation more likely. In a high-impact automation context, vague activation language can cause the agent to perform sensitive actions in the wrong app or on the wrong content.

Vague Triggers

Medium

Confidence: 86%
Finding: The guidance encourages free-form natural language and compound instructions that the system will decompose automatically, without describing trigger constraints or safety gates. This increases the chance that ambiguous user requests are translated into unintended desktop actions with real-world side effects.

Missing User Warnings

High

Confidence: 95%
Finding: The instruction to auto-execute all steps without waiting for confirmation enables autonomous multi-step desktop control with no human checkpoint. In a visual RPA skill, this is especially dangerous because misidentification, prompt ambiguity, or screen changes can propagate into unintended clicks, typing, message sends, or system modifications.
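A human checkpoint between steps could look roughly like the sketch below. It is not the skill's code: run_with_checkpoints, execute, and confirm are hypothetical names, and the skill's actual step executor is unknown, so it is passed in as a callable.

```python
from typing import Callable, Iterable, List


def run_with_checkpoints(
    steps: Iterable[str],
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Run steps in order, stopping at the first one the user declines.

    Returns the list of steps that were actually executed.
    """
    done: List[str] = []
    for step in steps:
        if not confirm(step):
            break  # stop the whole run; later steps may depend on this one
        execute(step)
        done.append(step)
    return done
```

In an interactive session, confirm could be something like lambda s: input(f"Run '{s}'? [y/N] ").strip().lower() == "y", so that declining any step halts the rest of the task.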

Missing User Warnings

High

Confidence: 94%
Finding: The skill states it uses screen capture and an external vision model, but provides no privacy warning about collecting on-screen data or transmitting it to a third-party service. Screenshots may contain credentials, personal messages, financial data, or confidential documents, making silent capture and external processing a significant confidentiality risk.

Missing User Warnings

Medium

Confidence: 82%
Finding: The examples demonstrate sending messages and modifying the desktop state without any warning about consent, target validation, or unintended side effects. These examples normalize risky behavior and make it more likely that downstream users or agents will treat sensitive actions as routine and safe to automate.

Missing User Warnings

High

Confidence: 99%
Finding: Full or partial screen captures are transmitted off-device to an external vision API without an in-tool warning or consent flow. In the context of a desktop automation skill, screen contents routinely include highly sensitive information, making silent exfiltration to a remote service especially dangerous.

Missing User Warnings

Medium

Confidence: 95%
Finding: The tool configures persistent logging and screenshot storage in a local directory without warning the user that sensitive screen contents and task traces may be retained on disk. This increases the chance of later disclosure through shared machines, backups, incident response collection, or other local users/processes.

Ssd 3

Medium

Confidence: 94%
Finding: The logging setup writes operational details to disk, and elsewhere the program logs user instructions that may contain natural-language secrets or sensitive business actions. Persistent plaintext traces of what the user asked the automation to do create an avoidable leakage path even without a direct compromise of the automation engine.
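The plaintext-instruction leak could be reduced with a standard logging filter that scrubs user-supplied strings before they reach disk. This is a sketch under an assumption the report does not confirm: that the program logs instructions through the log call's arguments, for example logger.info("task: %s", instruction). The class name RedactInstructionFilter is hypothetical.

```python
import logging


class RedactInstructionFilter(logging.Filter):
    """Replace string arguments of log records with a placeholder.

    Assumes user text reaches the logger as %-style arguments rather than
    being pre-formatted into the message; pre-formatted messages pass
    through unchanged.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        if isinstance(record.args, tuple):
            # Keep the message template and numeric values, drop free text.
            record.args = tuple(
                "[redacted]" if isinstance(arg, str) else arg
                for arg in record.args
            )
        return True  # never drop the record, only scrub it
```

Attaching the filter to the file handler with handler.addFilter(RedactInstructionFilter()) would keep step counts and timings in the log while leaving the instruction text out.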

Ssd 3

Medium

Confidence: 98%
Finding: Each execution step saves thumbnail, crop, and after-action screenshots to disk, creating a durable record of whatever was visible on screen during automation. Since this is a general desktop RPA tool, routine use could capture emails, chats, financial data, credentials, or proprietary material, making the leakage risk substantial.

Autonomous Decision Making

Medium

Category: Excessive Agency
Content: # Visual RPA Desktop Automation > Auto-execute all steps without waiting for user confirmation between steps. Desktop automation via screen capture + Qwen vision model (Qwen-VL). No DOM or accessibility API needed.
Confidence: 93%
Finding: Auto-execute

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
### Skip verification (faster)
```
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --no-verify --task "click to open Calculator"
```
### Parameters
Confidence: 96%
Finding: --no-verify

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
| `--mode task` | Batch task mode (required) |
| `--mode interactive` | Interactive mode (default) |
| `--task "step1" "step2"` | Task instructions, supports multiple |
| `--no-verify` | Skip post-action verification |
| `--model MODEL` | Vision model name (default: qwen-vl-max-latest) |
| `--api-key KEY` | API Key (defaults to DASHSCOPE_API_KEY env var) |
Confidence: 95%
Finding: --no-verify
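An operator who launches the script through a wrapper could refuse the flag outright. The sketch below is a hypothetical pre-check, not part of the skill: reject_no_verify is an invented name, and it only inspects the argument list before it is handed to visual_rpa.py.

```python
from typing import List, Sequence


def reject_no_verify(argv: Sequence[str]) -> List[str]:
    """Return argv unchanged, or raise if it disables post-action verification."""
    if "--no-verify" in argv:
        raise ValueError("--no-verify is not allowed: verification must stay on")
    return list(argv)
```

A wrapper would call reject_no_verify(sys.argv[1:]) and pass the result on to the real script only if no exception is raised.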

VirusTotal

64/64 vendors flagged this skill as clean.
