A powerful skill pack for screen browsing, OCR, and screen analysis, designed for AI assistants

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real screen capture and OCR skill, but it needs review because it can monitor sensitive screen contents and includes a Windows installer flow that downloads and silently runs an external OCR binary without integrity verification.

Install only if you trust the publisher and are comfortable granting screen-capture capability. Prefer manual, verified Tesseract installation over the automatic Windows installer, avoid running the scripts as administrator unless necessary, use explicit capture regions and output paths, and delete saved screenshots/OCR text/logs that may contain sensitive information.
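
The cleanup advice above can be sketched as a small helper. This is a minimal illustration, not part of the skill: the function name `purge_capture_outputs` and the list of file extensions are assumptions, and the directory must be whichever explicit output path you chose for captures.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever the skill actually writes.
SENSITIVE_SUFFIXES = {".png", ".jpg", ".txt", ".log"}


def purge_capture_outputs(output_dir, suffixes=SENSITIVE_SUFFIXES):
    """Delete matching files directly inside output_dir and return their names.

    Only the top level of the directory is scanned, so nothing outside the
    explicit output path is touched.
    """
    removed = []
    for path in sorted(Path(output_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            path.unlink()
            removed.append(path.name)
    return removed
```

Keeping captures in one dedicated directory makes this kind of cleanup safe to run, because the helper never needs to guess where screenshots ended up.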

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import

Findings (38)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: # Check environment variables try: result = subprocess.run( "tesseract --version", shell=True, capture_output=True,
Confidence: 94%
Finding: result = subprocess.run( "tesseract --version", shell=True, capture_output=True, text=True, encoding='utf-8'

subprocess module call

Medium

Category: Dangerous Code Execution
Content: print("正在安装，请稍候...") # Run the installer result = subprocess.run( install_command, shell=True, capture_output=True,
Confidence: 97%
Finding: result = subprocess.run( install_command, shell=True, capture_output=True, text=True, encoding='utf-8' )

subprocess module call

Medium

Category: Dangerous Code Execution
Content: print(f"命令: {cmd}") try: result = subprocess.run( cmd, shell=True, capture_output=True,
Confidence: 95%
Finding: result = subprocess.run( cmd, shell=True, capture_output=True, text=True, encoding='utf-8', errors='replace'
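
The three `subprocess.run(..., shell=True)` findings share one common mitigation: pass an argument list and skip the shell. The sketch below illustrates that pattern for the version check only; the function name `tool_version` and the 10-second timeout are assumptions, not taken from the skill's code.

```python
import shutil
import subprocess


def tool_version(executable="tesseract"):
    """Return the first line printed by `<executable> --version`.

    Returns None when the executable cannot be found on PATH.
    """
    resolved = shutil.which(executable)
    if resolved is None:
        return None
    # An argument list with the default shell=False means no shell parses
    # the command, so nothing in it is interpreted as shell syntax.
    result = subprocess.run(
        [resolved, "--version"],
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
        timeout=10,
        check=False,
    )
    # Some tools print their version on stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""
```

Resolving the path with `shutil.which` first also lets the caller tell "not installed" apart from "installed but failed".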

exec() call detected

High

Category: Dangerous Code Execution
Content: # Run the test code try: exec(test_code) return True except Exception as e: print(f"❌ 验证失败: {e}")
Confidence: 93%
Finding: exec(test_code)
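
If the purpose of `exec(test_code)` is only to confirm that dependencies load, the same check can likely be done without executing a code string. The sketch below assumes that purpose; `verify_imports` is a hypothetical name and the module list is supplied by the caller.

```python
import importlib


def verify_imports(module_names):
    """Import each named module and collect failures.

    Returns a dict mapping module name to the error message for every
    module that could not be imported; an empty dict means all loaded.
    """
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures[name] = str(exc)
    return failures
```

Because the module names are plain data rather than source text, nothing beyond the named modules' own import-time code is executed.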

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill documentation advertises and instructs use of shell execution, file read/write, environment access, and network-backed installation/update flows, but declares no permissions. That creates a transparency and consent gap: a user or orchestrator may invoke a seemingly simple screen-viewing skill without realizing it can execute commands, write files, and fetch/install software. In a screenshot/OCR context, those capabilities materially increase the blast radius because sensitive screen contents may be saved, processed, or combined with downloaded components.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The documented behavior extends beyond the stated purpose of screenshotting and OCR into dependency installation, silent downloading/installation of Tesseract and language packs, screen-state probing, image-analysis features not disclosed in the description, and packaging/publishing workflows. This mismatch is dangerous because users may consent to benign screen capture but unknowingly authorize network activity, software installation, or broader system interaction. In a privacy-sensitive skill that can access live screen content, undisclosed extra behaviors significantly raise trust and supply-chain risk.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The README explicitly advertises timed monitoring and repeated screenshot capture, which expands the skill from ad hoc screen viewing into persistent surveillance behavior. In a screen-capture skill, this increases the risk of collecting sensitive information over time without clear scope limits, consent requirements, or retention guidance.

Context-Inappropriate Capability

Medium

Confidence: 83%
Finding: The README recommends running an installation script that automatically installs dependencies and OCR components, introducing a code-execution and supply-chain risk beyond the core purpose of viewing/analyzing the screen. Auto-install behavior is especially risky in agent skills because users may run setup scripts with broad system privileges without reviewing what they do.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: Documenting an automatic Tesseract download/install step adds external software installation capability that is not part of ordinary screenshot/OCR usage and can expose users to unreviewed binary acquisition. In the context of an agent skill, normalizing automatic installation makes it easier to conceal unexpected system modifications or introduce supply-chain compromise paths.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: These examples add timed, repeated screenshot capture for monitoring over a duration, which goes beyond a simple user-invoked screenshot/OCR utility. In a screen-capture skill, this materially increases surveillance capability and can enable covert collection of sensitive on-screen data over time if reused without strong consent and retention controls.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The change-detection loop continuously watches the screen and automatically saves screenshots when changes occur, creating a persistent surveillance primitive not described in the stated scope. Because it triggers without per-capture user intent, it can silently accumulate sensitive visual data and behavior patterns.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The automated report example collects screen resolution, mouse position, and a screenshot, then persists them into a report file. This expands the skill from screenshot/OCR into user activity and environment profiling, which is more privacy-sensitive and not justified by the declared purpose.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: This error-monitoring system repeatedly scans the screen, detects keywords, and logs findings over time, effectively implementing persistent surveillance and activity logging. In the context of a screen viewer skill, this can capture sensitive application states, internal errors, and potentially confidential text far beyond ad hoc troubleshooting.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The skill is described as a screen-viewing/OCR utility, but this script adds software installation behavior, including downloading binaries and modifying the host environment. That expands the skill's effective capability beyond passive OCR processing and increases the attack surface, especially in agent contexts where users may not expect system-level changes.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The script downloads executable content and OCR data from external sources at runtime, which introduces supply-chain risk. Because there is no integrity verification such as checksum or signature validation, a compromised mirror, redirect, or man-in-the-middle condition could result in malicious files being written and later executed.
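
The missing integrity check can be sketched as a SHA-256 comparison performed before a downloaded file is ever executed. This is an illustration under stated assumptions: the function names are hypothetical, and the expected digest must come from a trusted source such as the publisher's release page, which the report does not provide.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True only if the file's digest matches the expected value."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

An installer step would call `verify_download` first and refuse to run the file when it returns False.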

Context-Inappropriate Capability

High

Confidence: 98%
Finding: This is the most security-significant behavior in the file: it executes a downloaded installer silently through the shell. In the context of a screenshot/OCR skill, this adds arbitrary software installation and code execution capability unrelated to the stated purpose, making compromise of the download path especially dangerous.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The setup script executes system shell commands to install packages, expanding its capabilities beyond simple screen viewing. This is security-relevant because install-time command execution can change the local environment and increases the blast radius if the script is run in trusted contexts.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: Automatically modifying the user's Python environment by installing third-party packages is not inherently malicious, but it is a privileged side effect unrelated to the core runtime behavior of taking screenshots. In a skill context, users may invoke or trust the skill for screen analysis without expecting environment modification.
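
One way to make the environment change explicit is to show the exact install command and proceed only on confirmation. The sketch below is an assumption about how that could look, not the skill's setup code; `build_pip_command` and `plan_install` are hypothetical names, and the package names are supplied by the caller.

```python
import sys


def build_pip_command(packages):
    """Build an argument-list pip invocation for the current interpreter."""
    if not packages:
        raise ValueError("no packages requested")
    return [sys.executable, "-m", "pip", "install", *packages]


def plan_install(packages, prompt_fn=input):
    """Show the exact command and return it only if the user agrees.

    Returns None when the user declines, so the caller installs nothing.
    """
    command = build_pip_command(packages)
    answer = prompt_fn(f"About to run: {' '.join(command)}\nProceed? [y/N] ")
    return command if answer.strip().lower() in {"y", "yes"} else None
```

The returned list can be passed directly to `subprocess.run` without `shell=True`.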

Intent-Code Divergence

Low

Confidence: 96%
Finding: The 'verify installation' routine does more than validation: it reads screen size and captures a real screenshot. For a screen-capture skill this capability is aligned with purpose, but performing it automatically during setup without an explicit prompt creates an unnecessary privacy risk.
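
The explicit prompt this finding asks for could be a small gate in front of the capture step. The sketch below assumes the capture is exposed as a callable; `confirm_capture`, `verify_installation`, and `capture_fn` are hypothetical names used only for illustration.

```python
def confirm_capture(prompt_fn=input):
    """Ask before taking a screenshot; anything but y/yes means no."""
    answer = prompt_fn("This check will capture a real screenshot. Continue? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


def verify_installation(capture_fn, prompt_fn=input):
    """Run the capture check only after explicit consent.

    Returns "skipped" when the user declines and "captured" otherwise.
    """
    if not confirm_capture(prompt_fn):
        return "skipped"
    capture_fn()
    return "captured"
```

Defaulting to "no" means an unattended or scripted setup run never captures the screen by accident.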

Missing User Warnings

Low

Confidence: 89%
Finding: The installation guide explicitly mentions granting screen-recording permissions and running as administrator, but it does not clearly warn users about the security and privacy implications of doing so. For a screen-capture skill, this omission matters because the capability can expose sensitive on-screen data and elevated privileges increase the blast radius if the skill or its dependencies are misused.

Missing User Warnings

Medium

Confidence: 92%
Finding: The README promotes screenshot, OCR, file output, and timed monitoring but provides no visible warning that these operations may capture credentials, personal messages, tokens, or other sensitive on-screen data. Lack of disclosure and safe-handling guidance increases the chance of accidental privacy violations and insecure storage of captured content.

Vague Triggers

Medium

Confidence: 87%
Finding: The trigger language is broad enough to activate on general mentions of screenshots, viewing the screen, or analyzing on-screen content, which can cause this skill to run in situations where the user did not intend screen capture or OCR. Because screen capture can expose credentials, personal messages, and other sensitive data, accidental invocation is more dangerous here than for low-risk utilities. The skill context therefore amplifies the risk of over-broad triggering.

Missing User Warnings

Medium

Confidence: 91%
Finding: The documentation describes capturing screens, extracting text, and saving outputs without prominently warning that screenshots and OCR may collect highly sensitive information such as passwords, personal chats, financial data, or corporate secrets. In a screen-viewing skill, omission of privacy and storage warnings can lead to unsafe handling, persistence of confidential data on disk, and unintentional disclosure. The context makes this particularly risky because the primary function directly accesses one of the most sensitive data surfaces on a user device.

Missing User Warnings

Medium

Confidence: 93%
Finding: This demo captures the user's screen and writes the image to disk without a clear privacy warning, consent step, or guidance about sensitive content that may be recorded. In a screen-viewing skill, that context increases risk because screenshots can contain credentials, messages, tokens, or regulated data, and examples strongly influence downstream use.

Missing User Warnings

Medium

Confidence: 92%
Finding: The region screenshot demo persists captured screen content to example_region.png without disclosing that even partial screen captures may include secrets or personal data. Because this skill is specifically designed for screenshot capture and analysis, omission of privacy messaging makes accidental collection and retention more likely.

VirusTotal

64/64 vendors flagged this skill as clean.
