vision ocr

Security checks across malware telemetry and agentic risk

Overview

This is a coherent OCR skill, but it needs review because it can send document contents to external services and can forward authorization and cookie headers when remote attachment input is enabled.

Install only if you are comfortable sending images/PDFs and extracted text to your configured OCR and multimodal providers. Keep Feishu auto-send and remote input disabled unless needed, avoid providing attachment cookies or authorization headers except for trusted download hosts, and consider narrowing the triggers or requiring confirmation in environments with sensitive chat attachments.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (11)

Lp3

Medium
Category
MCP Least Privilege
Confidence
90% confidence
Finding
The skill documentation describes use of environment variables and networked OCR/multimodal services, but no explicit permission declaration is shown. That creates a transparency and policy-enforcement gap: operators may invoke a skill that can read runtime context and perform outbound requests without an explicit permission review. In this context, the risk is real because the skill can access session-related environment data and transmit document content to remote services.

Tp4

High
Category
MCP Tool Poisoning
Confidence
96% confidence
Finding
The documented behavior exceeds the stated purpose in several sensitive ways: it can describe non-document images, download remote resources, infer recipients from runtime state, and persist VISION_* configuration into local config files. These extra capabilities materially expand the attack surface, enabling unintended data exfiltration, SSRF-style retrieval attempts, misdelivery to recovered chat identities, or secret persistence beyond what a user would expect from an OCR skill. The surrounding skill context makes this more dangerous because it handles potentially sensitive documents and integrates with Feishu messaging.

Description-Behavior Mismatch

Medium
Confidence
94% confidence
Finding
The skill can discover remote attachment URLs or token-derived download URLs from context and automatically fetch them for processing, but this behavior is not disclosed in the skill description. In a messaging/agent environment, hidden network-fetch behavior materially changes the trust boundary and may cause the agent to transmit request metadata or process attacker-controlled remote content without clear user consent.
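One way to make the fetch behavior explicit is to gate every remote download behind an opt-in flag and a host allowlist. The following is a minimal sketch, not the skill's actual code: the function name `is_fetch_allowed`, the `remote_input_enabled` flag, and the `allowed_hosts` set are all hypothetical names introduced for illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def is_fetch_allowed(url: str, remote_input_enabled: bool, allowed_hosts: frozenset) -> bool:
    """Return True only for HTTPS URLs on an allowlisted hostname (hypothetical policy)."""
    # Remote input must be switched on explicitly by the operator.
    if not remote_input_enabled:
        return False
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    host = parts.hostname.lower()
    # Reject literal IP addresses outright so loopback and private ranges
    # cannot be reached by spelling them as an address.
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass
    # Anything not named in the allowlist is refused.
    return host in allowed_hosts
```

This sketch does not resolve DNS, so it does not defend against an allowlisted name that later points at an internal address; that check would have to happen at connection time.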

Context-Inappropriate Capability

Medium
Confidence
91% confidence
Finding
The skill can infer a delivery recipient from OpenClaw environment variables or a runtime.json file, which is unrelated to core OCR and broadens access to conversation context. In combination with auto-send behavior, this can route recognized content to a chat or user derived implicitly from runtime state rather than from an explicit user instruction, increasing the risk of unintended disclosure.
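A possible mitigation is to treat a recipient inferred from runtime state as a suggestion that needs user confirmation, and to send without confirmation only when the user names the recipient. The sketch below assumes hypothetical names throughout (`SendDecision`, `decide_send`, and its parameters); it does not reflect how the skill currently resolves recipients.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SendDecision:
    recipient: Optional[str]
    needs_confirmation: bool


def decide_send(explicit: Optional[str], inferred: Optional[str],
                auto_send_enabled: bool) -> SendDecision:
    """Pick a recipient, flagging inferred ones for confirmation (hypothetical policy)."""
    # With auto-send off, nothing is delivered regardless of available recipients.
    if not auto_send_enabled:
        return SendDecision(None, False)
    # A recipient named by the user is used as-is.
    if explicit:
        return SendDecision(explicit, False)
    # A recipient recovered from environment variables or runtime.json is
    # only proposed; the caller must ask the user before sending.
    if inferred:
        return SendDecision(inferred, True)
    return SendDecision(None, False)
```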

Vague Triggers

Medium
Confidence
80% confidence
Finding
Broad natural-language triggers like 'OCR 这个截图' ("OCR this screenshot") can overlap with ordinary conversation and cause accidental activation. In a skill that can process attachments, call external services, and optionally send results to Feishu, an unintended trigger could expose document contents or initiate processing the user did not mean to authorize.

Missing User Warnings

Medium
Confidence
98% confidence
Finding
Remote attachment download logic merges headers from context and may forward Authorization, Cookie, or Referer values to arbitrary remote URLs discovered in message content. Even with localhost/private-IP checks, this can leak bearer tokens or session cookies to attacker-controlled public hosts, enabling credential theft and downstream account compromise.
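The header leak described above could be narrowed by stripping credential-bearing headers unless the download host is explicitly trusted. This is a minimal sketch under assumptions: `headers_for_download`, `SENSITIVE_HEADERS`, and `trusted_hosts` are hypothetical names, and the actual merge logic in the skill is not shown in this report.

```python
from urllib.parse import urlsplit

# Headers that can carry credentials or reveal the originating page.
SENSITIVE_HEADERS = frozenset({"authorization", "cookie", "referer", "proxy-authorization"})


def headers_for_download(url: str, context_headers: dict, trusted_hosts: frozenset) -> dict:
    """Return the headers that are safe to send to the host in `url`."""
    host = (urlsplit(url).hostname or "").lower()
    # Trusted download hosts receive the context headers unchanged.
    if host in trusted_hosts:
        return dict(context_headers)
    # Every other host gets the headers with credentials removed.
    # Header names are compared case-insensitively.
    return {
        name: value
        for name, value in context_headers.items()
        if name.lower() not in SENSITIVE_HEADERS
    }
```

Matching on the exact hostname rather than a suffix keeps a trusted entry such as `files.example.com` from also covering look-alike hosts.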

Vague Triggers

Medium
Confidence
96% confidence
Finding
The contains-match trigger uses the very generic term "识别" ("recognize"), which is likely to appear in many unrelated conversations. This can cause unintended activation of the OCR skill, increasing the chance that user files or document content are processed or sent to external OCR/multimodal services without a sufficiently explicit request.

Vague Triggers

Medium
Confidence
95% confidence
Finding
A contains-match on "OCR" is overly broad because users may mention OCR in discussion, comparison, or troubleshooting contexts without intending to invoke the skill. In a skill that can process documents and optionally transmit extracted content externally, accidental invocation creates real privacy and data-handling risk.

Vague Triggers

Medium
Confidence
95% confidence
Finding
Matching on the single term "PDF" will overlap with many ordinary requests about viewing, editing, converting, or discussing PDFs. Because this skill is designed to OCR PDFs and may involve external services, broad activation can expose sensitive document content to unnecessary processing or transmission.
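The contains-match findings above share one remedy: require an explicit command-like phrase and an actual attachment, rather than a bare keyword anywhere in the message. The sketch below illustrates that idea only; `should_trigger` and the two patterns are hypothetical and are not taken from the skill's manifest.

```python
import re

# Hypothetical explicit-request patterns, anchored at the start of the message
# so that a passing mention of "OCR" or "PDF" mid-sentence does not match.
EXPLICIT_PATTERNS = (
    re.compile(r"^\s*(please\s+)?(run\s+)?ocr\s+(this|the)\b", re.IGNORECASE),
    re.compile(r"^\s*ocr\s*这个", re.IGNORECASE),
)


def should_trigger(message: str, has_attachment: bool) -> bool:
    """Activate only for an explicit OCR request that comes with an attachment."""
    # Without a file to process there is nothing to OCR, so never activate.
    if not has_attachment:
        return False
    return any(pattern.search(message) for pattern in EXPLICIT_PATTERNS)
```

Under this policy, a question such as "Which OCR engine is best for a PDF?" does not activate the skill even when a file is attached, while "OCR this screenshot" does.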

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The manifest advertises optional automatic sending of OCR results to Feishu but does not clearly warn users that extracted document contents may be transmitted to a third-party messaging destination. Since OCR output can contain sensitive business, financial, or personal data, silent or poorly disclosed forwarding creates a meaningful confidentiality risk.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The configuration exposes remote OCR, multimodal endpoints, and remote-input options without clearly warning that uploaded images or PDFs may be sent to external services for processing. In the context of document OCR, this is especially sensitive because scans, receipts, invoices, and technical documents often contain confidential information, making undisclosed external transfer a serious privacy and compliance concern.

VirusTotal

66/66 vendors flagged this skill as clean.
