PDF and Image Text Extraction (PDF和图片文字提取)

Security checks across malware telemetry and agentic risk

Overview

This PDF/image text extractor mostly matches its purpose, but it quietly contacts a third-party usage endpoint and gives incomplete privacy/storage disclosure.

Review before installing. Use it only if you are comfortable with a PDF extraction script contacting redfox.hk on each run for usage recording, and avoid processing sensitive documents unless that telemetry is removed or blocked. When saving results, treat the generated Markdown as a persistent copy of the document text and delete it when no longer needed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (10)

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The skill claims files are only used in the current session and not persistently stored, yet it also documents saving extracted text to a local .md file and includes external usage logging in the architecture. Contradictory data-handling statements are dangerous because they mislead users about retention and disclosure of potentially sensitive document contents or metadata.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: External HTTP usage logging is unrelated to the core OCR/text-extraction purpose and introduces an unnecessary data egress path. In a skill handling user-supplied documents, even metadata such as timestamps, filenames, paths, page counts, or error details can be sensitive, so extra network communication increases privacy and compliance risk.

Description-Behavior Mismatch

Medium

Confidence: 98%
Finding: The skill claims to extract text from PDFs/images, but the script also performs an undeclared outbound POST to a third-party domain whenever it runs. Silent telemetry is dangerous because it creates hidden network behavior, can leak usage metadata without user consent, and expands the trust boundary beyond local document processing.
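One way a reviewer can surface undeclared outbound calls like this is to list every URL host hard-coded in a script before running it. The sketch below is a minimal static check, assuming the script is Python; `find_url_hosts` is a hypothetical helper, and the sample script (including the `/usage` path) is invented to mirror the reported behavior, not copied from the skill.

```python
import ast
from urllib.parse import urlparse


def find_url_hosts(source: str) -> set[str]:
    """Return hostnames of every http(s) URL that appears in a string literal."""
    hosts: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        # String literals (including the literal parts of f-strings) are ast.Constant nodes.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            for token in node.value.split():
                if token.startswith(("http://", "https://")):
                    host = urlparse(token).hostname
                    if host:
                        hosts.add(host)
    return hosts


# Hypothetical script shaped like the reported behavior; the URL path is invented.
SAMPLE = '''
import json, urllib.request

def record_usage():
    req = urllib.request.Request(
        "https://redfox.hk/usage",
        data=json.dumps({"event": "run"}).encode(),
        method="POST",
    )
    try:
        urllib.request.urlopen(req, timeout=3)
    except Exception:
        pass
'''

print(find_url_hosts(SAMPLE))  # {'redfox.hk'}
```

This only catches literal URLs; hosts assembled at runtime or read from configuration would need dynamic checks such as running the script with network egress blocked.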

Context-Inappropriate Capability

Medium

Confidence: 99%
Finding: In an OCR/PDF extraction tool, users reasonably expect local file processing, not background network communication. The undisclosed HTTP request to an external service enables covert telemetry and may expose operational metadata such as tool usage frequency, environment egress patterns, or user activity timing.

Intent-Code Divergence

Low

Confidence: 93%
Finding: The module documentation presents the script as a PDF text extractor, but omits that execution triggers remote usage recording. This mismatch is security-relevant because it misleads reviewers and users about the script's true behavior, undermining informed consent and making hidden data flows harder to detect.

Vague Triggers

Medium

Confidence: 89%
Finding: The README says users can 'simply describe your need in natural language' and provides broad example phrases, but it does not define clear activation boundaries or constraints. In an agent setting, this can cause over-triggering on loosely related requests and increase the chance the skill processes sensitive attachments or documents when the user did not explicitly intend OCR/text extraction.
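A clearer activation boundary could require both an explicit extraction request and a supported attachment. The sketch below is illustrative only: `should_activate`, the suffix list, and the trigger phrases are assumptions, not anything the skill's README defines.

```python
import re
from pathlib import Path

# Hypothetical activation rule: these suffixes and phrases are illustrative only.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}
EXTRACTION_REQUEST = re.compile(r"\b(?:ocr|extract(?:\s+the)?\s+text)\b", re.IGNORECASE)


def should_activate(request: str, attachments: list[str]) -> bool:
    """Activate only for an explicit extraction request on a supported file."""
    asks_for_extraction = EXTRACTION_REQUEST.search(request) is not None
    has_supported_file = any(
        Path(name).suffix.lower() in SUPPORTED_SUFFIXES for name in attachments
    )
    return asks_for_extraction and has_supported_file


print(should_activate("Extract the text from this PDF", ["report.pdf"]))  # True
print(should_activate("Summarize this PDF", ["report.pdf"]))  # False
```

Word boundaries in the pattern keep "ocr" from matching inside unrelated words such as "democracy", so a loosely related request with an attachment does not trigger extraction.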

Missing User Warnings

Medium

Confidence: 93%
Finding: The README advertises that extracted results can be saved as a Markdown file, but it does not warn that document contents may be written to persistent storage. Because OCR/PDF extraction often handles sensitive documents, silent or poorly disclosed file creation can expose private data, create residual artifacts on disk, and violate user expectations about where extracted content is stored.

Missing User Warnings

Medium

Confidence: 94%
Finding: The workflow example 'Extract the text from this PDF and save it' normalizes a save action without warning that potentially sensitive document contents will be persisted to disk. In practice, users may focus on extraction and overlook the storage side effect, leading to unintended retention of confidential material in Markdown files.
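When a saved copy is only needed briefly, one mitigation is to write it somewhere private and remove it automatically. The sketch below assumes Python and uses a hypothetical `temporary_markdown` helper; it is not part of the skill, which per the findings writes a persistent `.md` file.

```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_markdown(text: str, name: str = "extracted.md") -> Iterator[Path]:
    """Write text to a Markdown file in a private temp directory; delete it on exit."""
    # TemporaryDirectory is created owner-only on POSIX and removed with its contents on exit.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / name
        path.write_text(text, encoding="utf-8")
        os.chmod(path, 0o600)  # owner read/write only; Windows honors just the read-only bit
        yield path


with temporary_markdown("# Page 1\nHello") as md_path:
    print(md_path.read_text(encoding="utf-8"))
print(md_path.exists())  # False
```

If users do want a lasting copy, the skill would still need to say where the file is written so they can delete it later.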

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill describes telemetry/API usage without a clear user-facing warning or consent flow for data transmission. Lack of notice is dangerous because users may reasonably believe document processing is local or in-session only, while the skill can contact an external service during use.
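A consent flow usually starts with telemetry that is off by default. The sketch below assumes Python; `record_usage` and the `SKILL_TELEMETRY_OPT_IN` variable are hypothetical, and `send` stands in for whatever HTTP call would actually transmit the event. It shows only the default-off gate and a minimal payload, not the user-facing notice a full consent flow also needs.

```python
import os
from collections.abc import Callable, Mapping

# Hypothetical opt-in flag; the reviewed skill does not define it.
OPT_IN_VAR = "SKILL_TELEMETRY_OPT_IN"


def record_usage(send: Callable[[dict], None], env: Mapping[str, str] = os.environ) -> bool:
    """Send a minimal usage event only if the user explicitly opted in."""
    if env.get(OPT_IN_VAR, "").strip().lower() not in {"1", "true", "yes"}:
        return False  # default: no network traffic at all
    # Deliberately excludes filenames, paths, page counts and document text.
    send({"event": "run"})
    return True


sent: list[dict] = []
print(record_usage(sent.append, env={}))  # False
print(record_usage(sent.append, env={OPT_IN_VAR: "1"}))  # True
print(sent)  # [{'event': 'run'}]
```

Injecting `send` keeps the gate testable without any network access.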

Missing User Warnings

Medium

Confidence: 99%
Finding: The script silently sends an outbound usage-recording request and suppresses all exceptions, making the behavior both undisclosed and difficult to notice during testing. Hidden telemetry that fails silently is risky because it can bypass user expectations, complicate auditing, and normalize covert exfiltration patterns in otherwise local-processing tools.
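Exception handlers that catch everything and do nothing are one marker of this pattern. The sketch below is a heuristic static check, assuming Python source; `silent_broad_handlers` is a hypothetical helper, and it only flags bare `except:`, `except Exception`, and `except BaseException` handlers whose body is just `pass` or `...` (tuples and qualified names are not handled).

```python
import ast

BROAD = {"Exception", "BaseException"}


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass` or a bare `...` expression statement."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def silent_broad_handlers(source: str) -> list[int]:
    """Line numbers of broad exception handlers whose body does nothing."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        broad = node.type is None or (
            isinstance(node.type, ast.Name) and node.type.id in BROAD
        )
        if broad and all(_is_noop(stmt) for stmt in node.body):
            lines.append(node.lineno)
    return sorted(lines)


SAMPLE = """\
try:
    send_usage()
except Exception:
    pass
try:
    parse_page()
except ValueError:
    pass
try:
    cleanup()
except:
    ...
"""

print(silent_broad_handlers(SAMPLE))  # [3, 11]
```

The narrow `except ValueError` handler is not flagged; a match is a prompt for manual review, not proof of hidden telemetry.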

VirusTotal

65/65 vendors flagged this skill as clean.
