Mistral PDF OCR

Security checks across malware telemetry and agentic risk

Overview

This skill is a Mistral OCR helper whose behavior is disclosed: it sends the documents you choose to Mistral and writes the OCR outputs locally. The scan found no evidence of hidden or destructive behavior.

Install only if you are comfortable sending the selected PDFs or images to Mistral for OCR. Avoid confidential or regulated documents unless that transfer is approved, protect MISTRAL_API_KEY, choose an output folder you can secure or clean up, and use --cleanup-upload when you want the script to attempt removal of uploaded files after processing.
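The advice above can be sketched as a small pre-flight check that runs before any document leaves the machine. This is an illustrative sketch only, not the skill's actual code: the function names `preflight` and `redact` and the `consent` parameter are hypothetical. Only the `MISTRAL_API_KEY` environment variable name comes from the report.

```python
import os


def preflight(paths, consent, env=None):
    """Return the API key if it is acceptable to proceed, otherwise raise.

    `consent` is a hypothetical flag the caller sets only after the user has
    agreed that the listed files may be sent to Mistral.
    """
    env = os.environ if env is None else env
    api_key = env.get("MISTRAL_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("MISTRAL_API_KEY is not set")
    if not consent:
        names = ", ".join(os.path.basename(p) for p in paths)
        raise PermissionError(
            f"Refusing to send {names} to Mistral without explicit consent"
        )
    return api_key


def redact(secret):
    """Shorten a secret for log output so the full key never reaches a log."""
    return secret[:4] + "..." if len(secret) > 8 else "***"
```

A caller would run `preflight` once per batch and pass only `redact(api_key)` to any logging call, so the key itself stays out of logs and shell history.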

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep

Findings (6)

Missing User Warnings

Medium

Confidence: 95%
Finding: The reference explicitly instructs users to upload local PDFs and submit public document URLs to Mistral's external OCR API, but it does not warn that full document contents leave the local environment and are transmitted to a third party. In a document-processing skill, this omission is security-relevant because users may unknowingly send sensitive PDFs, scans, or regulated data to an external service.

Missing User Warnings

Medium

Confidence: 93%
Finding: The documentation explicitly states that OCR outputs, extracted images/tables, and the full raw OCR response are written to disk, but it provides no warning about persistence of potentially sensitive document contents. In an OCR skill, inputs commonly include IDs, contracts, medical records, invoices, or other confidential files, so unguarded storage materially increases the risk of unintended retention, later disclosure, and mishandling.

Missing User Warnings

Medium

Confidence: 87%
Finding: The script writes full OCR output, per-page markdown, extracted images, tables, and optional document annotations to disk by default. If users process sensitive documents, this can leave confidential data in local storage, backups, shared workspaces, or logs without any explicit consent, minimization, or retention controls.
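One way to limit the retention risk described in this finding is to write OCR outputs into an owner-only directory and remove it when the work is done. The sketch below is an assumption about how a user might wrap the skill, not a description of what the script does. The helper names `make_private_dir`, `write_private`, and `purge` are hypothetical, and the permission bits apply to POSIX systems only.

```python
import os
import shutil
import stat
import tempfile


def make_private_dir(parent):
    """Create a fresh output directory accessible only by the current user."""
    path = tempfile.mkdtemp(prefix="ocr-out-", dir=parent)
    os.chmod(path, stat.S_IRWXU)  # 0o700 on POSIX; mkdtemp already uses this mode
    return path


def write_private(directory, name, text):
    """Write text to a file created with owner-only read/write permissions."""
    target = os.path.join(directory, name)
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return target


def purge(directory):
    """Delete the output directory and everything in it."""
    shutil.rmtree(directory, ignore_errors=True)
```

Calling `purge` after the outputs have been reviewed or copied to approved storage keeps extracted text and images from lingering in backups or shared workspaces.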

Missing User Warnings

Medium

Confidence: 92%
Finding: The skill uploads local files or sends document URLs to a third-party OCR service without an explicit warning that document contents leave the local environment. In a document-processing skill, users may provide highly sensitive PDFs, so silent remote transmission increases privacy, compliance, and data-handling risk.

External Transmission

Medium

Category: Data Exfiltration
Content:
## Endpoints
- **OCR**: `POST https://api.mistral.ai/v1/ocr`
- **Files upload**: `POST https://api.mistral.ai/v1/files` with `purpose="ocr"`
## Default model
Confidence: 83%
Finding: https://api.mistral.ai/
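Because the only external host named in this finding is `api.mistral.ai`, a reviewer could confirm that outbound requests stay within that host using a simple allowlist check. The following is a minimal sketch under that assumption. `ALLOWED_HOSTS` and `is_allowed_endpoint` are hypothetical names and are not part of the skill.

```python
from urllib.parse import urlsplit

# Hosts the report identifies as the skill's expected destinations.
ALLOWED_HOSTS = {"api.mistral.ai"}


def is_allowed_endpoint(url):
    """Return True only for HTTPS URLs whose exact hostname is allowlisted."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS
```

Matching on the exact parsed hostname, rather than on a substring of the URL, rejects lookalike hosts such as `api.mistral.ai.evil.example`.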

External Transmission

Medium

Category: Data Exfiltration
Content:
## Endpoints
- **OCR**: `POST https://api.mistral.ai/v1/ocr`
- **Files upload**: `POST https://api.mistral.ai/v1/files` with `purpose="ocr"`
## Default model
Confidence: 83%
Finding: https://api.mistral.ai/

VirusTotal

65 of 65 vendors rated this skill as clean.
