Mistral PDF OCR

Security checks across malware telemetry and agentic risk

Overview

This skill is a Mistral OCR helper whose behavior is disclosed: it sends the documents you choose to Mistral and writes the OCR output locally. The scan found no evidence of hidden or destructive behavior.

Install only if you are comfortable sending the selected PDFs or images to Mistral for OCR. Avoid confidential or regulated documents unless that transfer is approved, protect MISTRAL_API_KEY, choose an output folder you can secure or clean up, and use --cleanup-upload when you want the script to attempt removal of uploaded files after processing.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Findings (6)

Missing User Warnings

Medium
Confidence: 95%
Finding
The reference explicitly instructs users to upload local PDFs and submit public document URLs to Mistral's external OCR API, but it does not warn that full document contents leave the local environment and are transmitted to a third party. In a document-processing skill, this omission is security-relevant because users may unknowingly send sensitive PDFs, scans, or regulated data to an external service.
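
One way to address this finding is an explicit warning and confirmation step before anything is sent. The sketch below is illustrative only: `describe_transfer` and `is_explicit_yes` are hypothetical helper names, not functions from the scanned skill, and the host name is taken from the endpoints listed in this report.

```python
from urllib.parse import urlparse


def describe_transfer(source: str, host: str = "api.mistral.ai") -> str:
    """Build a warning that names what leaves the machine and where it goes."""
    # Anything with an http(s) scheme is treated as a URL; everything else
    # is assumed to be a local file path.
    kind = "URL" if urlparse(source).scheme in ("http", "https") else "local file"
    return (
        f"The full contents of the {kind} '{source}' will be sent to {host} "
        "for OCR. Do not proceed with confidential or regulated documents "
        "unless that transfer is approved."
    )


def is_explicit_yes(answer: str) -> bool:
    """Accept only an explicit 'y' or 'yes'; an empty answer counts as no."""
    return answer.strip().lower() in ("y", "yes")
```

A caller might print `describe_transfer(path)` and pass the user's reply to `is_explicit_yes` before uploading; defaulting to "no" on an empty reply keeps the transfer opt-in.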

Missing User Warnings

Medium
Confidence: 93%
Finding
The documentation explicitly states that OCR outputs, extracted images/tables, and the full raw OCR response are written to disk, but it provides no warning about persistence of potentially sensitive document contents. In an OCR skill, inputs commonly include IDs, contracts, medical records, invoices, or other confidential files, so unguarded storage materially increases the risk of unintended retention, later disclosure, and mishandling.

Missing User Warnings

Medium
Confidence: 87%
Finding
The script writes full OCR output, per-page markdown, extracted images, tables, and optional document annotations to disk by default. If users process sensitive documents, this can leave confidential data in local storage, backups, shared workspaces, or logs without any explicit consent, minimization, or retention controls.
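
A possible mitigation for the retention risk is to write OCR output with owner-only permissions. This is a minimal sketch assuming a POSIX file system; `make_private_dir` and `write_private` are hypothetical names and do not reflect how the scanned script actually writes its files.

```python
import os
import stat
from pathlib import Path


def make_private_dir(path: Path) -> Path:
    """Create the output directory and restrict it to the current user."""
    path.mkdir(parents=True, exist_ok=True)
    os.chmod(path, stat.S_IRWXU)  # 0o700: owner read/write/execute only
    return path


def write_private(path: Path, text: str) -> None:
    """Write text to a file that is created with owner-only permissions."""
    # The mode argument applies only when the file is created; an existing
    # file keeps whatever permissions it already has.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Restrictive permissions limit casual exposure in shared workspaces, but they do not cover backups or retention, so cleaning up the output folder after use is still needed.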

Missing User Warnings

Medium
Confidence: 92%
Finding
The skill uploads local files or sends document URLs to a third-party OCR service without an explicit warning that document contents leave the local environment. In a document-processing skill, users may provide highly sensitive PDFs, so silent remote transmission increases privacy, compliance, and data-handling risk.

External Transmission

Medium
Category: Data Exfiltration
Content
## Endpoints

- **OCR**: `POST https://api.mistral.ai/v1/ocr`
- **Files upload**: `POST https://api.mistral.ai/v1/files` with `purpose="ocr"`

## Default model
Confidence: 83%
Finding
https://api.mistral.ai/
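
To make the flagged transmission concrete, the sketch below builds (but does not send) a request to the OCR endpoint quoted above. The JSON field names (`model`, `document`, `document_url`) and the bearer-token header are assumptions based on Mistral's public API documentation and should be checked against it. `build_ocr_request` is a hypothetical helper, and the model name is left as a required argument because the excerpt does not show the default.

```python
import json
import urllib.request

OCR_URL = "https://api.mistral.ai/v1/ocr"  # endpoint quoted in the finding


def build_ocr_request(document_url: str, api_key: str, model: str) -> urllib.request.Request:
    """Build a POST request for the OCR endpoint without sending it."""
    # Assumed payload shape; verify the field names against the API reference.
    payload = {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
    }
    return urllib.request.Request(
        OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # key read from MISTRAL_API_KEY by the caller
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Inspecting the request object shows that the document reference and the API key both leave the machine on every call, which is the behavior this finding records.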

External Transmission

Medium
Category: Data Exfiltration
Content
## Endpoints

- **OCR**: `POST https://api.mistral.ai/v1/ocr`
- **Files upload**: `POST https://api.mistral.ai/v1/files` with `purpose="ocr"`

## Default model
Confidence: 83%
Finding
https://api.mistral.ai/

VirusTotal

All 65 vendors reported this skill as clean.
