Markitdown File Converter

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real document converter, but it needs review because it can install software automatically and send document images to a preconfigured cloud OCR service.

Install only if you are comfortable with automatic dependency changes and possible remote OCR uploads. Use an isolated environment, avoid confidential documents unless cloud OCR is disabled or removed, replace or revoke the embedded PaddleOCR token, and prefer manual, pinned dependency installation over runtime installs.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (12)

Intent-Code Divergence

High
Confidence
97% confidence
Finding
The document claims the tool can run without network download, yet elsewhere it describes cloud OCR usage and includes a prefilled access token. This inconsistency can mislead operators into sending document contents to an external service and exposes embedded credentials that could be abused for unauthorized API use.
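One common way to avoid shipping a prefilled token is to read it from the environment and treat its absence as "cloud OCR disabled". The sketch below assumes this approach; the variable name `PADDLEOCR_ACCESS_TOKEN` and the function `load_ocr_token` are hypothetical and do not come from the skill.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, chosen for illustration only.
TOKEN_ENV_VAR = "PADDLEOCR_ACCESS_TOKEN"


def load_ocr_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the operator-supplied token, or None if none is configured."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    # An empty or whitespace-only value is treated as "not configured".
    return token or None
```

With no token configured the caller can fall back to local processing, so nothing leaves the host unless the operator has supplied a credential.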

Context-Inappropriate Capability

Medium
Confidence
90% confidence
Finding
The skill states that reported errors will be automatically recorded into a learning system, which is unrelated to basic file conversion and introduces undeclared data retention. User corrections may contain sensitive document excerpts, so silent collection creates a privacy and compliance risk.

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The code installs the `pdf2image` package at runtime when it is missing. Runtime dependency installation is risky because it changes the execution environment during normal processing, can pull unpinned code from external package indexes, and may execute package installation hooks with the privileges of the running process. In a document-conversion backend, this expands trust to the network and package supply chain in a way that is not necessary for safe file handling.
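A possible alternative to installing at runtime is to detect the missing module and stop with an actionable message. This is a minimal sketch under that assumption; `MissingDependencyError` and `require_module` are hypothetical names, and the pinned requirement string is supplied by the operator rather than taken from the skill.

```python
import importlib.util


class MissingDependencyError(RuntimeError):
    """Raised when an optional dependency is absent (hypothetical name)."""


def require_module(module_name: str, pinned_spec: str) -> None:
    """Fail fast instead of installing a missing top-level module."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise MissingDependencyError(
            f"'{module_name}' is not installed. Install it manually, "
            f"for example: pip install {pinned_spec}"
        )
```

Calling `require_module("pdf2image", "<pinned requirement>")` before the OCR path would leave the environment unchanged and push the install decision back to the operator.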

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The backend for local document conversion automatically installs pandoc and invokes system package managers, which expands its privileges and side effects far beyond simple file conversion. In an agent/skill context, this can unexpectedly modify the host system, trigger privileged operations, or be abused in environments where tool execution should remain constrained.

Context-Inappropriate Capability

Medium
Confidence
98% confidence
Finding
The code downloads a pandoc ZIP from GitHub at runtime and extracts an executable without any integrity verification such as a checksum or signature check. This introduces a software supply-chain risk: if the download source, network path, or release asset is compromised, the skill may place and run a malicious binary on the host.
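The integrity check this finding calls for could look roughly like the following: hash the downloaded archive and compare it with a digest pinned ahead of time, before anything is extracted. The function names are hypothetical, and the expected digest is assumed to come from a trusted source recorded by the maintainer, not from the same download.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large archives stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected_sha256: str) -> None:
    """Raise ValueError unless the archive matches the pinned digest."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"Checksum mismatch for {path.name}: got {actual}")
```

Extraction would only proceed after `verify_archive` returns without raising.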

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The skill promotes automatic dependency installation and cloud OCR but does not present a clear upfront risk warning about package installation, privilege requirements, network use, or possible data exfiltration to third parties. Users may unknowingly authorize system changes or external processing of sensitive documents.

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The skill performs package installation and system command execution automatically when pandoc is missing, with no confirmation prompt or warning to the user. In an agent setting this is risky because processing an untrusted document can unexpectedly trigger host changes, package installs, or privileged package-manager activity.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
When PaddleOCR Cloud is configured, the code sends image content to an external cloud OCR service automatically, but there is no clear user-facing notice or consent gate in this path outside verbose logging. This creates a privacy and data-governance risk because documents and images may contain sensitive information, and users may reasonably expect OCR to occur locally.
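A consent gate of the kind described here might be sketched as an explicit opt-in flag that defaults to off, paired with a warning that is emitted regardless of verbose mode. `OcrPolicy` and `choose_ocr_backend` are hypothetical names for illustration; the skill's actual configuration surface is not shown in this report.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("ocr")


@dataclass(frozen=True)
class OcrPolicy:
    # Off unless the operator explicitly opts in.
    allow_cloud_upload: bool = False


def choose_ocr_backend(policy: OcrPolicy, cloud_configured: bool) -> str:
    """Return "cloud" only when configured AND explicitly permitted."""
    if cloud_configured and policy.allow_cloud_upload:
        # Always-on disclosure: logged at WARNING, not gated on verbose mode.
        logger.warning("Uploading image content to a remote OCR service.")
        return "cloud"
    return "local"
```

Under this design a configured cloud endpoint alone is not enough to trigger an upload; the operator must also set `allow_cloud_upload=True`.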

Missing User Warnings

Low
Confidence
82% confidence
Finding
The PDF OCR path writes rendered page images to disk before processing them, but this code does not enforce secure temporary-file handling, cleanup, or user disclosure. If the pages contain sensitive data, those intermediate PNGs may persist longer than expected and be exposed to other local users, backups, or forensic recovery.
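One way to bound the lifetime of intermediate page images is to write them into a `tempfile.TemporaryDirectory`, which is created with owner-only permissions on POSIX systems and removed when the block exits. The sketch below assumes pages arrive as PNG bytes and that OCR is done by a caller-supplied function; `ocr_pages` and `ocr_one` are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def ocr_pages(pages: Iterable[bytes], ocr_one: Callable[[Path], str]) -> List[str]:
    """Write each page to a private temp directory, OCR it, then clean up."""
    results: List[str] = []
    # The directory and everything in it are deleted when the block exits,
    # including when ocr_one raises.
    with tempfile.TemporaryDirectory(prefix="ocr_pages_") as tmp:
        tmp_dir = Path(tmp)
        for index, png_bytes in enumerate(pages):
            page_path = tmp_dir / f"page_{index:04d}.png"
            page_path.write_bytes(png_bytes)
            results.append(ocr_one(page_path))
    return results
```

This removes the files at the filesystem level only; it does not overwrite the underlying disk blocks, so the forensic-recovery concern is reduced rather than eliminated.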

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The function uploads the provided image file to a third-party OCR endpoint and only logs that behavior when verbose mode is enabled. This creates a real privacy and data-handling risk because users may unknowingly transmit sensitive documents, IDs, or screenshots off-host without any explicit consent flow or always-on disclosure.

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The helper installs arbitrary Python packages by invoking pip directly with no user confirmation, disclosure, allowlist, or integrity controls. If an attacker can influence the package name or trigger this path unexpectedly, the system may download and execute untrusted code during installation, which is especially risky because pip package installation commonly runs setup/build hooks.
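The missing controls named in this finding (allowlist, pinning, confirmation) could be combined as in the following sketch, which only builds the pip command and refuses anything outside an operator-supplied allowlist. `build_pip_command` and the allowlist contents are hypothetical; nothing here reflects the skill's real helper.

```python
import sys
from typing import List, Mapping


def build_pip_command(
    package: str, allowlist: Mapping[str, str], confirmed: bool
) -> List[str]:
    """Return a pinned pip argv for an allowlisted package, or raise."""
    if package not in allowlist:
        raise PermissionError(f"'{package}' is not on the install allowlist")
    if not confirmed:
        raise PermissionError(f"Installing '{package}' requires user confirmation")
    # Pin the exact version recorded in the allowlist.
    requirement = f"{package}=={allowlist[package]}"
    return [sys.executable, "-m", "pip", "install", requirement]
```

Returning an argument list rather than a shell string keeps the package name from being interpreted by a shell, and the caller decides whether and when to execute it.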

Ssd 3

Medium
Confidence
93% confidence
Finding
Automatic persistence of user-reported errors to a learning system creates unbounded data collection beyond the stated conversion function. Because corrections often include snippets of source documents, this can store sensitive data without informed consent or data minimization.

VirusTotal

VirusTotal findings are pending for this skill version.
