Security audit

Private Document AI with OpenVINO

Security checks across malware telemetry and agentic risk

Overview

The skill mostly behaves like a local document processor, but its generated Jupyter notebook can download and run remote model code in a way that is not clearly disclosed.

Install only in a virtual environment, process only documents you intentionally select, and write outputs to a private folder. Treat generated notebooks as untrusted drafts: inspect every cell, remove trust_remote_code=True unless you explicitly trust the model source, and do not run model-download cells in sensitive environments without approval.
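One way to support the "inspect every cell" advice is a small pre-run scan of the generated notebook. The sketch below is illustrative only: the function name `find_risky_cells` and the pattern list are assumptions, not part of the skill, and a pattern match is a prompt for manual review rather than proof of a problem.

```python
import json
import re

# Illustrative patterns only (an assumption); extend them for your environment.
RISKY_PATTERNS = {
    "trust_remote_code": re.compile(r"trust_remote_code\s*=\s*True"),
    "model_download": re.compile(r"snapshot_download|hf_hub_download|from_pretrained"),
}


def find_risky_cells(notebook_json: str) -> list:
    """Return (cell_index, label) pairs for code cells that match a risky pattern."""
    notebook = json.loads(notebook_json)
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a string or a list of lines
            source = "".join(source)
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(source):
                hits.append((index, label))
    return hits
```

Run it on the notebook file's text before opening the notebook in Jupyter, and review each flagged cell by hand.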

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger

Findings (13)

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The script is presented as a local parser, but `get_paddleocr_vl_paths()` treats `PADDLEOCR_VL_ALLOW_AUTO_DOWNLOAD=1` as sufficient to proceed even when local model paths are absent. In a sensitive or offline environment, this can trigger unexpected network/model-fetch behavior through the downstream library, violating deployment assumptions and increasing supply-chain and data-exposure risk.
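A stricter resolution policy could fail closed when no local model is present, and honor the download flag only when the caller has also said the environment is not offline. The sketch below is a hypothetical stand-in, not the skill's actual `get_paddleocr_vl_paths()`: the function name `resolve_model_dir`, the `offline` parameter, and the `PADDLEOCR_VL_MODEL_DIR` variable are assumptions made for illustration. Only `PADDLEOCR_VL_ALLOW_AUTO_DOWNLOAD` comes from the finding.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_dir(env: Mapping[str, str], offline: bool = True) -> Optional[Path]:
    """Return a local model directory, or None if a download is explicitly allowed.

    Raises RuntimeError instead of silently falling back to the network.
    """
    raw = env.get("PADDLEOCR_VL_MODEL_DIR", "")  # hypothetical variable name
    if raw and Path(raw).is_dir():
        return Path(raw)

    allow_download = env.get("PADDLEOCR_VL_ALLOW_AUTO_DOWNLOAD") == "1"
    if allow_download and not offline:
        # The caller opted in twice: the environment flag and offline=False.
        return None

    raise RuntimeError(
        "No local model directory found and auto-download is not permitted "
        "in this environment."
    )
```

Defaulting `offline` to `True` means a stray environment variable alone cannot enable network access.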

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The top-level documentation says the parser runs locally, but the implementation can permit model auto-download via environment configuration. This mismatch is security-relevant because operators may trust the tool in restricted environments and unknowingly allow network access or unreviewed model acquisition.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The script advertises a document-to-code transformer, but the jupyter-notebook path generates a substantially different artifact: an OpenVINO LLM demo notebook. This is a capability mismatch that can mislead downstream users and automation into producing and later executing code with model-loading and demo behavior unrelated to the input document.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: This code path injects unjustified functionality into generated notebooks, including environment-driven configuration, remote model acquisition, and execution-oriented ML pipeline setup. In a document transformation skill, those extra capabilities expand the attack surface and create a realistic path for unexpected code execution, supply-chain exposure, and sensitive-environment interaction when users run the generated notebook.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The notebook is framed as a simple generated artifact from document processing, but the produced cells implement a complete OpenVINO language-model workflow and demo scaffold. That discrepancy increases the chance a reviewer will trust and execute the notebook without recognizing that it performs much more than document representation.

Vague Triggers

Medium

Confidence: 90%
Finding: The skill enables implicit invocation while advertising broad document-processing capabilities in a natural-language default prompt, which increases the chance the agent will trigger this skill unintentionally during unrelated conversations about PDFs, images, invoices, code, or notebooks. Because the skill appears to operate on local documents and can transform them into structured data or code outputs, accidental invocation could expose sensitive local content or cause unintended processing without clear user intent.

Missing User Warnings

Medium

Confidence: 90%
Finding: The code exports document table contents directly to CSV files, and this module is explicitly designed to process invoices, medical records, contact details, and other sensitive business/personal data. Writing those contents to filesystem artifacts without minimization, access controls, retention policy, or explicit privacy gating creates a real confidentiality risk if output directories are shared, synced, or insufficiently protected.
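One mitigation for the file-permission part of this finding is to create exported files with owner-only permissions. The sketch below assumes a POSIX filesystem; `write_private_csv` is a hypothetical helper, not a function from the skill, and it does not address minimization or retention.

```python
import csv
import os
import stat
import tempfile
from pathlib import Path


def write_private_csv(path, rows):
    """Write rows to a CSV file that only the owner can read or write (POSIX)."""
    path = Path(path)
    # mode applies to the final directory only and is still subject to the umask.
    path.parent.mkdir(parents=True, exist_ok=True, mode=0o700)
    # Create the file with 0o600 so it is never briefly readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    # Tighten permissions in case the file already existed with a wider mode.
    os.chmod(path, 0o600)
    return path


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    out = write_private_csv(
        Path(tmp) / "tables" / "invoice.csv",
        [["item", "total"], ["widget", "12.50"]],
    )
    mode = stat.S_IMODE(out.stat().st_mode)
    lines = out.read_text(encoding="utf-8").splitlines()
```

Owner-only permissions do not help if the output directory is synced or backed up to a shared location, so the choice of folder still matters.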

Missing User Warnings

Medium

Confidence: 93%
Finding: The traceability records emit raw extracted entity values, which may include names, emails, phone numbers, tax IDs, invoice identifiers, and medical/payment details. This duplicates sensitive data into an additional artifact, expanding the attack surface and increasing the chance of unintended disclosure through logs, exports, or downstream consumers.
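Traceability does not necessarily require the raw value. A keyed digest lets two records be matched without storing the underlying text. The sketch below is illustrative: the record layout and the `value` field name are assumptions, and `redact_record` is a hypothetical helper rather than part of the skill.

```python
import hashlib
import hmac

SENSITIVE_FIELDS = ("value",)  # hypothetical field name holding the raw entity text


def pseudonymize(value: str, key: bytes) -> str:
    """Return a truncated HMAC-SHA256 digest that stands in for the raw value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "hmac-sha256:" + digest[:16]


def redact_record(record: dict, key: bytes) -> dict:
    """Copy a traceability record, replacing sensitive fields with keyed digests."""
    return {
        field: pseudonymize(str(content), key) if field in SENSITIVE_FIELDS else content
        for field, content in record.items()
    }
```

The key must be kept out of the output directory. Short, guessable values such as phone numbers can be recovered by brute force if the key leaks.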

Missing User Warnings

Medium

Confidence: 99%
Finding: The generated notebook downloads models from remote sources and enables trust_remote_code=True, which can execute model-provided custom code during loading. In a generated artifact that users may run with high trust, this creates a supply-chain and arbitrary code execution risk, especially in developer environments with credentials, network access, or local secrets.

Known Vulnerable Dependency: PyMuPDF (1 advisory): CVE-2026-3029 (PyMuPDF has a path traversal in _main_.py)

Low

Category: Supply Chain
Confidence: 74%
Finding: PyMuPDF

Known Vulnerable Dependency: pypdf (10 advisories): CVE-2026-24688 (pypdf has possible Infinite Loop when processing outlines/bookmarks); CVE-2026-27628 (pypdf has a possible infinite loop when loading circular /Prev entries in cross-); CVE-2026-40260 (pypdf: Manipulated XMP metadata entity declarations can exhaust RAM) +7 more

Low

Category: Supply Chain
Confidence: 81%
Finding: pypdf

Known Vulnerable Dependency: Pillow (10 advisories): CVE-2016-2533 (Pillow buffer overflow in ImagingPcdDecode); CVE-2023-50447 (Arbitrary Code Execution in Pillow); CVE-2021-27922 (Pillow Uncontrolled Resource Consumption) +7 more

Critical

Category: Supply Chain
Confidence: 86%
Finding: Pillow

Known Vulnerable Dependency: opencv-python (10 advisories): CVE-2017-12864 (Integer Overflow or Wraparound in OpenCV); CVE-2017-12598 (Out-of-bounds Read in OpenCV); CVE-2019-14493 (NULL Pointer Dereference in OpenCV.) +7 more

High

Category: Supply Chain
Confidence: 83%
Finding: opencv-python
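The dependency findings above list advisories but not the affected version ranges, so whether an install is exposed depends on the versions actually present. As a first step, the sketch below reports what is installed in the current environment using the standard library. `installed_versions` is a hypothetical helper, and comparing the result against each advisory's affected range is left to the reader or a dedicated audit tool.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# Distribution names taken from the findings above.
AUDITED = ["PyMuPDF", "pypdf", "Pillow", "opencv-python"]
versions = installed_versions(AUDITED)
```

Several of the listed CVEs date from 2016 to 2021, so a current install may already include the fixes. Check each advisory's affected range before acting on the severity label.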

VirusTotal

63/63 vendors flagged this skill as clean.
