Security audit

Private Document AI with OpenVINO

Security checks across malware telemetry and agentic risk

Overview

The skill mostly behaves like a local document processor, but its generated Jupyter notebook can download and run remote model code in a way that is not clearly disclosed.

Install only in a virtual environment, process only documents you intentionally select, and write outputs to a private folder. Treat generated notebooks as untrusted drafts: inspect every cell, remove trust_remote_code=True unless you explicitly trust the model source, and do not run model-download cells in sensitive environments without approval.
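
One way to make the "inspect every cell" step repeatable is a small pre-run scan of the generated notebook. The sketch below is an illustration, not part of the skill: the function names and the pattern list are hypothetical, and the `model_download` pattern assumes the notebook uses calls named `snapshot_download` or `from_pretrained`, which this report does not confirm. It only reads the notebook JSON and never executes any cell.

```python
import json
import re
from pathlib import Path

# Illustrative patterns only; extend them to match your own review policy.
RISKY_PATTERNS = {
    "trust_remote_code": re.compile(r"trust_remote_code\s*=\s*True"),
    # Assumed call names for model downloads; adjust to what the notebook uses.
    "model_download": re.compile(r"snapshot_download|from_pretrained\s*\("),
}


def find_risky_cells(notebook: dict) -> list[tuple[int, str]]:
    """Return (cell_index, pattern_name) pairs for code cells that match."""
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for name, pattern in RISKY_PATTERNS.items():
            if pattern.search(text):
                hits.append((index, name))
    return hits


def scan_notebook_file(path: str) -> list[tuple[int, str]]:
    """Load an .ipynb file as JSON and scan it without running anything."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    return find_risky_cells(notebook)
```

A non-empty result is a prompt for manual review, not a verdict; an empty result does not prove the notebook is safe.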

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Findings (13)

Context-Inappropriate Capability

Medium
Confidence
89% confidence
Finding
The script is presented as a local parser, but `get_paddleocr_vl_paths()` treats `PADDLEOCR_VL_ALLOW_AUTO_DOWNLOAD=1` as sufficient to proceed even when local model paths are absent. In a sensitive or offline environment, this can trigger unexpected network/model-fetch behavior through the downstream library, violating deployment assumptions and increasing supply-chain and data-exposure risk.
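
A fail-closed alternative can be sketched as follows. This is not the skill's `get_paddleocr_vl_paths()` implementation, which is not shown in this report; `resolve_model_paths`, `ModelPathError`, and the `offline` parameter are hypothetical names. The only identifier taken from the finding is the `PADDLEOCR_VL_ALLOW_AUTO_DOWNLOAD` variable. The idea is that the environment flag alone is never enough: the caller must also opt out of offline mode explicitly.

```python
import os
from pathlib import Path


class ModelPathError(RuntimeError):
    """Raised when no local model directory exists and downloads are not allowed."""


def resolve_model_paths(candidates, env=None, offline=True):
    """Return existing local model directories, or fail closed.

    An empty list is returned only when BOTH the environment flag is set
    and the caller passed offline=False; the caller may then decide to let
    the downstream library fetch models.
    """
    env = os.environ if env is None else env
    existing = [Path(p) for p in candidates if Path(p).is_dir()]
    if existing:
        return existing

    flag_set = env.get("PADDLEOCR_VL_ALLOW_AUTO_DOWNLOAD") == "1"
    if flag_set and not offline:
        return []  # explicit, two-step opt-in to network access

    raise ModelPathError(
        "No local model directory found and auto-download is not permitted "
        "(offline mode is on or PADDLEOCR_VL_ALLOW_AUTO_DOWNLOAD is unset)."
    )
```

Defaulting `offline` to `True` means a stray environment variable inherited from a shell profile or CI job cannot by itself trigger a model fetch.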

Intent-Code Divergence

Medium
Confidence
84% confidence
Finding
The top-level documentation says the parser runs locally, but the implementation can permit model auto-download via environment configuration. This mismatch is security-relevant because operators may trust the tool in restricted environments and unknowingly allow network access or unreviewed model acquisition.

Description-Behavior Mismatch

High
Confidence
98% confidence
Finding
The script advertises a document-to-code transformer, but the jupyter-notebook path generates a substantially different artifact: an OpenVINO LLM demo notebook. This is a capability mismatch that can mislead downstream users and automation into producing and later executing code with model-loading and demo behavior unrelated to the input document.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
This code path injects unjustified functionality into generated notebooks, including environment-driven configuration, remote model acquisition, and execution-oriented ML pipeline setup. In a document transformation skill, those extra capabilities expand the attack surface and create a realistic path for unexpected code execution, supply-chain exposure, and sensitive-environment interaction when users run the generated notebook.

Intent-Code Divergence

Medium
Confidence
94% confidence
Finding
The notebook is framed as a simple generated artifact from document processing, but the produced cells implement a complete OpenVINO language-model workflow and demo scaffold. That discrepancy increases the chance a reviewer will trust and execute the notebook without recognizing that it performs much more than document representation.

Vague Triggers

Medium
Confidence
90% confidence
Finding
The skill enables implicit invocation while advertising broad document-processing capabilities in a natural-language default prompt, which increases the chance the agent will trigger this skill unintentionally during unrelated conversations about PDFs, images, invoices, code, or notebooks. Because the skill appears to operate on local documents and can transform them into structured data or code outputs, accidental invocation could expose sensitive local content or cause unintended processing without clear user intent.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The code exports document table contents directly to CSV files, and this module is explicitly designed to process invoices, medical records, contact details, and other sensitive business/personal data. Writing those contents to filesystem artifacts without minimization, access controls, retention policy, or explicit privacy gating creates a real confidentiality risk if output directories are shared, synced, or insufficiently protected.
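
As a partial mitigation, table exports can at least be created with owner-only permissions and without silently overwriting existing files. The sketch below assumes a POSIX filesystem (permission bits are largely ignored on Windows); `write_private_csv` is a hypothetical helper, not a function from the skill. It does not address minimization or retention, only file access.

```python
import csv
import os
from pathlib import Path


def write_private_csv(path, header, rows):
    """Write rows to a new CSV file readable and writable by the owner only."""
    path = Path(path)
    # mode applies to the leaf directory only and is subject to the umask.
    path.parent.mkdir(parents=True, exist_ok=True, mode=0o700)

    # O_EXCL refuses to overwrite an existing file; 0o600 means owner read/write.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

Setting the mode at creation time avoids the window in which a file written with default permissions is briefly readable by other users before a later `chmod`.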

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The traceability records emit raw extracted entity values, which may include names, emails, phone numbers, tax IDs, invoice identifiers, and medical/payment details. This duplicates sensitive data into an additional artifact, expanding the attack surface and increasing the chance of unintended disclosure through logs, exports, or downstream consumers.
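
One way to keep traceability without duplicating raw values is to store a keyed fingerprint in place of each value. The sketch below is an assumption about how such records might look: the `value` field name and the `redact_record` helper are hypothetical, since the report does not show the record schema. It uses HMAC-SHA256 from the standard library so that equal values map to equal fingerprints while the raw text stays out of the artifact.

```python
import hashlib
import hmac


def redact_record(record: dict, key: bytes, sensitive_fields=("value",)) -> dict:
    """Return a copy of record with sensitive fields replaced by keyed fingerprints."""
    redacted = dict(record)  # shallow copy; the input record is left unchanged
    for field in sensitive_fields:
        if redacted.get(field) is None:
            continue
        raw = str(redacted[field]).encode("utf-8")
        digest = hmac.new(key, raw, hashlib.sha256).hexdigest()
        # A truncated digest is enough to correlate records.
        redacted[field] = f"hmac-sha256:{digest[:16]}"
    return redacted
```

The key should be generated per deployment and stored separately from the records; with an unkeyed hash, short values such as phone numbers could be recovered by brute force.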

Missing User Warnings

Medium
Confidence
99% confidence
Finding
The generated notebook downloads models from remote sources and enables trust_remote_code=True, which can execute model-provided custom code during loading. In a generated artifact that users may run with high trust, this creates a supply-chain and arbitrary code execution risk, especially in developer environments with credentials, network access, or local secrets.

Known Vulnerable Dependency: PyMuPDF (1 advisory): CVE-2026-3029 (PyMuPDF has a path traversal in _main_.py)

Low
Category
Supply Chain
Confidence
74% confidence
Finding
PyMuPDF

Known Vulnerable Dependency: pypdf (10 advisories): CVE-2026-24688 (pypdf has possible Infinite Loop when processing outlines/bookmarks); CVE-2026-27628 (pypdf has a possible infinite loop when loading circular /Prev entries in cross-); CVE-2026-40260 (pypdf: Manipulated XMP metadata entity declarations can exhaust RAM) +7 more

Low
Category
Supply Chain
Confidence
81% confidence
Finding
pypdf

Known Vulnerable Dependency: Pillow (10 advisories): CVE-2016-2533 (Pillow buffer overflow in ImagingPcdDecode); CVE-2023-50447 (Arbitrary Code Execution in Pillow); CVE-2021-27922 (Pillow Uncontrolled Resource Consumption) +7 more

Critical
Category
Supply Chain
Confidence
86% confidence
Finding
Pillow

Known Vulnerable Dependency: opencv-python (10 advisories): CVE-2017-12864 (Integer Overflow or Wraparound in OpenCV); CVE-2017-12598 (Out-of-bounds Read in OpenCV); CVE-2019-14493 (NULL Pointer Dereference in OpenCV) +7 more

High
Category
Supply Chain
Confidence
83% confidence
Finding
opencv-python
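
The dependency findings above do not state which installed versions are affected, so a first step is simply to see what is installed. The sketch below uses the standard-library `importlib.metadata` to report versions of the four flagged distributions; `installed_versions` and the `FLAGGED` tuple are illustrative names. It does not query any advisory database, and the output still has to be compared against the advisories by hand or with a dedicated audit tool.

```python
from importlib import metadata

# Distribution names as they appear in the findings above.
FLAGGED = ("PyMuPDF", "pypdf", "Pillow", "opencv-python")


def installed_versions(packages=FLAGGED) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```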

VirusTotal

63/63 vendors reported this skill as clean.