docx-pdf-knowledge-parser

Security checks across malware telemetry and agentic risk

Overview

This is a transparent local document parser that creates reviewable output files, with dependency and data-handling cautions but no evidence of hidden exfiltration or destructive behavior.

Install only if you are comfortable with local summaries, filenames, paths, and parse-error details being written to disk. Run it on a narrow folder of documents you are authorized to process, review or delete generated outputs after use, and pin updated safe versions of python-docx and pypdf before using it on untrusted files.
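The pinning advice above can be checked before running the parser. The following is a minimal sketch, not part of the skill itself: the helper names (`version_tuple`, `meets_minimum`, `check_installed`) and the `MINIMUMS` table are hypothetical. The 0.8.6 floor for python-docx is taken from the CVE-2016-5851 advisory text quoted in the findings; this report does not state a fixed version for pypdf, so that entry is left for the reader to fill in from the pypdf advisories.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '0.8.11' into (0, 8, 11); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted versions numerically, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_installed(minimums: dict) -> dict:
    """Map each package name to 'ok', 'too old', or 'not installed'."""
    results = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            results[name] = "not installed"
            continue
        results[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return results


# Hypothetical table: 0.8.6 comes from the python-docx advisory text in this report.
# Add a pypdf entry once you have looked up the fixed version in its advisories.
MINIMUMS = {"python-docx": "0.8.6"}
```

Calling `check_installed(MINIMUMS)` returns a small dictionary that can be inspected before any untrusted file is parsed. The numeric comparison is deliberately simple and ignores pre-release suffixes; a dedicated version-parsing library would be the sturdier choice for real use.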

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Findings (5)

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The code writes extracted document summaries and raw parse error details into report files, which can disclose sensitive document contents or filesystem/parser internals to anyone with access to the output directory. In an ingestion skill that may process internal policy or operational documents, even short summaries and exception messages can leak confidential information beyond the original input boundary.
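One way to narrow the exposure described in this finding is to record only the exception type in reports and to create report files with owner-only permissions. The sketch below is an illustration under stated assumptions, not the skill's actual code: `redact_error` and `write_private_report` are hypothetical helpers, and the permission bits apply on POSIX systems only.

```python
import os


def redact_error(exc: BaseException) -> str:
    """Keep only the exception type; the message may embed paths or document text."""
    return type(exc).__name__


def write_private_report(path: str, text: str) -> None:
    """Write a report that only the owner can read or write (POSIX mode 0o600).

    The mode is applied when the file is created; an existing file keeps
    whatever permissions it already has.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
```

With these helpers, a parse failure would be logged as, for example, `ValueError` rather than the full message, and the report would not be readable by other local accounts. Summaries written to the same file still contain document content, so reviewing or deleting outputs after use remains necessary.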

Unpinned Dependencies

Low
Category
Supply Chain
Content
python-docx
pypdf
Confidence
97% confidence
Finding
python-docx

Unpinned Dependencies

Low
Category
Supply Chain
Content
python-docx
pypdf
Confidence
98% confidence
Finding
pypdf

Known Vulnerable Dependency: python-docx — 2 advisory(ies): CVE-2016-5851 (Improper Restriction of XML External Entity Reference in python-docx); CVE-2016-5851 (python-docx before 0.8.6 allows context-dependent attackers to conduct XML Exter)

High
Category
Supply Chain
Confidence
95% confidence
Finding
python-docx

Known Vulnerable Dependency: pypdf — 10 advisory(ies): CVE-2026-24688 (pypdf has possible Infinite Loop when processing outlines/bookmarks); CVE-2026-27628 (pypdf has a possible infinite loop when loading circular /Prev entries in cross-); CVE-2026-40260 (pypdf: Manipulated XMP metadata entity declarations can exhaust RAM) +7 more

Low
Category
Supply Chain
Confidence
91% confidence
Finding
pypdf

VirusTotal

66/66 vendors flagged this skill as clean.
