Unstructured Medical Text Miner

Security checks across malware telemetry and agentic risk

Overview

This medical text-mining skill is coherent, but it can save sensitive patient-level clinical extracts to disk without enough privacy safeguards.

Install only if you are authorized to process the clinical text involved. Prefer de-identified or synthetic data, run the tool in a controlled local environment, pass an explicit secure output path, restrict access to generated JSON files, and pin or lock dependencies before using it on real datasets.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (15)

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The CLI silently persists extracted patient insights to disk even though the skill is framed as a text-mining utility, which creates an unexpected data-retention path for potentially sensitive clinical information. In a healthcare-text context, writing derived patient data to a local JSON file by default can violate user expectations and increase the risk of accidental PHI exposure on shared systems, containers, or CI runners.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The skill is designed to process clinical notes, radiology reports, and discharge summaries, which commonly contain protected health information (PHI) and other highly sensitive medical data, but it lacks a prominent privacy warning, handling requirements, or de-identification guidance. This omission is dangerous because users may feed raw patient text into the workflow and generate stored outputs that preserve sensitive content, increasing the likelihood of privacy breaches or regulatory noncompliance.
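As a rough illustration of the de-identification guidance this finding says is missing, the sketch below masks a few obvious identifier shapes before text reaches any mining step. The patterns, the placeholder tokens, and the `redact` helper are assumptions made for this example and are not part of the skill. A handful of regular expressions is not a substitute for a validated de-identification tool.

```python
import re

# Illustrative patterns only (assumed for this sketch, not taken from the skill).
# Real clinical text needs a validated de-identification pipeline.
_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),   # ISO-style dates
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),  # US-style phone numbers
    # Labelled numeric identifiers; the label names are hypothetical examples.
    (re.compile(r"\b(?:MRN|subject_id|hadm_id)\s*[:=#]?\s*\d+\b", re.IGNORECASE), "[ID]"),
]


def redact(text: str) -> str:
    """Replace each matched identifier with its placeholder token."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Running such a pass before extraction lowers the chance that stored outputs carry direct identifiers, although free-text names, addresses, and unusual formats would still slip through.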

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The export function writes full insight structures directly to JSON without any warning, consent step, or safeguards despite processing clinical notes and patient-linked identifiers. Because the resulting JSON can contain entities, timelines, diagnoses, note metadata, and subject/hospital admission IDs, this creates a realistic risk of sensitive medical data leakage if files are left on disk, copied, or logged.
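One possible safeguard for an export of this kind is to create the JSON file with owner-only permissions. The sketch below assumes a hypothetical `write_private_json` helper and does not reflect the skill's actual export code. On Windows, `os.chmod` only toggles the read-only flag, so the permission guarantee applies to POSIX systems.

```python
import json
import os
from pathlib import Path


def write_private_json(path, data):
    """Write data as JSON to a file readable and writable by its owner only."""
    path = Path(path)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # Mode 0o600 applies when the file is newly created.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Tighten permissions before writing in case the file already existed.
        os.chmod(path, 0o600)
        json.dump(data, handle)
    return path
```

Restricting the file mode does not address copies, backups, or logs, so it complements rather than replaces an explicit consent step.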

Missing User Warnings

Medium
Confidence
97% confidence
Finding
Automatically creating a default output file when the user did not request one is a security/privacy weakness because it persists potentially sensitive clinical insights without explicit intent. In this skill's context, the danger is elevated because the analyzed source material is medical text from MIMIC-style datasets, so silent writes can cause accidental retention and downstream disclosure of regulated health information.
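A minimal sketch of the opt-in behavior this finding implies: the CLI writes a file only when the user passes an output path. The program name `text-miner`, the `--output` flag, and the `should_write` helper are hypothetical names chosen for illustration and may not match the skill's real interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "text-miner" and "--output" are hypothetical names for this sketch.
    parser = argparse.ArgumentParser(prog="text-miner")
    parser.add_argument(
        "--output",
        default=None,
        help="Path for the JSON export. Nothing is written when omitted.",
    )
    return parser


def should_write(args: argparse.Namespace) -> bool:
    # Persist results only when the user explicitly asked for a file.
    return args.output is not None
```

With no default path, an invocation that omits `--output` leaves nothing behind on shared systems, containers, or CI runners.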

Unpinned Dependencies

Low
Category
Supply Chain
Content
negspacy
numpy
pandas
pyarrow
Confidence
95% confidence
Finding
negspacy

Unpinned Dependencies

Low
Category
Supply Chain
Content
negspacy
numpy
pandas
pyarrow
pyyaml
Confidence
98% confidence
Finding
numpy

Unpinned Dependencies

Low
Category
Supply Chain
Content
negspacy
numpy
pandas
pyarrow
pyyaml
scispacy
Confidence
96% confidence
Finding
pandas

Unpinned Dependencies

Low
Category
Supply Chain
Content
negspacy
numpy
pandas
pyarrow
pyyaml
scispacy
spacy
Confidence
98% confidence
Finding
pyarrow

Unpinned Dependencies

Low
Category
Supply Chain
Content
numpy
pandas
pyarrow
pyyaml
scispacy
spacy
tqdm
Confidence
99% confidence
Finding
pyyaml

Unpinned Dependencies

Low
Category
Supply Chain
Content
pandas
pyarrow
pyyaml
scispacy
spacy
tqdm
Confidence
93% confidence
Finding
scispacy

Unpinned Dependencies

Low
Category
Supply Chain
Content
pyarrow
pyyaml
scispacy
spacy
tqdm
Confidence
93% confidence
Finding
spacy

Unpinned Dependencies

Low
Category
Supply Chain
Content
pyyaml
scispacy
spacy
tqdm
Confidence
97% confidence
Finding
tqdm
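The unpinned-dependency findings above could be reproduced locally with a small check like the following. It is a simplified sketch, not SkillSpector's detection logic: the `unpinned` helper is hypothetical, it treats only an exact `==` specifier as pinned, and it skips option lines such as `-r other.txt`.

```python
import re

# A requirement counts as pinned only if it uses an exact "==" specifier.
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that lack an exact version pin."""
    result = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blanks and pip option lines
        if not _PINNED.match(line):
            result.append(line)
    return result
```

Applied to a requirements file listing bare names such as `negspacy`, `numpy`, and `pandas`, every entry would be returned as unpinned. A lock file with hashes is a stronger control than `==` pins alone.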

Known Vulnerable Dependency: numpy (10 advisories): CVE-2014-1859 (Numpy arbitrary file write via symlink attack); CVE-2021-41495 (NumPy NULL Pointer Dereference); CVE-2021-33430 (NumPy Buffer Overflow (Disputed)); and 7 more

Critical
Category
Supply Chain
Confidence
85% confidence
Finding
numpy

Known Vulnerable Dependency: pyarrow (8 advisories): CVE-2023-47248 (PyArrow: Arbitrary code execution when loading a malicious data file); CVE-2019-12408 (Missing Initialization of Resource in Apache Arrow); CVE-2019-12410 (Missing Initialization of Resource in Apache Arrow); and 5 more

Critical
Category
Supply Chain
Confidence
92% confidence
Finding
pyarrow

Known Vulnerable Dependency: pyyaml (8 advisories): CVE-2019-20477 (Deserialization of Untrusted Data in PyYAML); CVE-2020-1747 (Improper Input Validation in PyYAML); CVE-2020-14343 (Improper Input Validation in PyYAML); and 5 more

Critical
Category
Supply Chain
Confidence
96% confidence
Finding
pyyaml
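Because the dependencies are unpinned, whether any of these advisories applies depends on the versions actually installed. The sketch below, using only the standard library, reports installed versions so they can be compared against the advisory ranges. The `installed_versions` helper is a hypothetical name, and the script does not query any vulnerability database.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages named in the three findings above.
    for name, version in installed_versions(["numpy", "pyarrow", "pyyaml"]).items():
        print(f"{name}: {version or 'not installed'}")
```

A dedicated dependency auditing tool would be the more complete option; this only surfaces the version numbers needed for a manual comparison.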

VirusTotal

63/63 vendors flagged this skill as clean.
