Unstructured Medical Text Miner

Security checks across malware telemetry and agentic risk

Overview

This medical text-mining skill is coherent, but it can save sensitive patient-level clinical extracts to disk without enough privacy safeguards.

Install only if you are authorized to process the clinical text involved. Prefer de-identified or synthetic data, run the tool in a controlled local environment, pass an explicit secure output path, restrict access to generated JSON files, and pin or lock dependencies before using it on real datasets.
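The advice about an explicit output path and restricted file access can be illustrated with a minimal sketch. The helper name write_private_json is hypothetical and not part of the skill; it assumes a POSIX system, where the mode bits passed to os.open are honored.

```python
import json
import os
from pathlib import Path


def write_private_json(data: dict, output_path: str) -> Path:
    """Write data as JSON to an explicit path, readable by the owner only.

    Hypothetical helper for illustration; not the skill's own export code.
    """
    path = Path(output_path)
    # O_EXCL refuses to overwrite an existing file.
    # 0o600 grants read/write to the owner and nothing to group or others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
    return path
```

Creating the file with restrictive permissions from the start avoids the short window in which a file written with default permissions and tightened afterwards would be readable by other users. On Windows the mode bits are largely ignored, so access would need to be restricted by other means.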

SkillSpector

By NVIDIA

Vulnerability Patterns

Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (15)

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The CLI silently persists extracted patient insights to disk even though the skill is framed as a text-mining utility, which creates an unexpected data-retention path for potentially sensitive clinical information. In a healthcare-text context, writing derived patient data to a local JSON file by default can violate user expectations and increase the risk of accidental PHI exposure on shared systems, containers, or CI runners.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill is designed to process clinical notes, radiology reports, and discharge summaries, which commonly contain protected health information and highly sensitive medical data, but it provides no prominent privacy warning, no handling requirements, and no de-identification guidance. This omission is dangerous because users may feed raw patient text into the workflow and generate stored outputs that preserve sensitive content, increasing the likelihood of privacy breaches or regulatory noncompliance.

Missing User Warnings

Medium

Confidence: 95%
Finding: The export function writes full insight structures directly to JSON without any warning, consent step, or safeguards despite processing clinical notes and patient-linked identifiers. Because the resulting JSON can contain entities, timelines, diagnoses, note metadata, and subject/hospital admission IDs, this creates a realistic risk of sensitive medical data leakage if files are left on disk, copied, or logged.
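One possible safeguard is to drop patient-linked identifiers from the insight structure before it is exported. The sketch below is illustrative only: the key names subject_id and hadm_id are assumptions based on MIMIC-style naming, and the skill's real export schema may differ.

```python
import copy

# Hypothetical key names; adjust to the actual export schema.
IDENTIFIER_KEYS = {"subject_id", "hadm_id"}


def strip_identifiers(record):
    """Return a copy of record with identifier keys removed at any depth."""
    if isinstance(record, dict):
        return {
            key: strip_identifiers(value)
            for key, value in record.items()
            if key not in IDENTIFIER_KEYS
        }
    if isinstance(record, list):
        return [strip_identifiers(item) for item in record]
    # Scalars and other leaf values are copied unchanged.
    return copy.deepcopy(record)
```

Removing identifier keys is not de-identification on its own. Entities, timelines, and note text can still carry protected health information, so this step would complement a proper de-identification process rather than replace it.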

Missing User Warnings

Medium

Confidence: 97%
Finding: Automatically creating a default output file when the user did not request one is a security/privacy weakness because it persists potentially sensitive clinical insights without explicit intent. In this skill's context, the danger is elevated because the analyzed source material is medical text from MIMIC-style datasets, so silent writes can cause accidental retention and downstream disclosure of regulated health information.
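A CLI can avoid silent writes by making the output path strictly opt-in. The following is a minimal sketch using argparse; the argument names input and --output, and the helper should_write, are assumptions for illustration and may not match the skill's actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser in which file output must be requested explicitly."""
    parser = argparse.ArgumentParser(
        description="Mine insights from clinical text (illustrative sketch)."
    )
    parser.add_argument("input", help="Path to the input notes file.")
    # No default path: when --output is omitted, nothing is written to disk.
    parser.add_argument(
        "--output",
        default=None,
        help="Explicit path for the JSON export. Omit to skip writing a file.",
    )
    return parser


def should_write(args: argparse.Namespace) -> bool:
    """Persist results only when the user supplied an output path."""
    return args.output is not None
```

With this shape, running the tool with only an input file produces no file on disk, so retention of clinical insights always reflects an explicit user choice.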

Unpinned Dependencies

Low

Category: Supply Chain
Content: negspacy numpy pandas pyarrow
Confidence: 95%
Finding: negspacy

Unpinned Dependencies

Low

Category: Supply Chain
Content: negspacy numpy pandas pyarrow pyyaml
Confidence: 98%
Finding: numpy

Unpinned Dependencies

Low

Category: Supply Chain
Content: negspacy numpy pandas pyarrow pyyaml scispacy
Confidence: 96%
Finding: pandas

Unpinned Dependencies

Low

Category: Supply Chain
Content: negspacy numpy pandas pyarrow pyyaml scispacy spacy
Confidence: 98%
Finding: pyarrow

Unpinned Dependencies

Low

Category: Supply Chain
Content: numpy pandas pyarrow pyyaml scispacy spacy tqdm
Confidence: 99%
Finding: pyyaml

Unpinned Dependencies

Low

Category: Supply Chain
Content: pandas pyarrow pyyaml scispacy spacy tqdm
Confidence: 93%
Finding: scispacy

Unpinned Dependencies

Low

Category: Supply Chain
Content: pyarrow pyyaml scispacy spacy tqdm
Confidence: 93%
Finding: spacy

Unpinned Dependencies

Low

Category: Supply Chain
Content: pyyaml scispacy spacy tqdm
Confidence: 97%
Finding: tqdm
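The unpinned-dependency findings above can be reproduced locally with a small check over a requirements file. This is a simplified sketch, not the scanner's own logic: the function name find_unpinned is hypothetical, and it treats only an exact `==` pin as pinned.

```python
def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as '-r other.txt'.
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

For example, given the lines `numpy`, `pandas==2.2.2`, and `pyyaml>=6.0`, the function reports `numpy` and `pyyaml>=6.0`. Exact pins are a baseline; a lock file with hashes gives stronger guarantees.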

Known Vulnerable Dependency: numpy (10 advisories): CVE-2014-1859 (Numpy arbitrary file write via symlink attack); CVE-2021-41495 (NumPy NULL Pointer Dereference); CVE-2021-33430 (NumPy Buffer Overflow (Disputed)); and 7 more

Critical

Category: Supply Chain
Confidence: 85%
Finding: numpy

Known Vulnerable Dependency: pyarrow (8 advisories): CVE-2023-47248 (PyArrow: Arbitrary code execution when loading a malicious data file); CVE-2019-12408 (Missing Initialization of Resource in Apache Arrow); CVE-2019-12410 (Missing Initialization of Resource in Apache Arrow); and 5 more

Critical

Category: Supply Chain
Confidence: 92%
Finding: pyarrow

Known Vulnerable Dependency: pyyaml (8 advisories): CVE-2019-20477 (Deserialization of Untrusted Data in PyYAML); CVE-2020-1747 (Improper Input Validation in PyYAML); CVE-2020-14343 (Improper Input Validation in PyYAML); and 5 more

Critical

Category: Supply Chain
Confidence: 96%
Finding: pyyaml
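Because the dependencies are unpinned, whether a given advisory applies depends on the version that actually gets installed. The sketch below compares a version string against a minimum "floor" version. The function names are hypothetical, the parser handles only plain numeric release segments, and the floor for each package must be taken from the relevant advisory; this report does not list fixed versions.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading numeric release segments, e.g. '14.0.1' -> (14, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_below_floor(installed: str, floor: str) -> bool:
    """Return True when the installed version is older than the floor."""
    a, b = release_tuple(installed), release_tuple(floor)
    width = max(len(a), len(b))
    # Pad with zeros so that '5.4' and '5.4.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b
```

The installed version of a package can be read with importlib.metadata.version from the standard library and passed to is_below_floor. Pre-release and local version suffixes are ignored by this simplified parser, so a dedicated version-parsing library is a better fit for production checks.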

VirusTotal

63/63 vendors flagged this skill as clean.
