Semantic Consistency Auditor

Security checks across malware telemetry and agentic risk

Overview

The skill appears to be a local clinical-note semantic scoring tool, but it is packaged with misleading academic-writing framing and weak disclosure around sensitive medical text, model downloads, and dependency risk.

Review before installing. Use only with de-identified clinical or similarly sensitive text, run it in an isolated environment, fix the syntax error, replace and pin dependencies, verify model sources, and treat any JSON or console output as sensitive because it can include the original AI-generated and gold-standard text.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (17)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill documentation advertises executable paths and file-based JSON input/output, which implies file read/write capability without any declared permissions or trust boundary. This is dangerous because agents or reviewers may treat the skill as lower risk than it is, leading to unintended access to local files and output locations.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 98%
Finding: The declared purpose says this is an academic-writing workflow aid, but the body describes a medical semantic evaluation tool that downloads external models and processes clinical-note content. This mismatch is dangerous because it can mislead users and automated policy systems into approving a skill for benign academic use when it actually handles sensitive medical text and network-enabled model execution.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The manifest description frames the skill as an academic-writing auditor, while the content clearly targets clinical-note semantic evaluation. Mislabeling capability and domain is dangerous because it defeats informed consent, misroutes the skill into inappropriate contexts, and increases the chance that sensitive healthcare data will be processed under weaker controls.

Intent-Code Divergence

High

Confidence: 96%
Finding: The 'When to Use' guidance reinforces an academic-writing workflow framing, but the rest of the document is for clinical semantic scoring. This is dangerous because operators may invoke the skill on the wrong data types or approve it for environments that are not authorized for medical-data handling or external model downloads.

Description-Behavior Mismatch

Medium

Confidence: 84%
Finding: The skill performs runtime model download/load behavior that is not apparent from the stated auditing purpose, which expands the trust boundary and introduces unreviewed network and supply-chain exposure. In a workflow handling sensitive medical text, hidden dependency retrieval can cause privacy, availability, and integrity risks if external resources are compromised or blocked.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: Runtime downloading of the COMET model is a real supply-chain and operational risk because it pulls executable model artifacts from external infrastructure during normal use. For a tool processing medical content, this is especially concerning when the manifest does not clearly justify or disclose the need for live downloads in potentially regulated environments.
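
One way to narrow the exposure described in this finding is to fetch the model once, record its digest, and refuse to load anything that does not match. The following is a minimal sketch under stated assumptions: the helper names `sha256_of` and `verify_artifact` are hypothetical, and the expected digest would have to come from a source the operator trusts. It does not reflect how the skill currently loads its model.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the recorded digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

Calling `verify_artifact` on a pre-downloaded checkpoint before loading it would let the tool run without live downloads and fail closed if the file has changed.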

Missing User Warnings

Medium

Confidence: 88%
Finding: The skill encourages evaluation of clinical notes and writing detailed JSON outputs but does not warn against inclusion of real patient identifiers or other sensitive health data. This is dangerous because users may process regulated medical information and persist it to disk without de-identification, retention limits, or access controls.
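
Because the JSON output can contain the original note text, one partial mitigation for the missing access controls is to create result files readable only by their owner. The sketch below assumes a POSIX system (the mode argument has little effect on Windows); `write_private_json` and the payload fields are hypothetical and not part of the skill. It does not address de-identification or retention.

```python
import json
import os
from pathlib import Path


def write_private_json(path: Path, payload: dict) -> None:
    """Create a new JSON file with owner-only permissions (0o600)."""
    # O_EXCL refuses to reuse an existing file, so the mode below always applies.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, ensure_ascii=False, indent=2)
```

A call such as `write_private_json(Path("scores.json"), {"score": 0.0})` raises `FileExistsError` rather than overwriting an earlier result.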

Missing User Warnings

Low

Confidence: 76%
Finding: The installation and performance notes imply downloading large external models, but the documentation does not clearly disclose the network dependency and associated privacy or supply-chain considerations. This is risky because users may run the skill in restricted or sensitive environments without understanding that external services and third-party model artifacts are involved.

Unpinned Dependencies

Low

Category: Supply Chain
Content: bert_score comet dataclasses numpy
Confidence: 93%
Finding: bert_score

Unpinned Dependencies

Low

Category: Supply Chain
Content: bert_score comet dataclasses numpy torch
Confidence: 93%
Finding: comet

Unpinned Dependencies

Low

Category: Supply Chain
Content: bert_score comet dataclasses numpy torch yaml
Confidence: 84%
Finding: dataclasses

Unpinned Dependencies

Low

Category: Supply Chain
Content: bert_score comet dataclasses numpy torch yaml
Confidence: 97%
Finding: numpy

Unpinned Dependencies

Low

Category: Supply Chain
Content: comet dataclasses numpy torch yaml
Confidence: 98%
Finding: torch

Unpinned Dependencies

Low

Category: Supply Chain
Content: dataclasses numpy torch yaml
Confidence: 99%
Finding: yaml
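
The unpinned-dependency findings above can be reproduced locally with a small check that flags any requirement lacking an exact `==` version. This is a rough sketch rather than a full requirements parser: `unpinned_requirements` is a hypothetical helper, and it does not handle hashes, URLs, or environment markers.

```python
import re
from typing import List

# A line counts as pinned only if it has the form name==version (extras allowed).
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that do not pin an exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose
```

Run against the dependency list quoted in these findings, every entry would be reported, since none carries a version specifier.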

Known Vulnerable Dependency: numpy (10 advisories): CVE-2014-1859 (Numpy arbitrary file write via symlink attack); CVE-2021-41495 (NumPy NULL Pointer Dereference); CVE-2021-33430 (NumPy Buffer Overflow (Disputed)); and 7 more

Critical

Category: Supply Chain
Confidence: 91%
Finding: numpy

Known Vulnerable Dependency: torch (10 advisories): CVE-2025-2953 (PyTorch susceptible to local Denial of Service); CVE-2022-45907 (PyTorch vulnerable to arbitrary code execution); CVE-2025-32434 (PyTorch: `torch.load` with `weights_only=True` leads to remote code execution); and 7 more

Critical

Category: Supply Chain
Confidence: 96%
Finding: torch

Possible Typosquatting: 'yaml' resembles popular package 'pyyaml'

High

Category: Supply Chain
Confidence: 99%
Finding: yaml
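
This finding most likely arises because import names were listed as package names: the module imported as `yaml` is distributed on PyPI as `PyYAML`, so a requirement of `yaml` resolves to a different project. The sketch below maps import names to the distribution to declare. The `comet` entry is an assumption that the skill uses Unbabel's COMET, and `distribution_for` is a hypothetical helper.

```python
from typing import Optional

# Import name -> distribution to list in requirements (None: nothing to install).
IMPORT_TO_DISTRIBUTION = {
    "yaml": "PyYAML",
    "comet": "unbabel-comet",  # assumption: the COMET metric from Unbabel
    "dataclasses": None,  # part of the standard library since Python 3.7
}


def distribution_for(import_name: str) -> Optional[str]:
    """Return the distribution to declare for an import name, if any."""
    return IMPORT_TO_DISTRIBUTION.get(import_name, import_name)
```

Names not present in the table are returned unchanged, so the mapping only corrects the entries it knows about.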

VirusTotal

66/66 vendors flagged this skill as clean.
