PROMPT INJECTION PROTECTION

Security checks across malware telemetry and agentic risk

Overview

This appears to be a defensive prompt-injection skill, but it starts background auto-update behavior by default and keeps persistent learned state, with weak disclosure and some misleading safety claims.

Install only after reviewing the automatic learning and update behavior. Use strictMode for enforcement decisions, disable learning and auto-updates unless you explicitly need them, do not rely on fetchAndSecureContent as real web sanitization, and avoid logging raw prompts, documents, commands, paths, or matched threat text from alert callbacks.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (8)

Context-Inappropriate Capability

High

Confidence: 95%
Finding: The skill is presented as prompt-injection protection, but it also instantiates and starts an autonomous auto-update/hardening subsystem at construction time. In a security-sensitive agent context, hidden side effects that can change system behavior or fetch/apply updates expand the trust boundary and create a path for unintended network access, configuration drift, or remote code/logic changes unrelated to input sanitization.
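One way to avoid construction-time side effects is to have the constructor store configuration only and to start background updates from an explicit, opt-in call. The sketch below is a minimal TypeScript illustration under that assumption; `ThreatUpdater`, `UpdaterOptions`, `enableAutoUpdate`, and `start` are hypothetical names, not the skill's actual API.

```typescript
// Hypothetical sketch: ThreatUpdater, UpdaterOptions and enableAutoUpdate are
// illustrative names and are NOT the skill's actual API.

interface UpdaterOptions {
  // Auto-updates stay off unless the caller explicitly opts in.
  enableAutoUpdate?: boolean;
  intervalMs?: number;
}

class ThreatUpdater {
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly options: Required<UpdaterOptions>;

  constructor(options: UpdaterOptions = {}) {
    // The constructor only records configuration; it starts no timers
    // and performs no network activity.
    this.options = {
      enableAutoUpdate: options.enableAutoUpdate ?? false,
      intervalMs: options.intervalMs ?? 60 * 60 * 1000,
    };
  }

  get running(): boolean {
    return this.timer !== undefined;
  }

  // Background work begins only on an explicit call, and only if opted in.
  start(checkForUpdates: () => void): boolean {
    if (!this.options.enableAutoUpdate || this.running) {
      return false;
    }
    this.timer = setInterval(checkForUpdates, this.options.intervalMs);
    return true;
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }
}
```

With this shape, merely importing or constructing the component is passive, and any update activity is traceable to a visible `start` call.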

Context-Inappropriate Capability

Medium

Confidence: 83%
Finding: The adaptive learning interface allows externally supplied content to influence future detection behavior via learning methods and analyze-and-learn flows. In a prompt-injection defense component, this can let attackers poison the model or rules over time, degrading detection quality, increasing false negatives, or causing policy drift from adversarial samples.
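A common mitigation for learning-based poisoning is to gate what may be learned by provenance and to bound how far the rule set can grow. The following is a minimal sketch assuming each candidate pattern carries a provenance label; `Provenance`, `LearningCandidate`, `GatedLearner`, and the source labels are hypothetical.

```typescript
// Hypothetical sketch: these names are illustrative, not the skill's real API.

type Provenance = "curated-feed" | "operator-reviewed" | "user-input" | "web-content";

interface LearningCandidate {
  pattern: string;
  provenance: Provenance;
}

// Only these sources may change detection behavior without review.
const TRUSTED_SOURCES: ReadonlySet<Provenance> = new Set<Provenance>([
  "curated-feed",
  "operator-reviewed",
]);

class GatedLearner {
  private readonly patterns = new Set<string>();
  private readonly pending: LearningCandidate[] = [];
  private readonly maxPatterns: number;

  constructor(maxPatterns = 500) {
    this.maxPatterns = maxPatterns;
  }

  // Low-trust content is queued for human review instead of being learned.
  submit(candidate: LearningCandidate): "learned" | "queued" | "rejected" {
    if (!TRUSTED_SOURCES.has(candidate.provenance)) {
      this.pending.push(candidate);
      return "queued";
    }
    // A hard cap bounds how far the learned rule set can grow.
    if (this.patterns.size >= this.maxPatterns) {
      return "rejected";
    }
    this.patterns.add(candidate.pattern);
    return "learned";
  }

  get learnedCount(): number {
    return this.patterns.size;
  }

  get pendingCount(): number {
    return this.pending.length;
  }
}
```

The key design choice is that adversary-reachable inputs (user prompts, fetched pages) can only enqueue candidates; they never alter detection rules directly.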

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The function and surrounding comments claim to securely fetch and process web content, but the implementation only logs a message and returns a hardcoded success object without fetching, validating, or sanitizing any content. This creates a dangerous security mismatch: downstream code may trust the `secured: true` result and skip real protections, leading to unsafe handling of untrusted web data once integrated or assumed to be protected.
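Because the stub reports `secured: true` without doing any work, callers should not treat that flag as evidence of sanitization. The sketch below assumes a result shape with optional `secured` and `content` fields (only `secured: true` comes from the finding); the hypothetical `consumeWebResult` ignores the flag, fails closed when nothing was fetched, and always applies the caller's own sanitizer.

```typescript
// Hypothetical sketch: FetchResult and consumeWebResult are illustrative.
// The only detail taken from the finding is that the stub reports
// `secured: true` without fetching or sanitizing anything.

interface FetchResult {
  secured?: boolean;
  content?: string;
}

type Sanitizer = (untrusted: string) => string;

// The `secured` flag is deliberately ignored: the caller fails closed when no
// content is present and always runs its own sanitizer on whatever was fetched.
function consumeWebResult(result: FetchResult, sanitize: Sanitizer): string {
  if (typeof result.content !== "string") {
    throw new Error("no content was fetched; refusing to treat result as safe");
  }
  return sanitize(result.content);
}
```

Passing the sanitizer in explicitly keeps the protection decision with the caller, so a self-reported success flag cannot silently switch it off.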

Missing User Warnings

Medium

Confidence: 92%
Finding: The document advertises autonomous learning, persistent knowledge storage, and automatic threat-intelligence updates, but provides no safeguards, consent model, retention limits, provenance controls, or warning about external data ingestion risks. In a security skill, these capabilities can expand trust boundaries, cause sensitive data to be retained across sessions, and allow poisoning or unsafe adaptation if learned from adversarial or low-trust inputs.

Missing User Warnings

Medium

Confidence: 95%
Finding: The file claims dynamic permission adjustment, intelligent file monitoring, and automated incident response without stating limits, approval requirements, or failure modes. In the context of an agent skill, this is more dangerous because it implies the system may autonomously alter file access or system behavior, which could disrupt user workflows, damage data availability, or be abused if detection logic is wrong or manipulated.
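For capabilities such as dynamic permission adjustment or automated incident response, a conservative pattern is to let detection propose actions and require explicit approval before anything changes. This is a minimal sketch; `ProposedAction`, `executeWithApproval`, and the action kinds are hypothetical and are not taken from the skill.

```typescript
// Hypothetical sketch: ProposedAction, Approver and executeWithApproval are
// illustrative names; the finding does not describe the skill's actual API.

interface ProposedAction {
  kind: "adjust-permissions" | "quarantine-file" | "incident-response";
  target: string;
  reason: string;
}

type Approver = (action: ProposedAction) => boolean;

interface ActionOutcome {
  executed: boolean;
  detail: string;
}

// Detection may only *propose* an action; nothing changes until an explicit
// approver agrees, and denial is reported back to the caller.
function executeWithApproval(
  action: ProposedAction,
  approve: Approver,
  perform: (action: ProposedAction) => void,
): ActionOutcome {
  if (!approve(action)) {
    return { executed: false, detail: `denied: ${action.kind} on ${action.target}` };
  }
  try {
    perform(action);
    return { executed: true, detail: `done: ${action.kind} on ${action.target}` };
  } catch (err) {
    // Report failures to the caller instead of swallowing them.
    return { executed: false, detail: `failed: ${String(err)}` };
  }
}
```

Keeping the approver as a separate function makes the approval requirement and its failure modes explicit, which is exactly what the finding says the documentation omits.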

Missing User Warnings

Medium

Confidence: 96%
Finding: The security alert callback prints raw command text, file paths, content previews, and matched threat substrings directly to the console. If scanned inputs contain secrets, proprietary prompts, tokens, user data, or sensitive filesystem details, this logging can create a secondary data exposure channel through terminal history, CI logs, or centralized log aggregation.
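To close this secondary exposure channel, an alert callback can log metadata instead of raw values. The sketch below assumes an alert object whose fields resemble what the finding says is printed; the field names and `onSecurityAlert` are hypothetical. It records each sensitive field's length, never its contents.

```typescript
// Hypothetical sketch: the SecurityAlert fields mirror what the finding says
// is printed (command text, file paths, content previews, matched substrings),
// but the names and the callback shape are assumptions.

interface SecurityAlert {
  threatType: string;
  severity: "low" | "medium" | "high";
  command?: string;
  filePath?: string;
  contentPreview?: string;
  matchedText?: string;
}

// Keep only non-sensitive metadata: the category, the severity, and the
// length of each sensitive field, never its value.
function redactAlert(alert: SecurityAlert): Record<string, string | number> {
  const summary: Record<string, string | number> = {
    threatType: alert.threatType,
    severity: alert.severity,
  };
  const sensitive = ["command", "filePath", "contentPreview", "matchedText"] as const;
  for (const key of sensitive) {
    const value = alert[key];
    if (value !== undefined) {
      summary[`${key}Length`] = value.length;
    }
  }
  return summary;
}

function onSecurityAlert(alert: SecurityAlert): void {
  console.warn("[security-alert]", JSON.stringify(redactAlert(alert)));
}
```

If investigators need the raw text, it can be routed to an access-controlled store rather than to the console, where terminal history and CI logs would retain it.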

Missing User Warnings

Medium

Confidence: 89%
Finding: The code automatically learns from analyzed content and persists derived threat data to disk in learned-threats.json without any notice, consent, retention control, or data minimization. In a security skill, analyzed inputs may contain sensitive prompts, secrets, internal instructions, or user data, so silently storing patterns generated from that content creates a privacy and data-governance risk and may unintentionally retain attacker-supplied payloads or sensitive fragments.
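Before anything is written to learned-threats.json, the learned entries can be filtered for consent, age, and size. This is a minimal sketch under assumed names (`LearnedThreat`, `prepareForPersistence`, and the option fields are hypothetical); the secret-detection regexes are rough heuristics, not a complete secret scanner.

```typescript
// Hypothetical sketch: LearnedThreat and prepareForPersistence are illustrative;
// only the file name learned-threats.json comes from the finding.

interface LearnedThreat {
  pattern: string;
  category: string;
  learnedAt: number; // epoch milliseconds
}

interface PersistOptions {
  consentGiven: boolean; // persistence is opt-in
  retentionMs: number; // entries older than this are dropped
  maxPatternLength: number; // long fragments are more likely to carry payloads or secrets
  now: number; // current time in epoch milliseconds
}

// Rough heuristics only; a real deployment would need a dedicated secret scanner.
const SECRET_HINTS: RegExp[] = [
  /api[_-]?key/i,
  /password/i,
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
  /\bsk-[A-Za-z0-9]{16,}/,
];

// Returns only the entries that may be written to disk; without consent, nothing.
function prepareForPersistence(entries: LearnedThreat[], opts: PersistOptions): LearnedThreat[] {
  if (!opts.consentGiven) {
    return [];
  }
  return entries.filter(
    (e) =>
      opts.now - e.learnedAt <= opts.retentionMs &&
      e.pattern.length <= opts.maxPatternLength &&
      !SECRET_HINTS.some((re) => re.test(e.pattern)),
  );
}
```

A caller would then serialize the returned array with `JSON.stringify` and write the file only when the array is non-empty.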

Missing User Warnings

Medium

Confidence: 94%
Finding: Starting the auto-update system by default, without visible consent or warning in this file, introduces unsolicited behavior in a library/component that users may expect to be passive. In agent environments, automatic background update activity can trigger network access and state changes that users did not authorize, increasing operational and supply-chain risk.

VirusTotal

64/64 vendors reported this skill as clean.
