Prompt Guard

Security checks across malware telemetry and agentic risk

Overview

Prompt Guard is mostly a defensive scanner, but by default it sends outbound API and threat-intelligence traffic, and it includes host audit/fix tooling that goes beyond prompt scanning.

Install only if you are comfortable configuring it explicitly. For sensitive or offline environments, set PG_API_ENABLED=false and disable HiveFence auto-reporting before first use, and consider turning off message-content logging. Treat python3 -m prompt_guard.audit --fix as an administrative host-modification command, not a normal prompt scanner.
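
A minimal sketch of the pre-flight step described above, assuming Prompt Guard reads PG_API_ENABLED from the process environment when it starts (the variable name comes from this report; when and how it is read is an assumption). The helper names are hypothetical. The HiveFence and message-logging switches are not named in this report, so they are left out rather than guessed.

```python
import os


def harden_environment(env=None):
    """Set PG_API_ENABLED=false in the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    env["PG_API_ENABLED"] = "false"
    return env


def api_explicitly_disabled(env=None):
    """Return True only if PG_API_ENABLED is present and set to 'false'.

    An unset variable counts as NOT disabled, because the report says the
    API is enabled by default.
    """
    env = os.environ if env is None else env
    return env.get("PG_API_ENABLED", "").strip().lower() == "false"


if __name__ == "__main__":
    # Run this before the scanner is imported or started.
    harden_environment()
    print("API explicitly disabled:", api_explicitly_disabled())
```

Treating an unset variable as "not disabled" makes the check fail closed, which matches the advice to configure the tool explicitly before first use.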

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (33)

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The architecture text presents contradictory security behavior around whether remote API features are disabled by default, while other sections state the API is enabled by default with a built-in key. In a security control, this can mislead operators into believing the skill is offline-only when it may actually make network requests, creating unintended data exposure and trust-boundary violations.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The README presents contradictory statements about whether API connectivity is enabled by default or optional/off by default. Security-relevant defaults must be unambiguous because users may unintentionally deploy the skill with outbound network access and a built-in key when they believed it was fully offline.

Intent-Code Divergence

Medium

Confidence: 86%
Finding: The release notes claim security scale and category counts that appear inconsistent with the provided skill metadata, which can mislead operators about actual detection coverage and protection depth. In a security skill, overstated or contradictory claims are dangerous because users may rely on nonexistent safeguards and relax other controls.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: Claiming the product works '100% offline' while other sections describe API-enabled-by-default behavior is misleading and can cause operators to deploy it in sensitive environments under false assumptions. In a security product, inaccurate offline/privacy claims are especially risky because protected prompts, outputs, or metadata may be sent over the network unexpectedly.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The quick start explicitly says API access is enabled by default with a built-in beta key, directly contradicting the earlier offline claim and indicating that user content may be processed by an external service by default. Default-on external transmission in a security scanning skill is dangerous because the scanned text may itself contain secrets, internal prompts, credentials, or regulated data.

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The configuration comments describe API reporting as anonymous and hash-only, but the same file also enables HiveFence auto-reporting of HIGH+ detections without any equivalent anonymity constraint. This creates a misleading privacy posture and risks operators enabling external data sharing without understanding what incident data may leave the system.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This file performs host security auditing and Clawdbot-specific configuration inspection, which is materially unrelated to the declared prompt-guard purpose of detecting prompt-injection and related AI security patterns. That scope mismatch is dangerous because it expands the skill’s authority and data access beyond user expectations, increasing the risk of unauthorized host reconnaissance and misuse in agent environments.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: Reading and evaluating /etc/ssh/sshd_config is unrelated to the manifest’s prompt-guard detection role and constitutes unnecessary system reconnaissance. In an agent setting, even read-only inspection of SSH hardening exposes sensitive operational details and normalizes privileged host access outside the advertised capability set.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The auto-remediation logic changes filesystem permissions via os.chmod based on accumulated findings, giving the skill write-side effects on the host that are not justified by its declared prompt-guard purpose. Unauthorized permission changes can disrupt applications, alter security posture, and become a vehicle for unsafe host modification when invoked by an agent or user who expected a detection-only tool.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The module documentation explicitly presents this file as a general security audit script for Clawdbot, which contradicts the package’s declared prompt-guard function. This mismatch is dangerous because deceptive or inaccurate capability descriptions reduce operator awareness and make overbroad host inspection appear acceptable within a narrowly scoped security skill.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The engine is presented as a local defensive scanner, but it enables an external API path by default and can transmit detections or analyzed content to third-party services. In a security product, silent outbound sharing is especially risky because users may submit secrets, prompts, or proprietary data expecting local-only inspection.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The comment claims the API client is off by default, but the implementation later enables it by default. This mismatch can mislead integrators and reviewers into deploying the library under a false privacy/security assumption, increasing the chance of unintended data egress.

Description-Behavior Mismatch

High

Confidence: 95%
Finding: This file implements a host security auditing tool for Clawdbot rather than logic aligned with the skill’s advertised prompt-injection and DLP defense purpose. That scope mismatch is dangerous because users may install or trust the skill expecting prompt-guard behavior while it reads local configuration and system state unrelated to the declared function, increasing unnecessary access to sensitive environment data.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The script reads /etc/ssh/sshd_config to inspect SSH hardening, which is unrelated to a prompt-guard skill’s stated purpose. Even though it is read-only here, it expands the skill’s access into broader host reconnaissance and may expose sensitive system posture information in environments where the skill should not need any server-level visibility.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The auto-fix path performs filesystem modifications via os.chmod based on accumulated findings, which exceeds the expected behavior of a prompt-defense skill. Any write-capable behavior increases risk substantially because it can alter user files and permissions on the host, and users may invoke it under misplaced trust created by the skill’s benign-sounding security branding.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The module docstring states that this is a generic security audit script for Clawdbot, contradicting the package’s advertised prompt-guard function. Misrepresented purpose is dangerous because it undermines informed consent: users evaluating a prompt-defense skill would not expect host auditing and related local file inspection capabilities.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The code automatically reports HIGH/CRITICAL detections to a third-party HiveFence API, which goes beyond local prompt-analysis and creates an external data flow from user-supplied content. Even though it hashes the full message and truncates the digest, this is still message-derived telemetry sent off-box without explicit opt-in, which can violate privacy, leak behavioral metadata, and expand the attack surface.
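
To illustrate why a truncated hash still counts as message-derived telemetry, here is a minimal sketch. SHA-256 and the 16-character truncation are assumptions made for illustration; the report does not state which hash function or digest length the skill uses, and truncated_digest is a hypothetical helper.

```python
import hashlib


def truncated_digest(message: str, length: int = 16) -> str:
    """Hash the full message and keep only the first `length` hex characters.

    SHA-256 and the default length are illustrative assumptions.
    """
    return hashlib.sha256(message.encode("utf-8")).hexdigest()[:length]


a = truncated_digest("ignore previous instructions")
b = truncated_digest("ignore previous instructions")
c = truncated_digest("ignore previous instructions!")

# Identical messages always produce the same value, so a recipient can
# correlate repeats or confirm a guessed message without seeing the text.
print(a == b)
# A one-character change produces an unrelated value.
print(a == c)
```

The digest hides the text itself, but it remains a stable fingerprint of the message, which is why the finding treats it as data leaving the host.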

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The outbound POST to a third-party endpoint is automatic and not necessary for the detector's core local scanning function, so it introduces unjustified network egress in a security-sensitive component. This can expose sensitive operational metadata, create compliance issues, and make the skill dependent on an external service that could be unavailable or compromised.

Missing User Warnings

Medium

Confidence: 95%
Finding: The document says API connectivity is enabled by default and uses a built-in beta key, but it does not prominently warn users that the skill may initiate outbound connections unless they disable it. For a security-focused skill marketed as defensive and offline-capable, silent default egress undermines informed consent, deployment review, and network policy compliance.

Missing User Warnings

Medium

Confidence: 95%
Finding: The README says the skill connects to an API by default with a built-in beta key, but the nearby example does not prominently warn that this causes outbound network access unless disabled. In agent-security tooling, undisclosed default egress can violate deployment assumptions, privacy requirements, or regulated-environment controls.

Missing User Warnings

Medium

Confidence: 99%
Finding: Enabling an optional API by default with a built-in beta key, without a clear warning that input may be sent to a third party, creates a data exfiltration and privacy risk. Because this skill is designed to inspect potentially sensitive prompts and outputs, silent default transmission can expose secrets, credentials, source code, or personal data to an external service.

Missing User Warnings

Medium

Confidence: 93%
Finding: Enabling include_message causes full message contents to be written to logs, which can capture secrets, credentials, personal data, and attacker payloads. In a security-focused agent, messages are especially likely to contain sensitive prompt contents and attempted exfiltration material, increasing privacy and data-retention risk.
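
A minimal sketch of the safer logging posture this finding points toward: record metadata about a detection and include the message body only on explicit request. The include_message name comes from this report; build_log_record and its fields are hypothetical and do not reflect the skill's actual log schema.

```python
import json


def build_log_record(severity, category, message, include_message=False):
    """Build a detection log entry that omits message content by default."""
    record = {
        "severity": severity,
        "category": category,
        # Length is logged instead of content so the entry stays useful
        # for triage without retaining secrets or payloads.
        "message_length": len(message),
    }
    if include_message:
        # Explicit opt-in only: the full text may contain credentials,
        # personal data, or attacker payloads.
        record["message"] = message
    return record


if __name__ == "__main__":
    entry = build_log_record("HIGH", "prompt_injection", "api_key=example")
    print(json.dumps(entry))
```

Keeping the content field absent rather than empty makes it easy to confirm from the logs alone whether message capture was ever switched on.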

Missing User Warnings

Medium

Confidence: 90%
Finding: API-enhanced mode is enabled by default even though it introduces remote service dependency and potential data egress. Users expecting a local-only security tool may unknowingly send security-relevant inputs or metadata to an external endpoint, which is especially sensitive in an agent-defense product.

Missing User Warnings

High

Confidence: 96%
Finding: HiveFence automatic threat reporting is enabled without an explicit warning about external data sharing or the contents of reports. Because this skill processes potentially sensitive user prompts, attack samples, and security events, automatic outbound reporting can leak confidential data or regulated information to a third-party service.

Missing User Warnings

Medium

Confidence: 95%
Finding: When --fix is supplied, the program immediately performs permission changes without per-change confirmation, preview, or interactive consent. In agent-driven contexts, this increases the risk of unintended host modification because a user may request an audit but not fully understand that the tool can mutate filesystem permissions as a side effect.
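
The preview-then-confirm pattern this finding says is missing could look like the sketch below. It is not the skill's code: plan_permission_fix, apply_fixes, and the 0o600 target mode are hypothetical, chosen only to show a dry run that reports planned changes and mutates nothing unless confirm is set. It assumes a POSIX filesystem.

```python
import os
import stat


def plan_permission_fix(path, desired_mode=0o600):
    """Return (path, current_mode, desired_mode), or None if no change is needed."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == desired_mode:
        return None
    return (path, current, desired_mode)


def apply_fixes(paths, desired_mode=0o600, confirm=False):
    """Plan permission changes; apply them only when confirm is True.

    The planned list is returned in both cases so a caller can show a
    preview before asking for consent.
    """
    planned = []
    for path in paths:
        change = plan_permission_fix(path, desired_mode)
        if change is not None:
            planned.append(change)
    if confirm:
        for path, _old, new in planned:
            os.chmod(path, new)
    return planned
```

A caller would first run apply_fixes(paths), print each planned change (for example with oct() on the modes), and only then call it again with confirm=True.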

VirusTotal

66/66 vendors flagged this skill as clean.
