AI Safety Guard

Security checks across malware telemetry and agentic risk

Overview

This instruction-only privacy guard has no install-time code risk, but its rules may let an agent send passwords, medical data, and local secrets externally with too little protection.

Review carefully before installing. This skill is transparent and does not ship code, but do not rely on it as a strict privacy or secret-protection guard unless its policy is tightened to forbid plaintext credential sharing, narrowly scope local secret access, and require explicit verification for sensitive external disclosures.
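As an illustration of what a tightened policy might look like, here is a minimal sketch in Python. The skill itself ships no code, so every name here (`OutboundRequest`, `decide`, the `data_class` labels) is a hypothetical assumption. The sketch applies hard rules before considering user intent: credentials never leave the machine, and medical data leaves only after explicit confirmation.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # pause and ask the user to verify recipient and content
    BLOCK = "block"


@dataclass(frozen=True)
class OutboundRequest:
    # Hypothetical labels; a real guard would need its own classifier.
    data_class: str              # "credential", "medical", or "general"
    external: bool               # True if the data would leave the local machine
    user_confirmed: bool = False


def decide(req: OutboundRequest) -> Decision:
    """Apply hard rules first; a user request alone never unlocks a credential."""
    if not req.external:
        # This sketch covers transmission only, not local secret access.
        return Decision.ALLOW
    if req.data_class == "credential":
        return Decision.BLOCK
    if req.data_class == "medical":
        return Decision.ALLOW if req.user_confirmed else Decision.CONFIRM
    return Decision.ALLOW
```

Under these assumptions, a request to email a password is blocked even when the user has confirmed it, while a request to send medical records is held until the user verifies it.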

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (10)

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The skill markets itself as a privacy guard, but its actual logic allows unrestricted external transmission of highly sensitive data whenever the user requests it. In a guard that is supposed to prevent leakage, this creates a dangerous false sense of safety and removes opportunities for safety checks, warnings, or safer alternatives for secrets and regulated data.

Intent-Code Divergence

High

Confidence: 99%
Finding: The skill contains contradictory guidance: one section says raw credentials should never be sent, while later examples instruct the agent to send passwords and login credentials if the user asked. This inconsistency is dangerous because implementations will likely follow the permissive examples, leading to credential disclosure and account compromise.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The skill claims to prevent privacy leaks, but it explicitly instructs the agent to silently execute transmission of sensitive data whenever it is deemed part of the user's task. In this context, that framing is dangerous because it suppresses user awareness and normalizes sending credentials and medical information without additional safeguards.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The skill explicitly approves reading local secret sources such as .env, ~/.netrc, and SSH config, which goes beyond a passive transmission guard and creates a path for secret harvesting and misuse. Even if framed as local authentication, broad authorization to access secret stores increases the blast radius of any agent misuse or prompt injection.
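One way to narrow this kind of access is a per-task allowlist, sketched below in Python. This is an assumption about how a stricter policy could be expressed and is not taken from the skill; the task name `git_push` and the helper `may_read_secret` are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping: each task may read at most one named secret file.
ALLOWED_SECRET_FILES = {
    "git_push": Path.home() / ".netrc",
}


def may_read_secret(task: str, path: str) -> bool:
    """Return True only if `path` is the single secret file allowed for `task`."""
    allowed = ALLOWED_SECRET_FILES.get(task)
    if allowed is None:
        return False  # tasks with no entry get no secret access at all
    # Resolve both sides so "~" and symlinks cannot bypass the comparison.
    return Path(path).expanduser().resolve() == allowed.resolve()
```

With this default-deny shape, a task that is not listed cannot read `.env`, `~/.netrc`, or SSH config, which limits what a manipulated agent could collect.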

Missing User Warnings

High

Confidence: 99%
Finding: The skill explicitly permits sending highly sensitive credentials externally without any meaningful warning or protective friction. This is extremely dangerous because emailed or messaged passwords are easily intercepted, stored, forwarded, and abused, leading directly to account takeover and downstream compromise.
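A rough sketch of the kind of protective friction that is missing: a pre-send check, written here in Python, that flags credential-like content in an outgoing message so the agent can warn or stop. The patterns and the name `find_credential_hits` are illustrative assumptions; a real detector would need a much broader rule set.

```python
import re

# Illustrative patterns only; these will miss many real credential formats.
CREDENTIAL_PATTERNS = {
    "password": re.compile(r"(?i)\b(password|passwd|pwd)\s*[:=]\s*\S+"),
    "api_secret": re.compile(r"(?i)\b(api[_-]?key|secret|token)\s*[:=]\s*\S+"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_credential_hits(text: str) -> list[str]:
    """Return labels of matching patterns, never the matched secret itself."""
    return [label for label, pat in CREDENTIAL_PATTERNS.items() if pat.search(text)]
```

A non-empty result would be the point at which the agent warns the user or proposes a safer channel, such as a password manager share, instead of sending the message as written.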

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill normalizes sending medical records externally solely because the user requested it, without warning, verification, or privacy-preserving alternatives. In a privacy guard context, this is dangerous because health data is highly sensitive, often regulated, and easy to misdirect to the wrong recipient.

Ssd 3

High

Confidence: 98%
Finding: The core rule semantically authorizes transmission of highly sensitive data whenever the model decides it is part of the user's task. In this context, that policy is unsafe because it collapses privacy protection into simple user intent matching, with no hard restrictions for secrets or regulated information.

Ssd 3

Critical

Confidence: 100%
Finding: This example expressly authorizes emailing a password in plain language, which is a textbook insecure practice. It directly enables irreversible secret disclosure, passive retention in mail systems, accidental forwarding, and immediate account compromise if any mailbox or transport path is exposed.

Ssd 3

High

Confidence: 97%
Finding: This example authorizes external transmission of medical records based only on user request, with no verification or minimization. Because the skill is positioned as a protective guard, this permissive behavior is especially dangerous: users may overtrust it and inadvertently disclose highly sensitive health data to the wrong recipient or channel.

Ssd 3

High

Confidence: 98%
Finding: The skill semantically approves unrestricted use of local secret stores, enabling the agent to collect credentials that were never explicitly provided for the current action. That broad permission substantially increases the risk of unauthorized credential use, lateral movement, and secret exfiltration if the agent is manipulated or misapplies the rule.

VirusTotal

VirusTotal findings are pending for this skill version.
