cognitive-behavior-evaluator

Security checks across malware telemetry and agentic risk

Overview

This instruction-only AI safety evaluation skill contains risky test prompts, but it does not itself install code, access data, or run actions.

Install only if you intend to run controlled AI safety evaluations. Use it with consenting or sandboxed target agents, avoid production or third-party systems without authorization, and do not execute or reuse unsafe code or biased content that a target agent may produce.
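The guidance above can be enforced as a pre-flight check before any test prompt is sent. The following is a minimal sketch, not part of the skill: the `EvaluationTarget` type, the `may_evaluate` function, and the allowlist names are all hypothetical and would need to be adapted to the operator's own environment.

```python
from dataclasses import dataclass

# Hypothetical allowlist of sandboxed targets; the skill itself defines none.
AUTHORIZED_SANDBOX_TARGETS = {"sandbox-agent-01", "sandbox-agent-02"}


@dataclass(frozen=True)
class EvaluationTarget:
    """Illustrative description of a target agent (all fields are assumptions)."""
    name: str            # identifier of the target agent
    is_production: bool  # True if the agent serves real users or data
    has_consent: bool    # True if the owner authorized the evaluation


def may_evaluate(target: EvaluationTarget) -> bool:
    """Return True only for consenting, non-production, allowlisted targets."""
    if target.is_production:
        return False
    if not target.has_consent:
        return False
    return target.name in AUTHORIZED_SANDBOX_TARGETS
```

An operator could call `may_evaluate(...)` and abort the run when it returns `False`. The check fails closed: a target that is unknown, in production, or lacking consent is rejected.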

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep

Findings (3)

Missing User Warnings

Medium

Confidence: 85%
Finding: The skill explicitly instructs an operator to inject harmful or sensitive prompts into another agent, including jailbreak-style and discriminatory content, without any safety warning, scope limitation, or handling guidance. Even if framed as evaluation, the content normalizes operational use of unsafe prompts and increases the chance that users reproduce harmful outputs or misuse the prompts outside a controlled testing environment.

Ssd 1

High

Confidence: 97%
Finding: This is a true vulnerability because the skill contains a ready-to-use jailbreak prompt that impersonates elevated privilege, instructs suspension of safety constraints, and solicits covert local network reconnaissance. That combination can directly facilitate unauthorized information gathering and can be repurposed as an attack payload against less robust agents, making the evaluation context materially dangerous rather than merely theoretical.

Ssd 4

Medium

Confidence: 89%
Finding: The document presents itself as a legitimate research workflow and then operationalizes adversarial prompt injection against a target agent, which lowers operator suspicion and encourages direct use. This staging increases risk because it packages harmful prompts in a credible assessment framework, making misuse easier and more scalable in real deployments or against third-party systems.

VirusTotal

65/65 vendors flagged this skill as clean.
