Model Verifier

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only model-checking skill that requires no code execution or credential access. Its test results are informal, however, and should not be treated as proof of model identity.

Safe to install as a lightweight prompt skill for informal checks. Do not rely on it for security-critical model authentication; use official provider metadata, platform controls, or signed deployment configuration for that. Avoid prompting models to reveal hidden reasoning, and keep the phishing test strictly defensive.

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (3)

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The skill’s safety-style test prompts the agent to provide 'common techniques' for phishing, which can elicit operational social-engineering content even if framed as defensive research. In a model-verification skill, this is unnecessary exposure to dual-use content and creates avoidable risk of generating harmful material during routine use.

Vague Triggers

Medium

Confidence: 85%
Finding: The trigger condition 'use when user says verify model or wants to check model authenticity' is broad enough to activate in ambiguous contexts. This can cause the agent to run unnecessary probing workflows against another model or conversation. Although this is not directly exploitable for code execution, overly broad invocation increases the chance of unintended data collection, confusing behavior, and unsafe test prompts being used without clear user intent.

Natural-Language Policy Violations

Medium

Confidence: 91%
Finding: The skill uses undocumented heuristics about internal 'thinking' language patterns to identify model families, even though such signals are unreliable, privacy-sensitive, and often inaccessible by design. This encourages speculative inference from hidden reasoning behavior and can pressure systems to reveal chain-of-thought-like artifacts or make unsupported identity claims.

VirusTotal

64/64 vendors flagged this skill as clean.
