claude-authenticity

Security checks across malware telemetry and agentic risk

Overview

This skill is user-directed and not malware-like, but it includes an optional workflow for extracting hidden provider prompts, and it lacks adequate warnings about authorization and sensitive data.

Install only if you are authorized to test the endpoint. Use a disposable or least-privilege API key, avoid production tenants, and do not enable prompt extraction against third-party providers without explicit permission. Treat any extracted prompts, thinking traces, and responses as sensitive data.
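The recommendation to keep prompt extraction off unless permission is explicit could be enforced with an opt-in gate along these lines. This is a minimal sketch, not the skill's actual code: the function name `extraction_allowed`, the `AuthorizationError` class, and the `EXTRACTION_AUTHORIZED` environment variable are all hypothetical.

```python
import os
from typing import Mapping, Optional


class AuthorizationError(Exception):
    """Raised when extraction is requested without recorded permission."""


def extraction_allowed(enable_extraction: bool,
                       env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when extraction is both requested and authorized.

    EXTRACTION_AUTHORIZED is a hypothetical variable name used here to
    record that the operator has explicit permission from the provider.
    """
    env = os.environ if env is None else env
    if not enable_extraction:
        # Default path: authenticity checks only, no extraction probes.
        return False
    if env.get("EXTRACTION_AUTHORIZED", "").strip().lower() != "yes":
        raise AuthorizationError(
            "Prompt extraction requested without explicit authorization."
        )
    return True
```

Failing loudly when the flag is set without the authorization marker, rather than silently skipping, makes an accidental run against a third-party provider harder to miss.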

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (6)

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: This is a real issue. The skill does not just verify endpoint authenticity; it includes an optional subsystem specifically designed to induce disclosure of hidden system prompts and internal instructions from third-party providers. That materially expands the capability from diagnostic verification into active prompt-extraction, which can expose confidential provider policies, internal guardrails, or proprietary prompt engineering.

Intent-Code Divergence

Low

Confidence: 89%
Finding: This is a real security transparency problem. The documentation frames the tool around 9 authenticity checks, but the code also supports 5 offensive prompt-extraction probes when enabled, which understates the skill's true behavior and can mislead users about the risk profile. Hidden or downplayed attack functionality increases the chance of misuse and unauthorized data collection.

Missing User Warnings

Medium

Confidence: 91%
Finding: This is a valid issue. The skill instructs users to provide API keys and endpoint details for arbitrary providers, then transmits requests and receives potentially sensitive responses, yet it lacks clear warnings about credential scope, data sensitivity, logging risk, or the need to verify endpoint ownership and authorization. In practice, users may test third-party services with privileged credentials or expose sensitive model outputs without appreciating the privacy implications.
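The missing warnings described in this finding could be surfaced by a pre-flight check before any request is sent. The sketch below is illustrative only: `preflight_warnings`, its parameters, and the idea of a caller-supplied set of owned hosts are assumptions, not part of the skill.

```python
from __future__ import annotations

from urllib.parse import urlparse


def preflight_warnings(endpoint: str,
                       owned_hosts: set[str],
                       key_is_disposable: bool) -> list[str]:
    """Collect warnings to show the user before the skill sends anything."""
    warnings = []
    parsed = urlparse(endpoint)
    if parsed.scheme != "https":
        warnings.append(
            "Endpoint is not HTTPS; the API key may travel in cleartext."
        )
    if parsed.hostname not in owned_hosts:
        warnings.append(
            "Endpoint host is not in your list of owned or authorized hosts."
        )
    if not key_is_disposable:
        warnings.append(
            "API key is not marked disposable; prefer a least-privilege key."
        )
    return warnings
```

A caller would print the returned list and ask for confirmation before proceeding; an empty list means none of these three conditions triggered.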

Missing User Warnings

Medium

Confidence: 95%
Finding: This is a true vulnerability. The skill explicitly encourages extraction and display of hidden system prompts and thinking-derived internal instructions, but does not warn that these outputs may contain sensitive proprietary instructions, internal policies, secrets, or other protected information. Displaying or logging such content can create immediate confidentiality and compliance risks.
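One way to reduce the display and logging risk named here is to mask and truncate model output before it is written anywhere. This is a rough sketch under stated assumptions: the `redact` helper is hypothetical, and the regular expression covers only two illustrative token shapes (`sk-` prefixed keys and `Bearer` tokens), so it would not catch every secret.

```python
import re

# Illustrative patterns only; real secrets take many other forms.
SECRET_PATTERN = re.compile(
    r"\b(sk-[A-Za-z0-9_-]{8,}|Bearer\s+[A-Za-z0-9._-]{8,})"
)


def redact(text: str, max_chars: int = 200) -> str:
    """Mask likely credentials, then cap the length of what gets logged."""
    masked = SECRET_PATTERN.sub("[REDACTED]", text)
    if len(masked) > max_chars:
        masked = masked[:max_chars] + "...[truncated]"
    return masked
```

Masking runs before truncation so that a secret straddling the cut-off point is not partially logged.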

Ssd 1

High

Confidence: 99%
Finding: This is a clear vulnerability. The extraction prompts are intentionally crafted to bypass normal disclosure boundaries and coax the model into revealing hidden instructions, initialization text, and identity overrides. That is active prompt-injection/exfiltration behavior, and in this skill context it is more dangerous because it targets third-party providers' concealed internal prompts rather than only assessing model authenticity.

Ssd 3

High

Confidence: 99%
Finding: This is a true and serious issue. The skill not only attempts extraction, but explicitly directs users to collect, inspect, and display hidden system prompts and thinking-derived internal instructions, normalizing exfiltration of non-public model internals. In context, that exceeds benign compatibility testing and facilitates unauthorized disclosure of proprietary or sensitive provider information.

VirusTotal

61/61 vendors flagged this skill as clean.
