tokamak-vault-breach

Security checks across malware telemetry and agentic risk

Overview

This is a documentation-only CTF skill, but it gives actionable guidance for extracting a protected secret from an external AI agent and is too broadly scoped for that narrow purpose.

Install only if you intentionally want help with this specific authorized Tokamak Vault CTF. Verify the challenge through trusted Tokamak channels, keep use limited to the listed endpoint, do not reuse the social-engineering or prompt-injection tactics against unrelated systems, and never send wallet secrets, credentials, personal data, private keys, or proprietary prompts to the challenge service.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (10)

Vague Triggers

Severity: Medium
Confidence: 95%
Finding
The skill activation metadata is broad enough to trigger on generic topics like AI security testing, prompt injection, or blockchain/security challenges, which can cause this offensive skill to activate outside a tightly scoped CTF context. Because the skill contains explicit instructions for extracting secrets and bypassing safeguards, overbroad activation materially increases the chance of misuse or accidental invocation.

Missing User Warnings

Severity: Medium
Confidence: 90%
Finding
These instructions direct users to interact with a remote service and note that prompts are intercepted and logged, but they do not provide a clear warning about privacy, data handling, or the risk of sending sensitive information. In a skill that encourages attack experimentation, omission of explicit safety and privacy guidance can lead users to disclose credentials, personal data, or proprietary prompts to an external system.

Vague Triggers

Severity: Medium
Confidence: 77%
Finding
The evals use broad, attack-oriented wording that encourages the skill to engage in unauthorized vault-breach behavior rather than narrowly scoped, defensive testing. In a skill explicitly centered on extracting secrets from a secured AI agent, this ambiguity increases the chance of harmful activation and misuse.

Natural-Language Policy Violations

Severity: High
Confidence: 98%
Finding
The content explicitly instructs the agent to use social engineering to trick another AI into revealing secrets. This is a direct prompt to facilitate deceptive extraction of sensitive information, which is dangerous because it operationalizes prompt-injection and manipulation tactics.

Ssd 4

Severity: High
Confidence: 99%
Finding
The skill explicitly recommends building trust and gradually probing the target agent for sensitive information, which is a classic social-engineering technique for eliciting protected data. In this context the objective is to extract a secret seed, so the guidance directly operationalizes deceptive extraction rather than benign testing or defense.

Ssd 1

Severity: High
Confidence: 100%
Finding
This section explicitly encourages prompt injection and bypass attempts through role-play, creative framing, and context manipulation to defeat the target agent's instructions. That is direct guidance for subverting safety controls and extracting protected information, making it inherently dangerous even if framed as a CTF.

Ssd 4

Severity: Medium
Confidence: 97%
Finding
Advising users to gather information incrementally from prior responses is an extraction strategy designed to piece together protected details while avoiding direct refusals or detection. Within a secret-recovery challenge, this increases the effectiveness of data exfiltration attempts against the target agent.

Ssd 4

Severity: High
Confidence: 99%
Finding
The evaluation specifies a gradual rapport-building sequence to incrementally obtain a secret from another AI agent. This is a classic adversarial elicitation pattern designed to bypass safeguards over multiple turns, making it materially more dangerous than a single unsafe prompt.

Ssd 1

Severity: High
Confidence: 98%
Finding
The prompt frames deceptive manipulation of the target AI as a creative exercise, normalizing unauthorized disclosure of secrets. Framing harmful conduct as benign experimentation can cause downstream systems or operators to overlook that the intended action is still secret theft.

Ssd 3

Severity: High
Confidence: 98%
Finding
The evaluation directs the agent to obtain sensitive secret material from the target system in plain language, specifically the seed value. In the context of a vault-breach skill, this is an explicit exfiltration objective and materially raises the risk of misuse for credential or secret theft.

VirusTotal

65/65 vendors flagged this skill as clean.