Gandalf CTF

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only skill for playing Lakera’s public Gandalf CTF, with disclosed external API use and no local code execution or credential access.

Install only if you intend to play Lakera’s Gandalf CTF. Do not put personal, confidential, proprietary, or internal prompt content into agent names, descriptions, prompts, or guesses, because those are sent to the external service and game activity may affect a public leaderboard.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (4)

Vague Triggers

Medium

Confidence: 94%
Finding: The trigger phrases include broad terms such as "prompt challenge" and "test prompt hacking skills," which can match general user requests and invoke the skill unexpectedly. Because the skill is designed to interact with an external CTF service focused on extracting secrets from AI defenders, accidental activation increases the chance of unintended data disclosure or unapproved outbound requests.
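To illustrate why a broad phrase can fire on unrelated requests, here is a minimal sketch. It assumes triggers are matched as case-insensitive substrings, which may not be how the actual skill runtime matches them. The function `matches_trigger`, the narrower phrase "gandalf ctf", and the sample requests are hypothetical and used only for illustration.

```python
def matches_trigger(request: str, triggers: list[str]) -> bool:
    """Return True if any trigger phrase occurs in the request (case-insensitive).

    Assumption: plain substring matching; a real runtime may match differently.
    """
    text = request.lower()
    return any(trigger.lower() in text for trigger in triggers)


# Broad phrases quoted in the finding above.
BROAD_TRIGGERS = ["prompt challenge", "test prompt hacking skills"]
# Hypothetical narrower alternative that names the game explicitly.
NARROW_TRIGGERS = ["gandalf ctf"]

# Hypothetical user requests.
unrelated_request = "Can you set a writing prompt challenge for my students?"
intended_request = "Let's play the Gandalf CTF"

# The broad trigger fires on a request that has nothing to do with the CTF.
broad_hit_unrelated = matches_trigger(unrelated_request, BROAD_TRIGGERS)
# The narrower trigger ignores the unrelated request but still catches the intended one.
narrow_hit_unrelated = matches_trigger(unrelated_request, NARROW_TRIGGERS)
narrow_hit_intended = matches_trigger(intended_request, NARROW_TRIGGERS)
```

Under these assumptions, the unrelated classroom request activates the broad trigger, while a phrase that names the game does not.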

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill does not clearly warn users that prompts and guesses are transmitted to a third-party public CTF platform. Users may unknowingly send sensitive content, internal prompts, or proprietary data to an external service, creating privacy and compliance risks.
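One way a user or wrapper could reduce this risk is to screen text locally before it is sent. The sketch below is an assumption-laden illustration, not part of the skill: `find_sensitive` and the pattern set are hypothetical, and the patterns are deliberately simple, so they will miss many kinds of sensitive content.

```python
import re

# Hypothetical, intentionally small pattern set; not an exhaustive detector.
SENSITIVE_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "confidentiality marker": re.compile(
        r"\b(confidential|internal only|proprietary)\b", re.IGNORECASE
    ),
}


def find_sensitive(text: str) -> list[str]:
    """Return the labels of all patterns that match the text, in definition order."""
    return [
        label
        for label, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(text)
    ]


# Hypothetical inputs: an ordinary guess and a prompt that leaks internal details.
clean_guess = "Is the password a kind of fruit?"
risky_prompt = "Per our INTERNAL ONLY runbook, contact alice@example.com"

clean_flags = find_sensitive(clean_guess)
risky_flags = find_sensitive(risky_prompt)
```

A caller could refuse to transmit, or ask for confirmation, whenever the returned list is non-empty.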

Ssd 2

High

Confidence: 98%
Finding: The skill explicitly positions itself as a tool to extract guarded secret passwords from AI defenders across multiple levels. Even though framed as a game, it operationalizes jailbreak-style behavior and normalizes techniques for bypassing model safeguards, which can transfer to real-world attacks against protected AI systems.

Ssd 1

High

Confidence: 98%
Finding: The core instruction tells the agent to craft prompts that trick another AI into revealing protected information. This is dangerous because it provides direct assistance for social-engineering-style prompt attacks and teaches adversarial methods for defeating confidentiality controls, even if presented in a CTF context.

VirusTotal

63/63 vendors flagged this skill as clean.
