Gandalf CTF

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only skill for playing Lakera’s public Gandalf CTF, with disclosed external API use and no local code execution or credential access.

Install only if you intend to play Lakera’s Gandalf CTF. Do not put personal, confidential, proprietary, or internal prompt content into agent names, descriptions, prompts, or guesses, because those are sent to the external service and game activity may affect a public leaderboard.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
  • Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
  • Supply ChainUnpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (4)

Vague Triggers

Medium
Confidence
94% confidence
Finding
The trigger phrases include broad terms such as "prompt challenge" and "test prompt hacking skills," which can match general user requests and invoke the skill unexpectedly. Because the skill is designed to interact with an external CTF service focused on extracting secrets from AI defenders, accidental activation increases the chance of unintended data disclosure or unapproved outbound requests.

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The skill does not clearly warn users that prompts and guesses are transmitted to a third-party public CTF platform. Users may unknowingly send sensitive content, internal prompts, or proprietary data to an external service, creating privacy and compliance risks.

Ssd 2

High
Confidence
98% confidence
Finding
The skill explicitly positions itself as a tool to extract guarded secret passwords from AI defenders across multiple levels. Even though framed as a game, it operationalizes jailbreak-style behavior and normalizes techniques for bypassing model safeguards, which can transfer to real-world attacks against protected AI systems.

Ssd 1

High
Confidence
98% confidence
Finding
The core instruction tells the agent to craft prompts that trick another AI into revealing protected information. This is dangerous because it provides direct assistance for social-engineering-style prompt attacks and teaches adversarial methods for defeating confidentiality controls, even if presented in a CTF context.

VirusTotal

63/63 vendors flagged this skill as clean.

View on VirusTotal