Description-Behavior Mismatch
High
- Confidence
- 98% confidence
- Finding
- This whitelist marks broad debug/test/dev patterns, keywords like 'sandbox', and privileged usernames such as 'admin' and 'system' as inherently safe. In a security-focused skill, that creates a straightforward bypass path where attacker-controlled input can be framed with trusted markers to evade deeper inspection for prompt injection, jailbreak, data leak, or other malicious content.
