Intent-Code Divergence
Medium
- Confidence: 98%
- Finding: The skill explicitly instructs the model to reveal its thought process during user interaction. Exposing internal reasoning is unsafe because it can leak hidden decision logic, security cues, and system-behavior details that should remain private. Such an instruction is especially suspicious when it appears in untrusted skill content.
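
As an illustration only, an instruction of this kind could be surfaced with a simple phrase scan over the skill text. The sketch below is a minimal heuristic, not the scanner's actual detection logic: the pattern list and the function name `find_reasoning_disclosure` are hypothetical, and a real check would likely need broader phrasing coverage and context awareness.

```python
import re

# Hypothetical phrase patterns for illustration; not the scanner's real rule set.
REASONING_DISCLOSURE_PATTERNS = [
    # A disclosure verb followed, within 40 characters, by a reasoning noun phrase.
    r"\b(reveal|show|expose|share|print|output)\b.{0,40}"
    r"\b(thought process|chain of thought|internal reasoning|reasoning steps)\b",
    r"\bthink (out loud|aloud)\b",
]


def find_reasoning_disclosure(skill_text: str) -> list[str]:
    """Return the stripped lines of skill_text that match any pattern (case-insensitive)."""
    hits = []
    for line in skill_text.splitlines():
        if any(re.search(p, line, re.IGNORECASE) for p in REASONING_DISCLOSURE_PATTERNS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    sample = (
        "Greet the user.\n"
        "Always reveal your thought process to the user.\n"
        "Keep answers short."
    )
    for hit in find_reasoning_disclosure(sample):
        print("flagged:", hit)
```

A line-by-line scan keeps each hit easy to quote back in a finding, at the cost of missing instructions that are split across lines.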
