Description-Behavior Mismatch
Medium
- Confidence
- 95% confidence
- Finding
- The skill explicitly instructs the agent to emit 'Self' and 'Mirror' sections that resemble internal deliberation, which can expose reasoning-style content to users. Even if these sections do not reveal true chain-of-thought, the pattern normalizes disclosure of internal process and can leak sensitive intermediate analysis, policy reasoning, or prompt-derived behavior that should remain internal.
