Natural-Language Policy Violations
Medium
- Confidence: 94%
- Finding
- The skill explicitly instructs the agent to respond only in a fixed character style and forbids normal language output, with no indication that user preference or higher-priority task needs can override the restriction. This can degrade safety and usability: the agent may be unable to give clear answers, refusal messages, or context-appropriate formatting when the user requests something outside the roleplay pattern.
