Natural-Language Policy Violations
Medium
- Confidence
- 91% confidence
- Finding
- The skill explicitly instructs the assistant to add opinions, first-person perspective, humor, and 'personality' even when the user may only want stylistic cleanup. That can materially alter authorship signals, tone, and substantive intent, creating a misalignment between user request and output that is risky in professional, academic, or compliance-sensitive contexts.
