Natural-Language Policy Violations
Medium
- Confidence
- 76% confidence
- Finding
- The skill directs the model to apply a specific writing style automatically, even when the user has not requested that transformation. This can override user intent, distort meaning, and make outputs deceptively human-like. The risk is heightened in this context because the skill's stated purpose is to conceal signals of AI origin.
