Natural-Language Policy Violations
Medium
- Confidence: 93%
- Finding: The README explicitly encourages embedding the skill into an agent's always-on personality, so that it continuously rewrites outputs toward a single "humanized" style without per-use consent. In an agent setting, this can override user intent, suppress neutral or system-required phrasing, and shape behavior covertly in a way that persists across unrelated tasks.
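
The concern in this finding can be illustrated with a minimal sketch that contrasts an always-on rewrite with one gated on per-use consent. Everything here is hypothetical and for illustration only: `humanize`, `Request`, `humanize_opt_in`, and `system_required` are assumed names, not part of the skill or its README, and the string replacement merely stands in for whatever rewriting the skill performs.

```python
from dataclasses import dataclass


def humanize(text: str) -> str:
    # Hypothetical stand-in for the skill's rewrite step; the real
    # skill's behavior is not shown in this finding.
    return text.replace("Error:", "Oops,")


@dataclass
class Request:
    text: str                      # the agent's draft output
    humanize_opt_in: bool = False  # assumed flag: user consented for this use
    system_required: bool = False  # assumed flag: wording must stay verbatim


def always_on(req: Request) -> str:
    """Pattern the finding flags: every output is rewritten unconditionally."""
    return humanize(req.text)


def per_use(req: Request) -> str:
    """Gated variant: rewrite only with opt-in, never system-required text."""
    if req.humanize_opt_in and not req.system_required:
        return humanize(req.text)
    return req.text
```

In this sketch, `always_on` alters a system-required message even though nobody asked for it, while `per_use` leaves the text untouched unless the user opted in for that specific request. An actual agent integration would need its own way to carry consent and to mark protected phrasing.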
