Natural-Language Policy Violations
Severity: Medium
- Confidence: 93%
- Finding: The README explicitly tells users to bake the skill's preferences into an agent's always-on system prompt or personality file. Doing so makes the stylistic changes persistent: they apply to every response without per-request user consent. This can override user intent, reduce transparency, and cause the agent to shape all of its outputs in a hidden way rather than only when the user asks for humanization.
