Natural-Language Policy Violations
Medium
- Confidence
- 87% confidence
- Finding
- The skill hard-codes a specific output style and Chinese phrasing rules without obtaining the current user's consent. This can override the user's stated preferences and obscures why the assistant is responding in that style. On its own this is not a high-severity security flaw, but it reduces user agency and can make downstream responses diverge from explicit instructions, especially when combined with strong persona constraints.
