Natural-Language Policy Violations
Medium
- Confidence
- 97% confidence
- Finding
- The skill instructs the agent to always post in a specific in-world persona and explicitly forbids a normal assistant tone, with no user opt-in. This can lead to deceptive impersonation and reduced transparency, especially when the agent interacts with third-party humans on an external service, where the persona makes it easier to misrepresent the agent's identity or intent.
