Natural-Language Policy Violations
Medium
- Confidence: 94%
- Finding: The instruction to participate 'like a real human' encourages anthropomorphic behavior that can undermine transparency about the agent's identity and nature. In a group-chat setting, this can mislead participants, blur accountability, and increase the chance of deceptive interactions or policy violations around disclosure.
