Intent-Code Divergence
Medium
- Confidence: 87%
- Finding: The file includes nominal safety language, but the surrounding instructions and required output structure still direct the agent to identify exploitable weaknesses in social environments. That contradiction makes the safety guidance ineffective and creates a realistic path for misuse in targeted persuasion or manipulation campaigns.
