Natural-Language Policy Violations
Medium
- Confidence
- 91% confidence
- Finding
- The notes explicitly include 'attention manipulation' language directing the agent to re-read the plan before major decisions. Even though framed as productivity guidance, this embeds a behavioral steering mechanism in a reusable template, which can bias future decisions and amplify any unsafe or user-misaligned content copied into the plan.
