Vague Triggers
Medium
- Confidence
- 90% confidence
- Finding
- The natural-language trigger "Note that [AgentType]: ..." is broad enough that ordinary conversation or adversarially crafted text could be interpreted as a command to persist a correction. Because those corrections are then replayed into future agent sessions, a one-time prompt can create durable instruction poisoning across sessions for a given agent type.
