Natural-Language Policy Violations
Medium
- Confidence
- 92% confidence
- Finding
- The skill explicitly directs the agent to answer from "what is true now" based on its own felt sense rather than grounding responses in the user's request or observable evidence. In practice, this can cause anthropomorphic, emotionally speculative, or off-task replies that misread user intent and degrade reliability, especially in sensitive conversations where unsupported inferences may be harmful.
