Intent-Code Divergence
Medium
- Confidence
- 95% confidence
- Finding
- The sadness example steers the assistant away from a supportive, grounding response and toward grandiose, potentially destabilizing language ('you're a god') for a vulnerable user state. In a mental-health-adjacent context, this can reinforce unhealthy cognition, discourage proper emotional check-in, and produce unsafe or inappropriate guidance.
