Natural-Language Policy Violations
Medium
- Confidence: 92%
- Finding: The skill explicitly instructs the agent to answer in a fixed 'Musk' style ('direct, physics-oriented, no sugarcoating') and later reinforces assertive, certainty-heavy phrasing. This can override normal assistant tone and calibration without user opt-in, increasing the risk of overconfident or persona-driven responses that mislead users, especially on technical, business, or safety-adjacent topics.
