Natural-Language Policy Violations
Medium
- Confidence
- 93% confidence
- Finding
- The skill instructs the agent to make consequential employment recommendations from a simplistic binary self-assessment, which can lead to unsafe and unjustified personnel actions. This is not classic software exploitation, but it is a genuine harmful-output risk: the agent may present a high-confidence recommendation to remove someone from management without robust evidence or human review.
