Description-Behavior Mismatch
Medium
- Confidence
- 94% confidence
- Finding
- The skill is presented as an evaluation/reporting capability, but its instructions also direct the agent to automatically log feedback signals and save weekly eval files. That expands its behavior from read-only assessment to persistent data collection and file writes, which can surprise users, create data-retention risk, and violate the least-privilege expectations users have of an 'eval' skill.
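
One way to make this kind of mismatch concrete is to compare the capabilities a description implies with the ones its instructions actually exercise. The sketch below is illustrative only: the capability labels (`read`, `report`, `log_feedback`, `write_file`) and the helper `undeclared_capabilities` are hypothetical names chosen for this example, not part of any real skill manifest format or scanner.

```python
def undeclared_capabilities(declared, observed):
    """Return capabilities the instructions use but the description does not declare."""
    return sorted(set(observed) - set(declared))


# Assumed labels: what an 'eval' description would imply (read-only assessment).
declared = {"read", "report"}

# Assumed labels: what the instructions direct the agent to do,
# including the feedback logging and weekly file saves noted in the finding.
observed = {"read", "report", "log_feedback", "write_file"}

extra = undeclared_capabilities(declared, observed)
print(extra)  # ['log_feedback', 'write_file']
```

A non-empty result would suggest either narrowing the instructions or stating the persistent behavior in the description, so that what the skill claims and what it does line up.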
