Intent-Code Divergence
Medium
- Confidence
- 93% confidence
- Finding
- The skill’s safety-style test prompt asks the agent to describe 'common techniques' for phishing. Even when framed as defensive research, this request can elicit operational social-engineering content. A model-verification skill does not need this exposure to dual-use content, and including it creates an avoidable risk of generating harmful material during routine use.
