Intent-Code Divergence
Medium
- Confidence
- 93% confidence
- Finding
- The skill claims reviews are always shown to the human before posting, but it also grants direct review-posting capability with no enforceable guardrail or approval mechanism. In an agent setting, a natural-language instruction alone does not reliably constrain behavior, so the agent could post public content without confirmed user consent, causing reputational harm or disclosing usage details.
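
One way to close the gap described in this finding is to move the approval step out of the prompt and into code, so that the posting call cannot run unless a human has confirmed. The sketch below is illustrative only and is not taken from the skill: `ReviewDraft`, `ApprovalRequired`, `post_review_with_approval`, `ask_human`, and `publish` are all hypothetical names, and the real posting API would be substituted for `publish`.

```python
from dataclasses import dataclass
from typing import Callable


class ApprovalRequired(Exception):
    """Raised when a post is attempted without explicit human approval."""


@dataclass(frozen=True)
class ReviewDraft:
    # Hypothetical container for the content the agent wants to post.
    target: str
    body: str


def post_review_with_approval(
    draft: ReviewDraft,
    ask_human: Callable[[str], str],
    publish: Callable[[ReviewDraft], None],
) -> None:
    """Show the full draft to the human and publish only on an explicit 'yes'.

    `ask_human` and `publish` are injected so the gate does not depend on
    any particular UI or posting API (both are assumptions of this sketch).
    """
    prompt = (
        f"Post this review to {draft.target}?\n"
        f"---\n{draft.body}\n---\n"
        "Type 'yes' to confirm: "
    )
    answer = ask_human(prompt)
    # Anything other than an explicit 'yes' is treated as a refusal.
    if answer.strip().lower() != "yes":
        raise ApprovalRequired("Review was not approved; nothing was posted.")
    publish(draft)
```

In an interactive session this could be called as `post_review_with_approval(draft, input, publish_fn)`, where `publish_fn` stands in for whatever function actually posts the review. The point of the design is that the agent is only ever handed the gated function, never `publish` itself, so consent is enforced by the code path instead of by policy text.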
