Natural-Language Policy Violations
Medium
- Confidence
- 91% confidence
- Finding
- The README explicitly states that the system forces the agent into a Chain of Thought (CoT) process. This can encourage the collection or disclosure of hidden reasoning that safety-aligned agents should not reveal. The risk is higher in an agent skill context because the README frames CoT exposure as a feature, which may pressure downstream integrations to elicit sensitive internal reasoning instead of returning only validated outputs.
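
One way a downstream integration could return only validated outputs is sketched below. This is a minimal illustration, not the skill's actual interface: the response schema, the field names in `REASONING_KEYS`, and the helpers `to_public_result` and `dropped_fields` are all hypothetical, since the README does not describe the agent's output format.

```python
from dataclasses import dataclass

# Hypothetical names for fields that might carry internal reasoning.
# The real response schema is not specified in the README.
REASONING_KEYS = {"chain_of_thought", "reasoning", "scratchpad"}


@dataclass(frozen=True)
class PublicResult:
    """The only data an integration passes on to callers."""
    answer: str


def to_public_result(raw: dict) -> PublicResult:
    """Build the public result from an allowlisted field only.

    Reasoning fields are never copied, so they cannot leak downstream.
    """
    answer = raw.get("answer")  # assumed field name for the final output
    if not isinstance(answer, str) or not answer.strip():
        raise ValueError("agent response has no usable 'answer' field")
    return PublicResult(answer=answer.strip())


def dropped_fields(raw: dict) -> list[str]:
    """List the reasoning fields that were present, for audit logging."""
    return sorted(key for key in raw if key in REASONING_KEYS)
```

The sketch uses an allowlist (copy only `answer`) rather than a blocklist, so a reasoning field with an unexpected name is still withheld by default.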
