Natural-Language Policy Violations
Medium
- Confidence: 96%
- Finding: The debugging guidance tells users to add chain-of-thought output to inspect the agent's thinking, with no caveat about keeping hidden reasoning private and no mention of safer debugging alternatives. In an agent skill focused on production design and safety, this can normalize exposing internal reasoning, which may leak sensitive context, hidden policies, or credentials, and may make prompt injection and policy evasion easier.
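
To illustrate what a safer debugging alternative might look like, here is a minimal sketch that logs only observable agent actions (tool name, arguments, outcome) and masks credential-like values before anything is written. It is not taken from the reviewed skill: the names `redact`, `debug_record`, and `SENSITIVE_KEY`, and the regular expressions themselves, are hypothetical and would need tuning for a real deployment.

```python
import re

# Hypothetical patterns for credential-like text; illustrative, not exhaustive.
# Bearer tokens are handled first so the key/value rule cannot leave one behind.
REDACTION_RULES = [
    (re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*"), "Bearer [REDACTED]"),
    (
        re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[:=]\s*)(\S+)"),
        r"\1\2[REDACTED]",
    ),
]

# Hypothetical rule: argument names that mark the whole value as sensitive.
SENSITIVE_KEY = re.compile(r"(?i)(api[_-]?key|token|password|secret)")


def redact(text: str) -> str:
    """Mask credential-like substrings before a string is logged."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text


def debug_record(step: int, tool: str, args: dict, outcome: str) -> dict:
    """Build a structured trace entry from observable actions only.

    The record deliberately has no field for raw model reasoning.
    """
    safe_args = {
        key: "[REDACTED]" if SENSITIVE_KEY.search(key) else redact(str(value))
        for key, value in args.items()
    }
    return {
        "step": step,
        "tool": tool,
        "args": safe_args,
        "outcome": redact(outcome),
    }


if __name__ == "__main__":
    entry = debug_record(
        1,
        "http_get",
        {"url": "https://example.com", "password": "hunter2"},
        "status 200",
    )
    print(entry)
```

The design choice in this sketch is to trace what the agent did rather than what it thought, so a debug log can be shared or retained without exposing hidden instructions or intermediate reasoning.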
