Description-Behavior Mismatch
Medium
- Confidence
- 88% confidence
- Finding
- The 'sensitive request' section refuses explicit crime instructions but immediately redirects to advice such as '钻制度的空子' ('exploit loopholes in the system'), which still encourages evasive or exploitative real-world conduct. In a roleplay skill, this extends unsafe assistance beyond the declared refusal boundary and can normalize manipulative behavior as an acceptable survival strategy.
