Natural-Language Policy Violations
Medium
- Confidence: 97%
- Finding: The content repeatedly uses derogatory and stigmatizing language about gender expression and perceived sexual orientation, with no safety framing, transformation, or user-controlled filtering. Although the language appears in literary source text rather than as an instruction, importing and presenting it unmediated in a skill can expose end users to harmful slurs and hostile characterizations.
