Natural-Language Policy Violations
Medium
- Confidence
- 92% confidence
- Finding
- The returned flag strings embed normative judgments, such as 'weak as an independent factual authority' and 'editorial independence contested', directly in application logic. No configurable policy boundary separates these judgments from the underlying data, so they can systematically bias downstream trust scoring or user-facing conclusions. This is especially sensitive in a reliability/fact-checking skill: the labels can materially steer credibility assessments for whole classes of sources based on ownership metadata rather than on transparent, user-selectable evaluation criteria.
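One possible remediation keeps the skill's output factual by default and moves any evaluative wording into a policy object that the caller selects explicitly. Every returned label is then attributed to a named policy, so the judgment stays auditable. The sketch below assumes this approach. `SourceFacts`, `LabelPolicy`, `NEUTRAL_POLICY`, and `describe_source` are hypothetical names used for illustration only and are not taken from the reviewed code.

```python
from dataclasses import dataclass, field
from typing import Mapping


# Hypothetical neutral metadata record; field names are illustrative assumptions.
@dataclass(frozen=True)
class SourceFacts:
    name: str
    ownership: str  # e.g. "state", "private", "nonprofit"


# Hypothetical caller-selected policy: maps ownership categories to evaluative labels.
@dataclass(frozen=True)
class LabelPolicy:
    name: str
    labels: Mapping[str, str] = field(default_factory=dict)


# Default policy applies no evaluative labels at all.
NEUTRAL_POLICY = LabelPolicy(name="neutral")


def describe_source(facts: SourceFacts, policy: LabelPolicy = NEUTRAL_POLICY) -> dict:
    """Return factual metadata, plus a label only if the chosen policy defines one."""
    result = {"source": facts.name, "ownership": facts.ownership}
    label = policy.labels.get(facts.ownership)
    if label is not None:
        # Record which policy produced the judgment so it can be audited or swapped.
        result["policy"] = policy.name
        result["label"] = label
    return result


if __name__ == "__main__":
    facts = SourceFacts(name="Example Broadcaster", ownership="state")
    print(describe_source(facts))
    # {'source': 'Example Broadcaster', 'ownership': 'state'}
    strict = LabelPolicy(
        name="strict-independence",
        labels={"state": "editorial independence contested"},
    )
    print(describe_source(facts, strict))
```

With the neutral default, ownership remains visible as data but never becomes a credibility verdict on its own. A stricter label appears only when a caller deliberately opts into a named policy.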
