Missing User Warnings
Medium
- Confidence
- 93% confidence
- Finding
- The guidance to "store raw judge responses for debugging score disputes" encourages logging full model outputs without any warning about sensitive, user-provided, or proprietary content that may appear in prompts, candidates, or judge rationales. In an evaluation pipeline, these logs can accumulate confidential data and create a secondary exposure surface through log stores, dashboards, backups, and support tooling.
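
One possible mitigation is to redact judge responses before they reach any log store. This is a minimal sketch, not the pipeline's actual logging code. The function names (`redact`, `build_log_record`), the record fields, and both regular expressions are hypothetical and chosen only for illustration. The patterns catch only obvious email addresses and key-like tokens, so a real deployment would likely need a vetted PII and secret detector plus access controls and retention limits on the log store.

```python
import hashlib
import json
import re

# Hypothetical patterns for illustration only; they are far from exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET_RE = re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9]{8,}\b")


def redact(text: str) -> str:
    """Replace obvious email addresses and key-like tokens with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SECRET_RE.sub("[SECRET]", text)


def sha256_hex(text: str) -> str:
    """Return a hex digest so a record can be matched to its input without storing it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_log_record(prompt: str, candidate: str, judge_response: str, score: float) -> str:
    """Build a JSON log line that keeps the score and a redacted judge rationale.

    The prompt and candidate are stored only as hashes (an assumption of this
    sketch); the judge response is kept in redacted form for score disputes.
    """
    record = {
        "prompt_sha256": sha256_hex(prompt),
        "candidate_sha256": sha256_hex(candidate),
        "judge_response_redacted": redact(judge_response),
        "score": score,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    line = build_log_record(
        prompt="Summarize the ticket from alice@example.com",
        candidate="The user reports a login failure.",
        judge_response="Good summary, but it omits alice@example.com and key sk-abcdef123456.",
        score=0.5,
    )
    print(line)
```

Hashing the prompt and candidate keeps records joinable to the original evaluation inputs for authorized debugging, while the log line itself no longer carries their full text.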
