Intent-Code Divergence
Medium
- Confidence
- 95% confidence
- Finding
- The code comment explicitly states block-severity signals should not be auto-sent, but the implementation appends them to the intervention output anyway. In a plugin that injects text back into the conversation, this mismatch can cause higher-severity internal detection output to be surfaced automatically, potentially altering model behavior or exposing internal state without confirmation.
