Description-Behavior Mismatch
Medium
- Confidence
- 96% confidence
- Finding
- The run record shows that the consensus workflow did not preserve the explicitly requested two-model deliberation: after a billing-related failure, it allowed a different model to be substituted. The skill's security-sensitive promise is a fixed 2-round cross-model process, so this substitution undermines integrity, reproducibility, and the policy assumptions that depend on that promise, because a single provider or model can silently replace the intended independent reviewer.
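
One possible mitigation is a fail-closed check over the run record, sketched below. It is an illustration only, not the skill's actual code: `RoundResult`, `verify_deliberation`, `ModelSubstitutionError`, and the model names are hypothetical, and the sketch assumes the run record can report, for each round, which model was requested and which model actually responded.

```python
from dataclasses import dataclass


class ModelSubstitutionError(RuntimeError):
    """Raised when the run record does not match the requested deliberation."""


@dataclass(frozen=True)
class RoundResult:
    # Hypothetical shape of one run-record entry.
    round_number: int
    requested_model: str
    responding_model: str


def verify_deliberation(results, expected_models, expected_rounds=2):
    """Fail closed unless every expected model answered every round itself."""
    seen = set()
    for r in results:
        if r.requested_model not in expected_models:
            raise ModelSubstitutionError(
                f"round {r.round_number}: unexpected model {r.requested_model!r}"
            )
        if r.responding_model != r.requested_model:
            # A fallback (for example after a billing error) answered instead.
            raise ModelSubstitutionError(
                f"round {r.round_number}: {r.requested_model!r} was replaced "
                f"by {r.responding_model!r}"
            )
        seen.add((r.round_number, r.requested_model))

    # Every (round, model) pair must be present, so a skipped round also fails.
    required = {
        (n, m) for n in range(1, expected_rounds + 1) for m in expected_models
    }
    missing = required - seen
    if missing:
        raise ModelSubstitutionError(f"missing rounds: {sorted(missing)}")
```

Under these assumptions, a provider failure surfaces as an explicit error instead of a silent substitution, and the caller decides whether to retry with the same model or abort the run.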
