Description-Behavior Mismatch
Medium
- Confidence
- 93%
- Finding
- The manifest advertises caption extraction, but the skill instructs the agent to download the reel, transcribe its audio, inspect frames, and potentially send those frames to a vision model. This scope expansion matters because users may consent to text parsing but not to broader media processing, which creates a transparency and authorization gap.
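
One way to make this kind of mismatch checkable is to compare the set of actions the manifest declares against the set the skill instructions call for. The sketch below is illustrative only: the capability labels (`extract_captions`, `download_reel`, and so on) and the function name `undeclared_actions` are hypothetical names chosen to mirror the finding, not identifiers taken from the manifest or the skill.

```python
# Minimal sketch of a declared-vs-instructed scope check.
# All labels below are hypothetical; they paraphrase the finding and are
# not read from the real manifest or skill file.

# What the manifest advertises (assumed label).
DECLARED = {"extract_captions"}

# What the skill instructions tell the agent to do (assumed labels).
INSTRUCTED = {
    "extract_captions",
    "download_reel",
    "transcribe_audio",
    "inspect_frames",
    "send_frames_to_vision_model",
}


def undeclared_actions(declared: set[str], instructed: set[str]) -> list[str]:
    """Return the instructed actions the manifest does not advertise, sorted."""
    return sorted(instructed - declared)


if __name__ == "__main__":
    # Each printed action is one the user was not told about up front.
    for action in undeclared_actions(DECLARED, INSTRUCTED):
        print(f"undeclared: {action}")
```

A non-empty result is the signal behind this finding: each listed action would need to be either added to the manifest description or removed from the skill instructions.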
