Context-Inappropriate Capability
Medium
- Confidence
- 83% confidence
- Finding
- The grading script is not purely passive: it executes a local project script against a hard-coded repository path. In a skill or eval context, this means running untrusted or not-yet-vetted code during evaluation. That widens the trust boundary and can lead to arbitrary local code execution if the target repository's contents are malicious or compromised.
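To make the concern concrete, here is a minimal Python sketch that contrasts the flagged pattern with one possible hardening step, digest pinning. The function names, the `pinned_sha256` parameter, and the choice of Python are assumptions for illustration. The finding does not name the grading script, the project script, or the repository path.

```python
import hashlib
import subprocess
import sys
from pathlib import Path


def run_unvetted(repo: Path, script: str) -> int:
    """Hypothetical flagged pattern: run whatever currently sits at repo/script."""
    # Any change to the repository silently changes what the grader executes.
    return subprocess.run([sys.executable, str(repo / script)], cwd=repo).returncode


def run_if_vetted(repo: Path, script: str, pinned_sha256: str,
                  timeout: float = 60.0) -> int:
    """Hypothetical hardening: run repo/script only if its bytes match a digest
    recorded when the script was reviewed."""
    source = (repo / script).read_bytes()  # read once so we hash exactly what we run
    actual = hashlib.sha256(source).hexdigest()
    if actual != pinned_sha256:
        raise PermissionError(
            f"refusing to run {repo / script}: sha256 {actual} was not vetted")
    # Feed the already-checked bytes to the interpreter on stdin ("-"), so the
    # file cannot be swapped between the digest check and execution.
    result = subprocess.run([sys.executable, "-"], input=source,
                            cwd=repo, timeout=timeout)
    return result.returncode
```

Pinning covers only the entry-point file. Modules it imports from the repository, and any files it reads at run time, remain unvetted, so this narrows the trust boundary rather than closing it.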
