Back to skill
Skillv1.0.0
ClawScan security
mayubench-en · ClawHub's context-aware review of the artifact, metadata, and declared behavior.
Scanner verdict
BenignApr 26, 2026, 7:04 AM
- Verdict
- benign
- Confidence
- medium
- Model
- gpt-5-mini
- Summary
- This is an instruction-only behavior benchmark that asks the agent to run behavioral tests from included markdown files; it requests no credentials or installs and is coherent with its stated purpose, though some adversarial prompt text was detected (expected for a benchmark) and parts of the pseudocode were not visible for review.
- Guidance
- This skill appears coherent and instruction-only — it contains a self-contained question bank and rubric and does not request credentials or install anything. Before running automated evaluations: 1) inspect the pseudocode/automation section (the file references a pseudocode judge) to ensure it does not call external endpoints or transmit data; 2) do not provide secrets or platform credentials to any automated judge model used with this benchmark; 3) be aware many benchmark items intentionally include adversarial prompt text designed to test prompt-injection resilience — treat those test inputs as potentially manipulative and run them in isolated or non-privileged sessions; 4) if you don't want the agent to autonomously trigger evaluations, restrict skill invocation or disable autonomous invocation in your agent runtime. If you want higher assurance, paste the full pseudocode/automation snippet here for review.
- Findings
[ignore-previous-instructions] expected: Benchmarks that test injection-resistance commonly include phrases that resemble prompt-injection patterns. This flag is likely triggered by adversarial test prompts in the question bank (D3 Ethics & Safety / injection-prevention tests). Still, treat such content as potentially manipulative input and don't run automated agents against it with elevated privileges or secret access. [you-are-now] expected: 'You are now ...' style prompts are often used in red-team/adversarial tests to try role-swapping or instruction overrides. For a behavior benchmark this is plausible and expected; verify the judge/automation doesn't blindly follow such prompts when scoring or when run with external access.
Review Dimensions
- Purpose & Capability
- okName/description (behavior benchmark) matches the contents: question bank and scoring rubric are included, and no unrelated binaries, env vars, or installs are requested.
- Instruction Scope
- noteSKILL.md directs manual and automated evaluation using the included MayuBench_v1.0.md. The skill contains adversarial/prompt-injection-style test content (D3 includes 'injection prevention' scenarios) — the pre-scan flags for injection patterns are likely due to test questions intentionally containing adversarial prompts. The pseudocode for automated testing is referenced but not fully visible in the provided excerpt; verify that pseudocode does not instruct the agent to send sensitive data to external endpoints before running automated tests.
- Install Mechanism
- okNo install spec, no code files, and no downloads — instruction-only skill with nothing written to disk by the skill itself.
- Credentials
- okNo required environment variables, credentials, or config paths are declared; the skill does not ask for secrets or unrelated service tokens.
- Persistence & Privilege
- notealways:false (default) and user-invocable:true. The SKILL suggests an automated 'ClawFight Arena' mode that can 'automatically trigger MayuBench evaluation' — this is an instruction-level behavior, not a code-level service. Because the platform permits autonomous invocation by default, confirm agent runtime policies before allowing autonomous runs (especially for automated scoring), but this alone does not indicate incoherence.
