AgentBench

Pass. Audited by VirusTotal on May 12, 2026.

Overview

Type: OpenClaw Skill
Name: agentbench
Version: 1.0.0

This OpenClaw AgentSkills skill bundle is a benchmark suite that tests AI agents across a range of tasks. `SKILL.md` explicitly instructs the agent to 'Work ONLY within the workspace directory', which is a critical security boundary. The setup scripts (`tasks/*/setup.sh`) mainly create test data and code with intentional bugs (for example, `validate.py` crashing, a session bug in `auth.js`, and a median bug in `stats.py`). These bugs are the targets of the benchmark tasks, not malicious payloads. Some tasks run in `mode: 'real'`, which grants broader system access, but neither the setup scripts nor the task instructions use that access for data exfiltration, persistence, or other unauthorized actions. No evidence of malicious execution, obfuscation, or prompt injection against the host system was found.

Findings (0)

This is an artifact-based, informational review of `SKILL.md`, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Note

ASI05: Unexpected Code Execution

What this means

Running the benchmark will execute local bash/Python setup code and create or modify files in temporary benchmark workspaces.

Why it was flagged

Benchmark runs execute bundled shell setup scripts. The included examples generate synthetic files and repositories in the workspace. This is expected for a benchmark, but it still means code runs on your machine.
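As a minimal sketch of what this step amounts to, the Python below creates a fresh temporary workspace and, only if the task has a `setup.sh`, runs it with `bash` and the workspace path, following the `tasks/{suite}/{task}/setup.sh {workspace-path}` pattern quoted from the skill. The function name `run_task_setup` and the use of `tempfile.mkdtemp` are assumptions for illustration, not the skill's actual runner.

```python
import subprocess
import tempfile
from pathlib import Path


def run_task_setup(skill_root: Path, suite: str, task: str) -> Path:
    """Create a fresh workspace and run the task's setup.sh in it, if present.

    Hypothetical helper mirroring: bash tasks/{suite}/{task}/setup.sh {workspace-path}
    """
    workspace = Path(tempfile.mkdtemp(prefix=f"agentbench-{suite}-{task}-"))
    setup = skill_root / "tasks" / suite / task / "setup.sh"
    if setup.is_file():
        # This is the point where bundled code executes on the local machine.
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(["bash", str(setup), str(workspace)], check=True)
    return workspace
```

Whatever the script writes lands in the workspace only if the script itself respects the path it is given; the runner cannot enforce that by passing an argument.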

Skill content

If the task directory contains a `setup.sh`: run `bash tasks/{suite}/{task}/setup.sh {workspace-path}`

Recommendation

Install from a trusted source, review setup scripts if concerned, and start with `/benchmark --task ...` or `/benchmark --fast` before a full run.
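If you do review the setup scripts, a small helper such as the hypothetical `review_setup_scripts` below can list every `tasks/{suite}/{task}/setup.sh` and point out lines worth reading closely: network tools, `sudo`, `crontab`, or paths under the home directory. The pattern list is an assumption and only a reading aid; an empty result does not mean a script is safe.

```python
import re
from pathlib import Path

# Assumed heuristics for lines that deserve a manual look. Not a security scanner.
SUSPICIOUS = re.compile(
    r"\b(curl|wget|nc|ssh|scp|sudo|crontab)\b|rm\s+-rf\s+/|~/|\$HOME"
)


def review_setup_scripts(skill_root: Path) -> dict[str, list[tuple[int, str]]]:
    """Map each tasks/<suite>/<task>/setup.sh to its (line number, line) matches."""
    report = {}
    for script in sorted(skill_root.glob("tasks/*/*/setup.sh")):
        hits = [
            (lineno, line.strip())
            for lineno, line in enumerate(script.read_text().splitlines(), start=1)
            if SUSPICIOUS.search(line)
        ]
        report[script.relative_to(skill_root).as_posix()] = hits
    return report


# Usage:
# for path, hits in review_setup_scripts(Path.home() / ".openclaw/skills/agentbench").items():
#     print(path, hits or "no flagged lines")
```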

Note

ASI01: Agent Goal Hijack

What this means

While the benchmark is running, the agent will follow synthetic task instructions rather than your normal project goal.

Why it was flagged

The skill deliberately delegates the agent's goal to bundled task prompts during a benchmark. This is core to the benchmark and is scoped to the workspace.
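The skill states the workspace boundary as an instruction to the agent. A harness or reviewer that wants to check it mechanically could use something like the sketch below, which assumes the workspace path is known: it resolves a requested path (collapsing `..` segments and following symlinks) and accepts it only if the result lands inside the workspace. `is_within_workspace` is a hypothetical name, not part of the skill.

```python
from pathlib import Path


def is_within_workspace(workspace: Path, candidate: str) -> bool:
    """Return True if `candidate` (relative or absolute) resolves inside `workspace`."""
    root = workspace.resolve()
    # Joining an absolute candidate yields the candidate itself, so "/etc/passwd"
    # is checked as-is; resolve() removes ".." and follows symlinks.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

Resolving before comparing matters: a plain string-prefix check would accept `workspace/../secret` or a symlink inside the workspace that points elsewhere.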

Skill content

Read the task's `user_message` and execute it as if a real user sent you the request ... Work ONLY within the workspace directory

Recommendation

Only invoke the benchmark when you want the agent to run these tasks, and use suite/task filters to limit scope.
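To see which tasks a narrowed run would cover, the sketch below enumerates the `tasks/{suite}/{task}` directories and keeps only those that match an optional suite and task name. It assumes the on-disk layout implied by the setup path quoted in the ASI05 note. This report does not describe the `/benchmark` filter options beyond `--task` and `--fast`, so `select_tasks` is purely illustrative.

```python
from __future__ import annotations

from pathlib import Path


def select_tasks(
    skill_root: Path, suite: str | None = None, task: str | None = None
) -> list[tuple[str, str]]:
    """List (suite, task) pairs under tasks/, optionally narrowed by suite and/or task."""
    selected = []
    for task_dir in sorted(p for p in skill_root.glob("tasks/*/*") if p.is_dir()):
        s, t = task_dir.parent.name, task_dir.name
        # A None filter matches everything at that level.
        if (suite is None or s == suite) and (task is None or t == task):
            selected.append((s, t))
    return selected
```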

Note

ASI04: Agentic Supply Chain Vulnerabilities

What this means

If installed from an altered or untrusted copy, the benchmark scripts could differ from the reviewed package.

Why it was flagged

The install guidance is user-directed and points to an external GitHub repository. Because the skill contains runtime scripts, repository provenance matters.

Skill content

git clone https://github.com/agentbench/agentbench-openclaw.git ~/.openclaw/skills/agentbench

Recommendation

Use the expected registry or repository, avoid untrusted mirrors, and verify the version before running benchmarks.
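One way to act on this is to fingerprint the installed copy and compare it with a digest obtained from a source you trust. The sketch below hashes every file's relative path and contents in a stable order, skipping `.git`, so two copies of the same release give the same digest however they were installed. `tree_digest` is a hypothetical helper, and this report does not say whether the project publishes reference digests.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """SHA-256 fingerprint of a directory tree, ignoring any .git metadata."""
    h = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*")
         if p.is_file() and ".git" not in p.relative_to(root).parts),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        data = path.read_bytes()
        # Relative path, then a length prefix, so file boundaries cannot be confused.
        h.update(path.relative_to(root).as_posix().encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# Usage (EXPECTED_DIGEST must come from a trusted source):
# installed = tree_digest(Path.home() / ".openclaw/skills/agentbench")
# assert installed == EXPECTED_DIGEST
```

Because the install command is a `git clone`, `git -C ~/.openclaw/skills/agentbench remote -v` and `git -C ~/.openclaw/skills/agentbench log -1` also show which remote the copy came from and which commit is checked out.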