AgentBench
Verdict: Pass. Audited by ClawScan on May 1, 2026.
Overview
AgentBench appears to be a coherent benchmark skill: it runs bundled local tasks and scripts in temporary workspaces, so users should expect local code execution during benchmark runs.
Before installing, be comfortable with the skill running bundled setup scripts, creating temporary workspaces and result files, and directing the agent through synthetic benchmark tasks. Install only from the expected source, and run a smaller benchmark first if you want to limit impact.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running the benchmark will execute local bash/Python setup code and create or modify files in temporary benchmark workspaces.
Benchmark runs execute bundled shell setup scripts. The included examples generate synthetic files/repos in the workspace, which is expected for this benchmark but still means local code runs.
If the task directory contains a `setup.sh`: run `bash tasks/{suite}/{task}/setup.sh {workspace-path}`
Install from a trusted source, review setup scripts if concerned, and start with `/benchmark --task ...` or `/benchmark --fast` before a full run.
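The review-first advice can be sketched as a small shell check, assuming the skill is installed at `~/.openclaw/skills/agentbench` (the path used by this report's install command). The `demo-suite`/`demo-task` names and the stand-in task tree are illustrative only, so the sketch runs even without the skill installed; point `SKILL_DIR` at the real install to review the actual scripts:

```shell
# Sketch: enumerate and skim the bundled setup scripts before a full benchmark run.
SKILL_DIR="${SKILL_DIR:-$HOME/.openclaw/skills/agentbench}"

# Stand-in task tree for illustration when the skill is not installed
# (demo-suite/demo-task are hypothetical names, not part of the skill).
if [ ! -d "$SKILL_DIR/tasks" ]; then
  SKILL_DIR="$(mktemp -d)"
  mkdir -p "$SKILL_DIR/tasks/demo-suite/demo-task"
  printf '#!/bin/sh\necho "setup"\n' > "$SKILL_DIR/tasks/demo-suite/demo-task/setup.sh"
fi

# List every setup script a benchmark run could execute.
find "$SKILL_DIR/tasks" -name setup.sh -print

# Get a rough sense of each script's size before reading it in full.
find "$SKILL_DIR/tasks" -name setup.sh -exec wc -l {} +
```

Reviewing the scripts this way, then starting with a single `--task`, keeps the first run's blast radius small.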
While the benchmark is running, the agent will follow synthetic task instructions rather than your normal project goal.
The skill deliberately delegates the agent's goal to bundled task prompts during a benchmark. This is core to the benchmark and is scoped to the workspace.
Read the task's `user_message` and execute it as if a real user sent you the request ... Work ONLY within the workspace directory
Only invoke the benchmark when you want the agent to run these tasks, and use suite/task filters to limit scope.
If installed from an altered or untrusted copy, the benchmark scripts could differ from the reviewed package.
The install guidance is user-directed and points to an external GitHub repository. Because the skill contains runtime scripts, repository provenance matters.
`git clone https://github.com/agentbench/agentbench-openclaw.git ~/.openclaw/skills/agentbench`
Use the expected registry or repository, avoid untrusted mirrors, and verify the version before running benchmarks.
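One way to act on the version-verification advice is to record the origin URL and exact commit of the installed copy, assuming the clone location from the install command above. The stand-in repository created below is only there so the sketch runs when the skill is absent; on a real install it inspects the actual clone:

```shell
# Sketch: confirm provenance of the installed skill before running benchmarks.
SKILL_DIR="${SKILL_DIR:-$HOME/.openclaw/skills/agentbench}"

# Stand-in repo for illustration when no real clone exists at SKILL_DIR.
if [ ! -d "$SKILL_DIR/.git" ]; then
  SKILL_DIR="$(mktemp -d)"
  git -C "$SKILL_DIR" init -q
  git -C "$SKILL_DIR" -c user.email=a@b -c user.name=a \
    commit -q --allow-empty -m "stub"
fi

# Confirm the origin matches the expected repository (none for the stub).
ORIGIN="$(git -C "$SKILL_DIR" remote get-url origin 2>/dev/null || echo none)"

# Record the exact commit you are about to run.
COMMIT="$(git -C "$SKILL_DIR" log -1 --format=%H)"

echo "origin: $ORIGIN"
echo "commit: $COMMIT"
```

Comparing `origin` against the expected repository URL and pinning to a commit you have reviewed guards against running an altered copy.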
