AgentBench
Verdict: Pass. Audited by ClawScan on May 1, 2026.
Overview
AgentBench appears to be a coherent benchmark skill: it runs bundled local tasks and scripts in temporary workspaces, so users should expect local code execution during benchmark runs.
Before installing, be comfortable with the skill running bundled setup scripts, creating temporary workspaces and result files, and directing the agent through synthetic benchmark tasks. Install only from the expected source, and run a smaller benchmark first if you want to limit impact.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running the benchmark will execute local bash/Python setup code and create or modify files in temporary benchmark workspaces.
Benchmark runs execute bundled shell setup scripts. The included examples generate synthetic files/repos in the workspace, which is expected for this benchmark but still means local code runs.
If the task directory contains a `setup.sh`: run `bash tasks/{suite}/{task}/setup.sh {workspace-path}`
Install from a trusted source, review setup scripts if concerned, and start with `/benchmark --task ...` or `/benchmark --fast` before a full run.
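The review-first advice can be sketched as a small shell check, assuming the skill is installed at `~/.openclaw/skills/agentbench` (the path used by this report's install command). The `demo-suite`/`demo-task` names and the stand-in task tree are illustrative only, so the sketch runs even without the skill installed; point `SKILL_DIR` at the real install to review the actual scripts:

```shell
# Sketch: enumerate and skim the bundled setup scripts before a full benchmark run.
SKILL_DIR="${SKILL_DIR:-$HOME/.openclaw/skills/agentbench}"

# Stand-in task tree for illustration when the skill is not installed
# (demo-suite/demo-task are hypothetical names, not part of the skill).
if [ ! -d "$SKILL_DIR/tasks" ]; then
  SKILL_DIR="$(mktemp -d)"
  mkdir -p "$SKILL_DIR/tasks/demo-suite/demo-task"
  printf '#!/bin/sh\necho "setup"\n' > "$SKILL_DIR/tasks/demo-suite/demo-task/setup.sh"
fi

# List every setup script a benchmark run could execute.
find "$SKILL_DIR/tasks" -name setup.sh -print

# Get a rough sense of each script's size before reading it in full.
find "$SKILL_DIR/tasks" -name setup.sh -exec wc -l {} +
```

Reviewing the scripts this way, then starting with a single `--task`, keeps the first run's blast radius small.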
While the benchmark is running, the agent will follow synthetic task instructions rather than your normal project goal.
The skill deliberately delegates the agent's goal to bundled task prompts during a benchmark. This is core to the benchmark and is scoped to the workspace.
Read the task's `user_message` and execute it as if a real user sent you the request ... Work ONLY within the workspace directory
Only invoke the benchmark when you want the agent to run these tasks, and use suite/task filters to limit scope.
If installed from an altered or untrusted copy, the benchmark scripts could differ from the reviewed package.
The install guidance is user-directed and points to an external GitHub repository. Because the skill contains runtime scripts, repository provenance matters.
`git clone https://github.com/agentbench/agentbench-openclaw.git ~/.openclaw/skills/agentbench`
Use the expected registry or repository, avoid untrusted mirrors, and verify the version before running benchmarks.
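One way to act on the version-verification advice is to record the origin URL and exact commit of the installed copy, assuming the clone location from the install command above. The stand-in repository created below is only there so the sketch runs when the skill is absent; on a real install it inspects the actual clone:

```shell
# Sketch: confirm provenance of the installed skill before running benchmarks.
SKILL_DIR="${SKILL_DIR:-$HOME/.openclaw/skills/agentbench}"

# Stand-in repo for illustration when no real clone exists at SKILL_DIR.
if [ ! -d "$SKILL_DIR/.git" ]; then
  SKILL_DIR="$(mktemp -d)"
  git -C "$SKILL_DIR" init -q
  git -C "$SKILL_DIR" -c user.email=a@b -c user.name=a \
    commit -q --allow-empty -m "stub"
fi

# Confirm the origin matches the expected repository (none for the stub).
ORIGIN="$(git -C "$SKILL_DIR" remote get-url origin 2>/dev/null || echo none)"

# Record the exact commit you are about to run.
COMMIT="$(git -C "$SKILL_DIR" log -1 --format=%H)"

echo "origin: $ORIGIN"
echo "commit: $COMMIT"
```

Comparing `origin` against the expected repository URL and pinning to a commit you have reviewed guards against running an altered copy.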
