AgentBench
Pass
Audited by VirusTotal on May 12, 2026.
Overview
Type: OpenClaw Skill
Name: agentbench
Version: 1.0.0
This OpenClaw AgentSkills skill bundle is a benchmark suite designed to test AI agents across a range of tasks. The `SKILL.md` explicitly instructs the agent to 'Work ONLY within the workspace directory', a critical security boundary. The setup scripts (`tasks/{suite}/{task}/setup.sh`) mainly create test data and code with intentional bugs (e.g., `validate.py` crashing, an `auth.js` session bug, a `stats.py` median bug). These bugs are the targets of the benchmark tasks, not malicious payloads. Some tasks run in `mode: 'real'`, which grants broader system access, but neither the setup scripts nor the task instructions use that access for data exfiltration, persistence, or other unauthorized actions. No evidence of malicious execution, obfuscation, or prompt injection against the host system was found.
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running the benchmark will execute local bash/Python setup code and create or modify files in temporary benchmark workspaces.
Benchmark runs execute bundled shell setup scripts. The included examples generate synthetic files/repos in the workspace, which is expected for this benchmark but still means local code runs.
If the task directory contains a `setup.sh`: run `bash tasks/{suite}/{task}/setup.sh {workspace-path}`
Install from a trusted source, review setup scripts if concerned, and start with `/benchmark --task ...` or `/benchmark --fast` before a full run.
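A minimal Python sketch of the setup step quoted above follows. It assumes the skill is checked out at `skill_root` and that each run gets a fresh temporary directory as its workspace. The helper name `prepare_workspace` and the temporary-directory prefix are illustrative assumptions, not the skill's own runner. The sketch only mirrors the quoted `bash tasks/{suite}/{task}/setup.sh {workspace-path}` invocation.

```python
import subprocess
import tempfile
from pathlib import Path


def prepare_workspace(skill_root: str, suite: str, task: str) -> Path:
    """Create a fresh temporary workspace and run the task's setup.sh in it, if present.

    Hypothetical helper mirroring `bash tasks/{suite}/{task}/setup.sh {workspace-path}`.
    """
    # Resolve the skill root so the script path stays valid after we change cwd below.
    root = Path(skill_root).resolve()
    workspace = Path(tempfile.mkdtemp(prefix=f"agentbench-{suite}-{task}-"))
    setup = root / "tasks" / suite / task / "setup.sh"
    if setup.is_file():
        # setup.sh is ordinary local code: run it with bash from the skill root,
        # passing the workspace path as its first argument, and fail on a non-zero exit.
        subprocess.run(["bash", str(setup), str(workspace)], cwd=root, check=True)
    return workspace
```

Because `setup.sh` runs as ordinary local code, reading the script before calling a helper like this is the simplest way to carry out the review recommended above. Tasks without a `setup.sh` still get an empty workspace.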
While the benchmark is running, the agent will follow synthetic task instructions rather than your normal project goal.
The skill deliberately delegates the agent's goal to bundled task prompts during a benchmark. This is core to the benchmark and is scoped to the workspace.
Read the task's `user_message` and execute it as if a real user sent you the request ... Work ONLY within the workspace directory
Only invoke the benchmark when you want the agent to run these tasks, and use suite/task filters to limit scope.
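The review quotes the workspace restriction as an instruction in `SKILL.md`. It does not describe any technical enforcement mechanism. If you wrap benchmark runs in your own tooling, a path check like the one below is one way to confirm that a file the agent touched stays inside the workspace. This is a hypothetical sketch that uses only the standard `pathlib` module. The name `is_within_workspace` is illustrative and not part of the skill.

```python
from pathlib import Path


def is_within_workspace(workspace: str, candidate: str) -> bool:
    """Return True if `candidate` resolves to a location inside `workspace`.

    Hypothetical helper. Relative candidates are joined onto the workspace;
    absolute candidates are checked as-is. `..` segments and existing symlinks
    are resolved before comparing, so `../x` is rejected.
    """
    root = Path(workspace).resolve()
    # Joining an absolute path onto root yields that absolute path unchanged.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

The check compares resolved path components rather than string prefixes. A plain `startswith` test would wrongly accept `/tmp/ws-other` for a workspace at `/tmp/ws`.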
If installed from an altered or untrusted copy, the benchmark scripts could differ from the reviewed package.
The install guidance is user-directed and points to an external GitHub repository. Because the skill contains runtime scripts, repository provenance matters.
git clone https://github.com/agentbench/agentbench-openclaw.git ~/.openclaw/skills/agentbench
Use the expected registry or repository, avoid untrusted mirrors, and verify the version before running benchmarks.
