Agent Eval Suite

Pass
Audited by ClawScan on May 10, 2026.

Overview

This appears to be a local agent-evaluation toolkit, but it can repeatedly run user-provided test code and load scenario files, so use it with test data and sandboxed agents.

Use this as a local testing library: keep benchmark callbacks non-destructive, run simulations against test agents or mocks, load only trusted scenario files, and inspect the source because the registry does not provide an upstream homepage or source link.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If a benchmark callback calls a real agent, service, or account-changing action, the suite could repeat that action many times and create cost or unintended changes.

Why it was flagged

The benchmark framework executes caller-provided test callbacks, and the surrounding code runs them across configured iterations. This is expected for benchmarking but can repeat any side effects in the callback.

Skill content
await this.runWithTimeout(test.execute(agent), this.options.timeout);
Recommendation

Run benchmarks against mocks, sandboxes, or test accounts; keep iteration counts controlled; and avoid callbacks that mutate production data without explicit human approval.
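A small benchmark loop shows how this repetition works. The sketch below is a minimal illustration that assumes Node.js and TypeScript. Agent, MockAgent, BenchmarkTest, runBenchmark, and this version of runWithTimeout are hypothetical stand-ins, not the package's actual API. It shows why a mock is the safe target: the callback's only side effect is an in-memory log, and that log grows by one entry per iteration.

```typescript
// Hypothetical sketch: these names illustrate the pattern and are not the
// audited package's API.
interface Agent {
  act(input: string): Promise<string>;
}

// A mock agent: its only side effect is an in-memory record of each call.
class MockAgent implements Agent {
  calls: string[] = [];

  async act(input: string): Promise<string> {
    this.calls.push(input);
    return `mock:${input}`;
  }
}

interface BenchmarkTest {
  name: string;
  execute(agent: Agent): Promise<unknown>;
}

// Rejects if `work` has not settled within `ms` milliseconds. Promise.race only
// stops waiting; it does not cancel `work`, which keeps running in the background.
async function runWithTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // do not leave a pending timer behind
  }
}

// Calls test.execute once per iteration, so whatever the callback does
// (an API call, a write, a purchase) happens `iterations` times.
async function runBenchmark(
  test: BenchmarkTest,
  agent: Agent,
  iterations: number,
  timeoutMs: number,
): Promise<number[]> {
  const durationsMs: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = Date.now();
    await runWithTimeout(test.execute(agent), timeoutMs);
    durationsMs.push(Date.now() - start);
  }
  return durationsMs;
}
```

Running runBenchmark with a MockAgent and 5 iterations leaves five entries in calls. A non-mock agent would have performed the same number of real actions. Also note that a timeout only ends the wait. A timed-out call against a real service may still complete its side effect.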

What this means

If an agent were manipulated, for example by an untrusted prompt, into loading an unintended local JSON file as a scenario, that file's contents could be read and parsed into the evaluation workflow.

Why it was flagged

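The simulator can read a scenario from an absolute path or from the configured scenarios directory. This is useful for simulation inputs, but it means scenario path selection should remain trusted and scoped. A relative name containing ".." segments can also resolve outside the scenarios directory, because path.join normalizes such segments rather than rejecting them.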
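The mapping from scenario names to files is easier to see when the quoted resolution logic is lifted into a standalone function. This sketch uses Node's path.posix so the results are the same on every platform. resolveLikeExcerpt and scenariosDir are hypothetical names standing in for the simulator's method and this.options.scenarios. The third call shows a relative name escaping the scenarios directory.

```typescript
import * as path from "node:path";

// Hypothetical reconstruction of the quoted resolution step; scenariosDir
// stands in for this.options.scenarios. POSIX semantics for stable output.
function resolveLikeExcerpt(scenarioName: string, scenariosDir: string): string {
  return path.posix.isAbsolute(scenarioName)
    ? scenarioName // absolute names bypass the scenarios directory entirely
    : path.posix.join(scenariosDir, `${scenarioName}.json`);
}

console.log(resolveLikeExcerpt("login-flow", "/project/scenarios"));
// → /project/scenarios/login-flow.json
console.log(resolveLikeExcerpt("/home/user/.config/app.json", "/project/scenarios"));
// → /home/user/.config/app.json
console.log(resolveLikeExcerpt("../../home/user/notes", "/project/scenarios"));
// → /home/user/notes.json
```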

Skill content
const scenarioPath = path.isAbsolute(scenarioName)
  ? scenarioName
  : path.join(this.options.scenarios, `${scenarioName}.json`);

try {
  const content = fs.readFileSync(scenarioPath, 'utf-8');
Recommendation

Only load trusted scenario files from a dedicated fixtures directory, and do not let untrusted prompts choose arbitrary local file paths.
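The following is a minimal sketch of this recommendation, assuming Node.js. It accepts only a bare scenario name, resolves it inside a dedicated fixtures directory, and rejects anything else before the file is opened. resolveScenarioInFixtures and loadScenario are hypothetical helpers, not functions from the audited package.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not from the audited package): maps a bare scenario
// name to <fixturesDir>/<name>.json and refuses anything else.
function resolveScenarioInFixtures(fixturesDir: string, scenarioName: string): string {
  // Allow-list: letters, digits, "_" and "-" only, so no separators, no "..",
  // and no absolute paths can get through.
  if (!/^[A-Za-z0-9_-]+$/.test(scenarioName)) {
    throw new Error(`Rejected scenario name: ${JSON.stringify(scenarioName)}`);
  }
  const root = path.resolve(fixturesDir);
  const candidate = path.resolve(root, `${scenarioName}.json`);
  // Second check: the resolved file must sit directly inside the fixtures root.
  if (path.dirname(candidate) !== root) {
    throw new Error(`Scenario path escapes fixtures directory: ${candidate}`);
  }
  return candidate;
}

// Reads and parses a scenario only after the path has been validated.
function loadScenario(fixturesDir: string, scenarioName: string): unknown {
  const file = resolveScenarioInFixtures(fixturesDir, scenarioName);
  return JSON.parse(fs.readFileSync(file, "utf-8"));
}
```

The name allow-list does most of the work, and the dirname comparison is a second check. If the fixtures directory can contain symbolic links, also compare fs.realpathSync(candidate) against the real path of the root, because path.resolve works on the path string and does not follow links.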

What this means

Users have less external context for who maintains the package or where to verify updates.

Why it was flagged

The registry metadata does not provide an upstream source or homepage, which limits provenance verification. The included package has no dependencies or install scripts, so the observed supply-chain risk is limited.
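You can check the statement that the package has no dependencies or install scripts directly against its package.json before installing. The following is a minimal sketch, assuming an npm-style manifest. PackageManifest and installRiskSignals are hypothetical names. The hook list covers lifecycle scripts that npm can run automatically during installation.

```typescript
// Hypothetical helper: flags manifest fields that would run code or pull in
// third-party packages when the skill is installed with npm.
interface PackageManifest {
  scripts?: Record<string, string>;
  dependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
}

// npm runs preinstall/install/postinstall on install; prepare also runs on a
// local `npm install` and when installing git dependencies.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall", "prepare"];
const DEPENDENCY_FIELDS = ["dependencies", "optionalDependencies", "peerDependencies"] as const;

function installRiskSignals(manifest: PackageManifest): string[] {
  const signals: string[] = [];
  for (const hook of INSTALL_HOOKS) {
    if (manifest.scripts?.[hook]) signals.push(`script:${hook}`);
  }
  for (const field of DEPENDENCY_FIELDS) {
    for (const name of Object.keys(manifest[field] ?? {})) {
      signals.push(`${field}:${name}`);
    }
  }
  return signals;
}
```

An empty result from this check on the package's manifest is consistent with this finding. Any entry points at code or packages worth reading before the first install.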

Skill content
Source: unknown
Homepage: none
Recommendation

Review the included source before use, install it in a disposable project first, and prefer pinned or verified sources for production evaluation workflows.
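For the "pinned or verified" part, one option is to record a digest of the exact archive you reviewed and check it before each install. The following is a minimal sketch, assuming Node.js. sriDigest and verifyPinnedArchive are hypothetical helpers. The digest uses the sha512-<base64> format that npm writes to the integrity field of package-lock.json.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";

// Hypothetical helper: computes an SRI-style digest ("sha512-<base64>"), the
// same format npm writes to the `integrity` field of package-lock.json.
function sriDigest(filePath: string): string {
  const digest = createHash("sha512").update(fs.readFileSync(filePath)).digest("base64");
  return `sha512-${digest}`;
}

// Hypothetical helper: throws unless the file matches the digest pinned at review time.
function verifyPinnedArchive(filePath: string, pinnedIntegrity: string): void {
  const actual = sriDigest(filePath);
  if (actual !== pinnedIntegrity) {
    throw new Error(`Integrity mismatch for ${filePath}: expected ${pinnedIntegrity}, got ${actual}`);
  }
}
```

Compute the digest once on the archive you audited (for example, the tarball produced by npm pack). Store it alongside your evaluation config, and fail the setup step on any mismatch.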