Hle Benchmark Evolver

Warn · Audited by ClawScan on May 10, 2026.

Overview

The skill’s benchmark-evolution purpose is coherent, but its pipeline can run arbitrary shell evaluator commands and execute undeclared sibling evolver code, so it needs review before use.

Use this skill only if you trust the local capability-evolver and feishu-evolver-wrapper installations. Start with run_result.js or dry-run pipeline settings, avoid --eval_cmd unless you have reviewed the exact command, and do not pass untrusted report paths or benchmark data into the automatic evolution loop.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

High

#ASI02: Tool Misuse and Exploitation

What this means

A poorly reviewed eval_cmd or unsafe report path could run arbitrary local commands, read or modify workspace files, or print environment secrets into command output.

Why it was flagged

The optional evaluator command is interpolated into a shell string and executed by bash in the workspace with the inherited environment, without allowlisting or shell-safe argument handling.

Skill content

const evalCmd = evalCmdTemplate.replace(/\{\{report\}\}/g, reportPath);
const evalRes = runShell(evalCmd, workspaceRoot, {});
...
spawnSync('bash', ['-c', command], { cwd, ... env: Object.assign({}, process.env, extraEnv || {}) })

Recommendation

Only use trusted evaluator commands. Prefer spawning a fixed executable with an argument array, quote or escape substituted paths, strip unnecessary environment variables, and require explicit approval before using --eval_cmd.
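The recommendation above can be sketched in Node.js as follows. This is an illustration only, not the skill's actual code: the names ALLOWED_EVALUATORS, minimalEnv, buildEvalArgs, and runEvaluator are hypothetical, and the allowlist contents and retained environment variables are assumptions an operator would need to adjust.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical allowlist of evaluator executables the operator has reviewed.
const ALLOWED_EVALUATORS = new Set(['node']);

// Build a small environment instead of inheriting all of process.env,
// so secrets held in unrelated variables are not exposed to the evaluator.
function minimalEnv(extraEnv = {}) {
  const env = {};
  for (const key of ['PATH', 'HOME', 'LANG']) {
    if (process.env[key] !== undefined) env[key] = process.env[key];
  }
  return Object.assign(env, extraEnv);
}

// Substitute {{report}} inside each argument. split/join is used rather than
// String.replace so that "$&"-style sequences in the path are kept literally.
function buildEvalArgs(argTemplate, reportPath) {
  return argTemplate.map((arg) => arg.split('{{report}}').join(reportPath));
}

// Run the evaluator without a shell: arguments are passed as an array,
// so shell metacharacters in reportPath are not interpreted.
function runEvaluator(executable, argTemplate, reportPath, cwd) {
  if (!ALLOWED_EVALUATORS.has(executable)) {
    throw new Error(`evaluator not allowlisted: ${executable}`);
  }
  return spawnSync(executable, buildEvalArgs(argTemplate, reportPath), {
    cwd,
    env: minimalEnv(),
    encoding: 'utf8',
    shell: false,
  });
}
```

With this shape, a report path such as "r.json; rm -rf ." reaches the evaluator as a single literal argument rather than as shell syntax.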

Medium

#ASI04: Agentic Supply Chain Vulnerabilities

What this means

Actual behavior depends on other local skill packages outside this review; those packages could fail, change state, or perform actions the user did not evaluate here.

Why it was flagged

The pipeline imports and runs sibling skill code that is not included in this manifest or declared in the requirements, including a feishu-evolver-wrapper fallback not described in SKILL.md.

Skill content

const capabilityDir = path.join(skillsDir, 'capability-evolver');
const feishuWrapperDir = path.join(skillsDir, 'feishu-evolver-wrapper');
...
rewardMod = require(path.join(feishuWrapperDir, 'src', 'gep', 'benchmarkReward.js'));

Recommendation

Declare and pin the required sibling skill dependencies, disclose the feishu wrapper fallback, and review capability-evolver/feishu-evolver-wrapper before running the pipeline.
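One way to pin sibling skill code is to record a SHA-256 hash of each file at review time and verify it before calling require. The sketch below assumes that approach. The skill does not publish such hashes, so the pin list format and the names sha256File and verifyPinnedModules are hypothetical.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');

// Hex-encoded SHA-256 of a file's contents.
function sha256File(filePath) {
  return crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex');
}

// pins: [{ skill, file, sha256 }] is a hypothetical format recorded by the reviewer.
// Returns a list of human-readable problems; an empty list means every pin matched.
function verifyPinnedModules(skillsDir, pins) {
  const problems = [];
  for (const pin of pins) {
    const filePath = path.join(skillsDir, pin.skill, pin.file);
    if (!fs.existsSync(filePath)) {
      problems.push(`${pin.skill}: missing ${pin.file}`);
      continue;
    }
    if (sha256File(filePath) !== pin.sha256) {
      problems.push(`${pin.skill}: hash mismatch for ${pin.file}`);
    }
  }
  return problems;
}
```

A caller would load a sibling module only when verifyPinnedModules returns an empty list, and would otherwise stop and report the mismatch instead of falling back to another package silently.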

Low

#ASI06: Memory and Context Poisoning

What this means

Incorrect or untrusted benchmark reports can influence future curriculum or evolution decisions, and question-level benchmark data may be retained in local state.

Why it was flagged

Benchmark report contents are ingested into persistent reward/curriculum state and the resulting state path is returned.

Skill content

const ingested = reward.ingestBenchmarkReport(report);
...
state_path: reward.getStatePath(), report_path: reportPath

Recommendation

Use trusted benchmark reports, inspect the reported state path, and reset or remove the capability-evolver state if bad data was ingested.
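A caller could apply this advice with two small guards before ingestion: accept only report paths inside a trusted directory, and copy the state file aside so it can be restored. The sketch below is a minimal illustration under those assumptions. The helpers isInsideDir and backupState are hypothetical and not part of the skill, and the state file location would come from the state_path value the skill returns.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// True when candidate resolves to a location strictly inside trustedDir.
// Note: this compares resolved path strings and does not follow symlinks;
// fs.realpathSync would be needed to defend against symlinked paths.
function isInsideDir(trustedDir, candidate) {
  const rel = path.relative(path.resolve(trustedDir), path.resolve(candidate));
  if (rel === '' || path.isAbsolute(rel)) return false;
  return rel !== '..' && !rel.startsWith('..' + path.sep);
}

// Copy the state file to a timestamped backup before ingesting a report.
// Returns the backup path, or null when no state file exists yet.
function backupState(statePath) {
  if (!fs.existsSync(statePath)) return null;
  const backupPath = `${statePath}.bak-${Date.now()}`;
  fs.copyFileSync(statePath, backupPath);
  return backupPath;
}
```

If a report later proves untrustworthy, copying the backup over the state file restores the earlier reward and curriculum state.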