Hle Benchmark Evolver
Warn
Audited by ClawScan on May 10, 2026.
Overview
The skill’s benchmark-evolution purpose is coherent, but its pipeline can run arbitrary shell evaluator commands and execute undeclared sibling evolver code, so it needs review before use.
Use this skill only if you trust the local capability-evolver and feishu-evolver-wrapper installations. Start with run_result.js or dry-run pipeline settings, avoid --eval_cmd unless you have reviewed the exact command, and do not pass untrusted report paths or benchmark data into the automatic evolution loop.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A poorly reviewed eval_cmd or unsafe report path could run arbitrary local commands, read or modify workspace files, or print environment secrets into command output.
The optional evaluator command is interpolated into a shell string and executed by bash in the workspace with the inherited environment, without allowlisting or shell-safe argument handling.
const evalCmd = evalCmdTemplate.replace(/\{\{report\}\}/g, reportPath); const evalRes = runShell(evalCmd, workspaceRoot, {}); ... spawnSync('bash', ['-c', command], { cwd, ... env: Object.assign({}, process.env, extraEnv || {}) })
Only use trusted evaluator commands. Prefer spawning a fixed executable with an argument array, quote or escape substituted paths, strip unnecessary environment variables, and require explicit approval before using --eval_cmd.
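The recommendation above could look roughly like the following sketch. It is not the skill's actual code: `ALLOWED_EVALUATORS`, `buildEvalArgs`, and `runEvaluator` are hypothetical names, and the allowlist contents and the PATH-only environment are assumptions chosen for illustration. It shows the report path being substituted into a single argv entry and the process being spawned without a shell.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical allowlist of evaluator executables (an assumption, not from the skill).
const ALLOWED_EVALUATORS = new Set(['node', 'python3']);

// Substitute {{report}} inside each argument separately, so the path always
// stays one argv entry. A function replacer avoids special handling of
// sequences such as "$&" in the replacement string.
function buildEvalArgs(argTemplate, reportPath) {
  return argTemplate.map((arg) => arg.replace(/\{\{report\}\}/g, () => reportPath));
}

function runEvaluator(executable, argTemplate, reportPath, cwd) {
  if (!ALLOWED_EVALUATORS.has(executable)) {
    throw new Error(`Evaluator not allowlisted: ${executable}`);
  }
  const args = buildEvalArgs(argTemplate, reportPath);
  return spawnSync(executable, args, {
    cwd,
    shell: false, // no bash -c, so metacharacters in the path are not interpreted
    encoding: 'utf8',
    env: { PATH: process.env.PATH }, // pass only what the evaluator needs
  });
}
```

With this shape, a report path such as `a b; echo hi.json` reaches the evaluator as one literal argument instead of being parsed by a shell. An evaluator that needs more environment variables would have to list them explicitly.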
Actual behavior depends on other local skill packages outside this review; those packages could fail, change state, or perform actions the user did not evaluate here.
The pipeline imports and runs sibling skill code that is not included in this manifest or declared in the requirements, including a feishu-evolver-wrapper fallback not described in SKILL.md.
const capabilityDir = path.join(skillsDir, 'capability-evolver'); const feishuWrapperDir = path.join(skillsDir, 'feishu-evolver-wrapper'); ... rewardMod = require(path.join(feishuWrapperDir, 'src', 'gep', 'benchmarkReward.js'));
Declare and pin the required sibling skill dependencies, disclose the feishu wrapper fallback, and review capability-evolver/feishu-evolver-wrapper before running the pipeline.
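One possible way to enforce pinning before any sibling code is loaded is sketched below. It assumes each sibling skill directory contains a `package.json` with a `version` field, which this review did not verify. `checkSibling`, `findUnpinnedSiblings`, and the pin values passed to them are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Check one sibling skill against an expected version.
// Assumes <skillsDir>/<name>/package.json exists and has a "version" field.
function checkSibling(skillsDir, name, expectedVersion) {
  const manifestPath = path.join(skillsDir, name, 'package.json');
  if (!fs.existsSync(manifestPath)) {
    return { name, ok: false, reason: 'missing' };
  }
  const { version } = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
  if (version !== expectedVersion) {
    return { name, ok: false, reason: `found ${version}, expected ${expectedVersion}` };
  }
  return { name, ok: true, reason: 'pinned version matches' };
}

// Return every sibling that is absent or not at its pinned version.
function findUnpinnedSiblings(skillsDir, pins) {
  return Object.entries(pins)
    .map(([name, version]) => checkSibling(skillsDir, name, version))
    .filter((result) => !result.ok);
}
```

A pipeline could call `findUnpinnedSiblings` with its declared pins and refuse to `require` anything if the returned list is non-empty. A version check alone does not prove the code is unmodified; a content hash would be stricter.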
Incorrect or untrusted benchmark reports can influence future curriculum or evolution decisions, and question-level benchmark data may be retained in local state.
Benchmark report contents are ingested into persistent reward/curriculum state and the resulting state path is returned.
const ingested = reward.ingestBenchmarkReport(report); ... state_path: reward.getStatePath(), report_path: reportPath
Use trusted benchmark reports, inspect the reported state path, and reset or remove the capability-evolver state if bad data was ingested.
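As a partial safeguard on which reports reach ingestion, a caller could confine report paths to the workspace before reading them. The sketch below is an illustration only: `isInsideWorkspace` is a hypothetical helper, it performs a purely lexical check, and it does not resolve symlinks or validate the report's contents.

```javascript
const path = require('path');

// Return true only when reportPath resolves to a location strictly inside
// workspaceRoot. Lexical check only: symlinks are not followed.
function isInsideWorkspace(workspaceRoot, reportPath) {
  const root = path.resolve(workspaceRoot);
  const target = path.resolve(root, reportPath);
  const rel = path.relative(root, target);
  if (rel === '' || path.isAbsolute(rel)) {
    return false; // the root itself, or a different drive on Windows
  }
  // Reject ".." and anything that climbs out via "../"
  return rel !== '..' && !rel.startsWith('..' + path.sep);
}
```

A pipeline could skip ingestion whenever this returns false. Whether an in-workspace report is trustworthy is a separate question that a path check cannot answer.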
