Hle Benchmark Evolver

Warn

Audited by ClawScan on May 10, 2026.

Overview

The skill’s benchmark-evolution purpose is coherent, but its pipeline can run arbitrary shell evaluator commands and execute undeclared sibling evolver code, so it needs review before use.

Use this skill only if you trust the local capability-evolver and feishu-evolver-wrapper installations. Start with run_result.js or dry-run pipeline settings, avoid --eval_cmd unless you have reviewed the exact command, and do not pass untrusted report paths or benchmark data into the automatic evolution loop.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

A poorly reviewed eval_cmd, or a report path containing shell metacharacters, could cause arbitrary local commands to run, read or modify workspace files, or print environment secrets into command output.

Why it was flagged

The optional evaluator command is interpolated into a shell string and executed by bash in the workspace with the inherited environment, without allowlisting or shell-safe argument handling.

Skill content
const evalCmd = evalCmdTemplate.replace(/\{\{report\}\}/g, reportPath);
const evalRes = runShell(evalCmd, workspaceRoot, {});
...
spawnSync('bash', ['-c', command], { cwd, ... env: Object.assign({}, process.env, extraEnv || {}) })

Recommendation

Only use trusted evaluator commands. Prefer spawning a fixed executable with an argument array, quote or escape substituted paths, strip unnecessary environment variables, and require explicit approval before using --eval_cmd.
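
The recommendation above could be sketched roughly as follows, assuming the evaluator can be expressed as a fixed executable plus an argument template. The names ALLOWED_ENV, minimalEnv, buildArgs, and runEvaluator are hypothetical and not part of the skill; only spawnSync is a real Node.js API. Because no shell parses the arguments, metacharacters in the report path stay literal.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical allowlist of environment variables the evaluator may see.
const ALLOWED_ENV = ['PATH', 'HOME', 'LANG'];

// Copy only allowlisted variables instead of inheriting all of process.env.
function minimalEnv(source, allowed = ALLOWED_ENV) {
  const env = {};
  for (const name of allowed) {
    if (source[name] !== undefined) env[name] = source[name];
  }
  return env;
}

// Substitute {{report}} inside individual argv entries.
// A replacer function is used so "$&"-style patterns in the path are not expanded.
function buildArgs(argTemplate, reportPath) {
  return argTemplate.map((arg) => arg.replace(/\{\{report\}\}/g, () => reportPath));
}

// Run a fixed executable with an argument array; no shell is involved.
function runEvaluator(executable, argTemplate, reportPath, cwd) {
  return spawnSync(executable, buildArgs(argTemplate, reportPath), {
    cwd,
    env: minimalEnv(process.env),
    shell: false,
    encoding: 'utf8',
  });
}
```

With this shape, a report path such as "a; rm -rf /" reaches the evaluator as a single argument rather than as a second command. The executable and argument template would still need review before use.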

What this means

Actual behavior depends on other local skill packages outside this review; those packages could fail, change state, or perform actions the user did not evaluate here.

Why it was flagged

The pipeline imports and runs sibling skill code that is not included in this manifest or declared in the requirements, including a feishu-evolver-wrapper fallback not described in SKILL.md.

Skill content
const capabilityDir = path.join(skillsDir, 'capability-evolver');
const feishuWrapperDir = path.join(skillsDir, 'feishu-evolver-wrapper');
...
rewardMod = require(path.join(feishuWrapperDir, 'src', 'gep', 'benchmarkReward.js'));

Recommendation

Declare and pin the required sibling skill dependencies, disclose the feishu wrapper fallback, and review capability-evolver/feishu-evolver-wrapper before running the pipeline.
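
One possible way to enforce a pin before loading sibling code is sketched below. It assumes each sibling skill ships a package.json with a version field, which this review does not confirm. PINNED, its placeholder versions, and checkSiblingSkill are hypothetical names used only for illustration.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical pins; the real versions are not stated in this review.
const PINNED = {
  'capability-evolver': '0.0.0',
  'feishu-evolver-wrapper': '0.0.0',
};

// Return { ok: true, dir } only when the sibling skill exists and matches its pin.
function checkSiblingSkill(skillsDir, name, pinned = PINNED) {
  if (!Object.prototype.hasOwnProperty.call(pinned, name)) {
    return { ok: false, reason: `no pin declared for ${name}` };
  }
  const dir = path.join(skillsDir, name);
  const manifestPath = path.join(dir, 'package.json');
  if (!fs.existsSync(manifestPath)) {
    return { ok: false, reason: `${name} is not installed` };
  }
  const { version } = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
  if (version !== pinned[name]) {
    return { ok: false, reason: `${name} is ${version}, expected ${pinned[name]}` };
  }
  return { ok: true, dir };
}
```

A caller would require code from the returned dir only when ok is true, and otherwise stop with the reason. A version check is weaker than a content hash, so it complements rather than replaces reviewing the sibling packages.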

What this means

Incorrect or untrusted benchmark reports can influence future curriculum or evolution decisions, and question-level benchmark data may be retained in local state.

Why it was flagged

Benchmark report contents are ingested into persistent reward/curriculum state and the resulting state path is returned.

Skill content
const ingested = reward.ingestBenchmarkReport(report);
...
state_path: reward.getStatePath(), report_path: reportPath

Recommendation

Use trusted benchmark reports, inspect the reported state path, and reset or remove the capability-evolver state if bad data was ingested.
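
One way to act on "use trusted benchmark reports" is to accept report paths only from a directory you control. The sketch below assumes such a directory exists; resolveTrustedReport is a hypothetical helper, not part of the skill. It does not resolve symlinks, so it is a first check rather than complete protection.

```javascript
const path = require('node:path');

// Resolve reportPath against reportsDir and reject anything that escapes it.
function resolveTrustedReport(reportsDir, reportPath) {
  const base = path.resolve(reportsDir);
  const resolved = path.resolve(base, reportPath);
  const rel = path.relative(base, resolved);
  const escapes =
    rel === '' ||                      // the directory itself, not a file in it
    rel === '..' ||
    rel.startsWith('..' + path.sep) || // climbs above the trusted directory
    path.isAbsolute(rel);              // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`report path is outside the trusted directory: ${reportPath}`);
  }
  return resolved;
}
```

Only the returned path would then be read and handed to the ingestion step. Paths such as ../secret.json or /etc/passwd are rejected before any file is opened.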