Skill flagged — suspicious patterns detected
ClawHub Security flagged this skill as suspicious. Review the scan results before using.
Hle Benchmark Evolver
v1.0.0
Runs HLE-oriented benchmark reward ingestion and curriculum generation for capability-evolver. Use when the user asks to optimize Humanity's Last Exam score,...
by WANGJUNJIE (@wanng-ide)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The code implements ingestion, reporting, and a pipeline that calls out to a 'capability-evolver' (or a 'feishu-evolver-wrapper') module and invokes that skill's index.js for evolve/solidify. That dependency is not declared in the SKILL.md or package metadata; the skill will fail or behave differently if those sibling modules are missing. Otherwise the requested capabilities (parse report → ingest → generate curriculum signals → optionally drive evolve/solidify) match the stated purpose.
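The missing-dependency concern above can be checked before running the skill. The following is a minimal sketch, assuming the sibling modules sit as directories under one shared skills root and each exposes an index.js; the function name findEvolverModule and that flat layout are hypothetical and not taken from the skill's code.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Directory names come from the scan text above. The assumption that they
// live side by side under a single root directory is illustrative only.
const CANDIDATES = ["capability-evolver", "feishu-evolver-wrapper"];

// Returns the first candidate directory that contains an index.js,
// or null when neither sibling module is present.
function findEvolverModule(skillsRoot) {
  for (const name of CANDIDATES) {
    const dir = path.join(skillsRoot, name);
    if (fs.existsSync(path.join(dir, "index.js"))) {
      return dir;
    }
  }
  return null;
}
```

A null result means the pipeline would have nothing to invoke for evolve/solidify, so it is worth failing early with a clear message rather than letting a later require or spawn call fail.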
Instruction Scope
SKILL.md and the scripts allow/encourage executing arbitrary evaluator commands via --eval_cmd which are run through the shell (runShell) and may be written to a temporary script and executed via 'bash -l'. This grants those commands full access to the process environment and filesystem that the agent runs with and can run arbitrary code, read files, or exfiltrate data. The instructions do not warn about that risk or restrict which commands may be executed.
Install Mechanism
There is no network download or install spec — the skill is instruction + local JS files only. No external packages are fetched. That lowers install risk, but the skill expects local sibling modules to exist (capability-evolver or feishu-evolver-wrapper).
Credentials
The skill declares no required env vars, which is consistent with its metadata, but at runtime it spawns child processes and passes the full process.env to them. Those child processes (eval_cmd or invoked index.js in capability-evolver) can access any environment secrets available to the agent. Also the skill reads/writes state files via the external benchmarkReward module — the path and contents of those files are not documented in SKILL.md.
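One way to limit what a child process can see is to build its environment from an allowlist instead of forwarding process.env wholesale. This is a sketch only: buildChildEnv and the default allowlist are hypothetical names chosen for illustration, and the right set of variables depends on what the evaluator actually needs.

```javascript
// Hypothetical allowlist: adjust to whatever the evaluator genuinely requires.
const DEFAULT_ALLOWLIST = ["PATH", "HOME", "LANG"];

// Copies only allowlisted variables from sourceEnv into a fresh object,
// so API keys and tokens present in the parent are not inherited.
function buildChildEnv(sourceEnv, allowlist = DEFAULT_ALLOWLIST) {
  const childEnv = {};
  for (const name of allowlist) {
    if (typeof sourceEnv[name] === "string") {
      childEnv[name] = sourceEnv[name];
    }
  }
  return childEnv;
}
```

The returned object can be passed as the env option of Node's child_process functions in place of process.env.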
Persistence & Privilege
The skill sets always:false and does not install itself persistently. It writes temporary shell scripts to the current working directory when executing complex commands and relies on state files in the capability-evolver module's state path. It does not modify other skills' configs directly, but it calls other skill code (capability-evolver), which could have broader effects; verify those sibling modules before use.
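As an alternative to writing temporary scripts into the working directory, a script can be placed in a freshly created private temp directory and removed afterwards. The sketch below is not the skill's implementation; withPrivateScript, the "hle-eval-" prefix, and the run.sh file name are all hypothetical.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Writes `contents` to a script inside a new directory under the OS temp
// dir, calls `use(scriptPath)`, and always deletes the directory afterwards.
function withPrivateScript(contents, use) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "hle-eval-"));
  const scriptPath = path.join(dir, "run.sh");
  try {
    // Mode 0o700 restricts the file to the owner on POSIX systems.
    fs.writeFileSync(scriptPath, contents, { mode: 0o700 });
    return use(scriptPath);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

Because cleanup runs in a finally block, the script does not linger in the project tree even when the callback throws.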
What to consider before installing
This skill appears to implement HLE report ingestion and curriculum generation, but take these precautions before installing or running it:
- Ensure the expected sibling modules exist: capability-evolver (or feishu-evolver-wrapper). Inspect their src/gep/benchmarkReward.js and index.js to confirm what state files and side effects they perform.
- Avoid passing untrusted commands to --eval_cmd. The pipeline will run that command via the shell (and may execute it as a temporary script using a login shell), giving it full access to the agent's environment and filesystem; it can read env vars and files or exfiltrate data.
- Run the skill in an isolated/test environment first (no secrets in environment, limited filesystem access) and try it with the provided sample report to observe behaviour.
- Review where the benchmark state is stored (reward.getStatePath()) and ensure you are comfortable with reads/writes to that path.
- If you must run eval_cmd, prefer to run a controlled evaluator executable you trust and pass a restricted environment (or run inside a sandbox/container).
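The last precaution can be sketched as a wrapper that launches a trusted evaluator executable directly, without a shell, and with an explicitly supplied environment. This is an illustration under stated assumptions, not the skill's own runShell: runEvaluator and the 60-second timeout are hypothetical choices.

```javascript
const { spawnSync } = require("node:child_process");

// Runs `file` with `args` as an argument vector. With shell: false the
// arguments are never parsed by a shell, so metacharacters such as ";"
// or "$(...)" are passed through literally. `env` replaces process.env.
function runEvaluator(file, args, env) {
  const result = spawnSync(file, args, {
    shell: false,
    env,
    encoding: "utf8",
    timeout: 60_000, // hypothetical limit; tune for the real evaluator
  });
  if (result.error) {
    throw result.error;
  }
  return { status: result.status, stdout: result.stdout, stderr: result.stderr };
}
```

Passing the command as a file plus an argument array avoids both shell interpretation and the login-shell temporary script, at the cost of not supporting pipes or redirection inside the command itself.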
If you want a safer install, ask the skill author to: declare the dependency on capability-evolver, document the state path and files touched, and add explicit warnings and safeguards around executing arbitrary eval_cmd shell commands.
Like a lobster shell, security has layers: review code before you run it.
