Karpathy Autoresearch
Security checks across static analysis, malware telemetry, and agentic risk
Overview
This skill is a disclosed self-improvement tool, but it gives the agent broad authority to edit and persistently change skills/files and to run evaluator commands, so it needs careful review and sandboxing.
Install only if you are comfortable letting the agent edit and commit files. Run it in a clean copy or branch, specify the exact mutable file and trusted eval command, avoid repos with secrets, review every diff before keeping changes, and do not let it mutate live skills without safety tests and human approval.
Static analysis
No static analysis findings were reported for this release.
VirusTotal
VirusTotal findings are pending for this skill version.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
The agent could make many changes to a skill, prompt, strategy, or config before the user reviews the final result.
The skill instructs the agent to repeatedly edit files, run evaluations, and keep changes automatically. This behavior is central to the skill, but it grants broad mutation authority without explicit per-change approval or sandboxing requirements.
- Make the change ...
- Run eval ...
- Keep or revert based on score ...
Continue for N experiments (default: 20, or until user stops)
Run only on a copy or dedicated branch, specify the exact mutable file and experiment count, and require human review of diffs before accepting changes.
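A minimal sketch of the keep-or-revert loop from the quoted steps, assuming the mutation and evaluation steps are supplied as plain Python functions. The names `run_experiments`, `mutate`, `evaluate`, and `Proposal` are hypothetical and are not taken from the skill. The sketch works on in-memory text, enforces a hard experiment cap (defaulting to the 20 the skill mentions), and queues each improvement for human review instead of committing it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Proposal:
    """A candidate edit that beat the current best score, awaiting human review."""
    experiment: int
    text: str
    score: float


@dataclass
class LoopResult:
    best_text: str
    best_score: float
    proposals: List[Proposal] = field(default_factory=list)


def run_experiments(
    baseline: str,
    mutate: Callable[[str, int], str],
    evaluate: Callable[[str], float],
    max_experiments: int = 20,
) -> LoopResult:
    """Hill-climb over in-memory copies of the mutable file.

    Nothing is written to disk: each improvement is queued as a Proposal
    so a human can review the diff before anything is committed.
    """
    if max_experiments < 1:
        raise ValueError("max_experiments must be at least 1")
    best_text, best_score = baseline, evaluate(baseline)
    result = LoopResult(best_text, best_score)
    for i in range(1, max_experiments + 1):  # hard cap, no open-ended loop
        candidate = mutate(best_text, i)
        score = evaluate(candidate)
        if score > best_score:  # keep: strictly better than the current best
            best_text, best_score = candidate, score
            result.proposals.append(Proposal(i, candidate, score))
        # otherwise "revert" implicitly: best_text is left unchanged
    result.best_text, result.best_score = best_text, best_score
    return result
```

Queuing proposals rather than writing files keeps the final decision with a reviewer; a real harness would still need to write accepted text back, ideally on the dedicated branch.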
A bad or untrusted eval command could run arbitrary local commands in the project environment.
The reference implementation runs the evaluator as a shell command. This is expected for an evaluation harness, but it means the eval command must be treated as trusted code.
subprocess.run(
    eval_cmd,
    shell=True,
    capture_output=True,
    text=True,
    timeout=300,
Only use evaluator commands you wrote or trust; avoid copying eval commands from untrusted sources, and prefer safer argument-based execution over shell=True where possible.
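A sketch of the argument-based alternative, assuming the evaluator prints its score as the last line of stdout (an assumption; the skill's actual output format is not shown here). `run_eval` is a hypothetical helper. It keeps the 300-second timeout from the excerpt but passes an argument list with the default shell=False, so shell syntax such as `;`, pipes, and `$(...)` reaches the program as literal text. This removes shell injection; it does not make an untrusted program safe to run.

```python
import shlex
import subprocess
import sys
from typing import Sequence, Union


def run_eval(cmd: Union[str, Sequence[str]], cwd: str = ".", timeout: int = 300) -> float:
    """Run a trusted evaluator without a shell and parse its score.

    Hypothetical convention: the evaluator prints its score as the last
    line of stdout.
    """
    # Split a string command into argv. With shell=False nothing is
    # interpreted by a shell: ';', '|', '$(...)' and globs stay literal.
    argv = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    if not argv:
        raise ValueError("empty eval command")
    proc = subprocess.run(
        argv,
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
        check=True,       # raises CalledProcessError on a non-zero exit
    )
    lines = proc.stdout.strip().splitlines()
    if not lines:
        raise ValueError("evaluator printed no score")
    return float(lines[-1])


if __name__ == "__main__":
    # Stand-in evaluator: the current Python interpreter printing a score.
    print(run_eval([sys.executable, "-c", "print('warming up'); print(87.5)"]))
```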
Unrelated work, local config, or accidental sensitive files could be committed into git history and later propagated if the repo is pushed or shared.
The loop stages all changes in the working directory, not just the mutable file. That can capture unrelated files or generated artifacts into experiment commits.
subprocess.run(["git", "add", "-A"], cwd=workdir, capture_output=True)
Use a clean sandbox repo or branch, ensure secrets are ignored, and change the implementation to stage only the intended mutable file and log files.
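A minimal sketch of narrower staging. The helper names `build_add_cmd` and `stage_experiment` and the example file names are hypothetical. It replaces `git add -A` with `git add -- <paths>` for an explicit list of files and refuses any path that points outside the working directory.

```python
import os
import subprocess
from typing import List, Sequence


def build_add_cmd(workdir: str, paths: Sequence[str]) -> List[str]:
    """Return a `git add` argv that stages only the given paths.

    Rejects paths that resolve outside `workdir`. This is a lexical check
    based on os.path.normpath; it does not follow symlinks.
    """
    if not paths:
        raise ValueError("no paths to stage")
    root = os.path.normpath(os.path.abspath(workdir))
    for p in paths:
        # An absolute p replaces root in join(), so '/etc/passwd' is caught too.
        full = os.path.normpath(os.path.join(root, p))
        if os.path.commonpath([root, full]) != root:
            raise ValueError(f"path escapes workdir: {p}")
    # '--' ends option parsing, so a file named like '-A' is not read as a flag.
    return ["git", "add", "--", *paths]


def stage_experiment(workdir: str, paths: Sequence[str]) -> None:
    """Stage only the mutable file and log files, instead of `git add -A`."""
    subprocess.run(
        build_add_cmd(workdir, paths),
        cwd=workdir,
        check=True,
        capture_output=True,
        text=True,
    )
```

For example, `stage_experiment(workdir, ["SKILL.md", "results.log"])` stages those two files and leaves anything else in the working tree unstaged.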
A skill may become optimized for a narrow score while losing safety, correctness, or user-intent constraints in future runs.
The skill explicitly supports mutating SKILL.md prompt instructions, which are persistent agent instructions reused in future tasks. A weak, biased, or poisoned eval could cause unsafe prompt changes to be kept.
The **mutable file** is the thing you're optimizing. It can be:
- A SKILL.md prompt/instructions
Keep immutable safety requirements outside the mutation target, add safety/regression tests to the eval suite, and manually review any SKILL.md changes before installing or publishing them.
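One way to wire safety and regression tests into the eval, sketched under assumptions: the tests live in the harness code, which the loop does not mutate, and any failing test vetoes a candidate regardless of its score. `accept_candidate` and the two example tests are hypothetical; real tests should encode your own requirements.

```python
from typing import Callable, Dict, List, Tuple

# A safety/regression test takes candidate SKILL.md text and returns True if it passes.
SafetyTest = Callable[[str], bool]


def accept_candidate(
    candidate: str,
    score: float,
    best_score: float,
    safety_tests: Dict[str, SafetyTest],
) -> Tuple[bool, List[str]]:
    """Keep a candidate only if every safety test passes AND the score improves.

    A failing safety test vetoes the change no matter how high the score is,
    so the optimizer cannot trade safety constraints for points.
    Returns (accepted, names_of_failed_tests).
    """
    failures = [name for name, test in safety_tests.items() if not test(candidate)]
    if failures:
        return False, failures
    return score > best_score, []


# Hypothetical example tests, defined outside the mutation target.
SAFETY_TESTS: Dict[str, SafetyTest] = {
    "keeps_confirmation_rule": lambda text: "ask the user before deleting" in text.lower(),
    "no_credential_sending": lambda text: "send credentials" not in text.lower(),
}
```

Returning the failed test names alongside the decision gives the human reviewer a reason for each rejection, not just a lower score.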
Private prompts, examples, or business data used in evals could leave the local environment depending on the chosen LLM provider.
The documentation suggests an optional LLM-judge evaluation flow. If users adopt it, prompts, outputs, or test cases may be sent to an external model provider.
**LLM-as-judge**: Send output to an LLM, ask it to score 1-100
Use a trusted or local evaluator for sensitive material, and clearly separate data that may be sent to external model providers.
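A sketch of keeping that separation explicit, under stated assumptions: each eval case carries a hypothetical `shareable` flag that defaults to False, and only flagged cases are passed to an external judge. The provider call itself is omitted because it depends on the chosen API. `parse_judge_score` (also hypothetical) enforces the 1-100 scale from the quoted flow and returns None for anything else, so a malformed reply is not silently counted as a score.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalCase:
    prompt: str
    output: str
    shareable: bool = False  # hypothetical flag: may this case leave the machine?


def cases_for_external_judge(cases: List[EvalCase]) -> List[EvalCase]:
    """Return only cases explicitly marked shareable.

    Because shareable defaults to False, an unlabeled case stays local.
    """
    return [c for c in cases if c.shareable]


def parse_judge_score(reply: str) -> Optional[int]:
    """Extract the first run of digits from a judge reply.

    Returns None if there is no number or it falls outside 1-100.
    """
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    score = int(match.group())
    return score if 1 <= score <= 100 else None
```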
