Karpathy Autoresearch

Security checks across static analysis, malware telemetry, and agentic risk

Overview

This skill openly describes itself as a self-improvement tool, but it gives the agent broad authority to make persistent edits to skills and files and to run evaluator commands, so it needs careful review and sandboxing.

Install only if you are comfortable letting the agent edit and commit files. Run it in a clean copy or branch, specify the exact mutable file and trusted eval command, avoid repos with secrets, review every diff before keeping changes, and do not let it mutate live skills without safety tests and human approval.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

The agent could make many changes to a skill, prompt, strategy, or config before the user reviews the final result.

Why it was flagged

The skill instructs the agent to repeatedly edit files, run evaluations, and keep changes automatically. This loop is central to the skill, but it grants broad mutation authority with no explicit per-change approval and no sandbox requirement.

Skill content

- Make the change ...
- Run eval ...
- Keep or revert based on score ...

Continue for N experiments (default: 20, or until user stops)

Recommendation

Run only on a copy or dedicated branch, specify the exact mutable file and experiment count, and require human review of diffs before accepting changes.
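
The recommendation above can be sketched as a bounded keep-or-revert loop with an approval gate. This is a minimal illustration, not the skill's implementation: `run_loop`, `propose`, `evaluate`, and `approve` are hypothetical names, and the sketch assumes a higher score is better and that `approve` shows the diff to a human and returns their decision.

```python
from typing import Callable, Tuple


def run_loop(
    current: str,
    propose: Callable[[str], str],
    evaluate: Callable[[str], float],
    approve: Callable[[str, str], bool],
    max_experiments: int = 20,
) -> Tuple[str, float]:
    """Keep a candidate only if it scores higher AND the reviewer approves it."""
    best, best_score = current, evaluate(current)
    # The loop is bounded: it never runs more than max_experiments iterations.
    for _ in range(max_experiments):
        candidate = propose(best)
        score = evaluate(candidate)
        # approve() is only consulted for candidates that already beat the baseline.
        if score > best_score and approve(best, candidate):
            best, best_score = candidate, score
        # Otherwise the candidate is discarded, which is the "revert" branch.
    return best, best_score
```

Passing an `approve` callback that always returns `True` reproduces the fully automatic behavior flagged above, so the gate only helps if it actually blocks on a human decision.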

What this means

A bad or untrusted eval command could run arbitrary local commands in the project environment.

Why it was flagged

The reference implementation runs the evaluator as a shell command. This is expected for an evaluation harness, but it means the eval command must be treated as trusted code.

Skill content

subprocess.run(
    eval_cmd,
    shell=True,
    capture_output=True,
    text=True,
    timeout=300,

Recommendation

Only use evaluator commands you wrote or trust; avoid copying eval commands from untrusted sources, and prefer safer argument-based execution over shell=True where possible.
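
One way to follow the "argument-based execution" advice is to split the command string into an argument list and run it without a shell. The sketch below is an assumption about how that could look, not the skill's code; `run_eval` is a hypothetical name, and `shlex.split` applies POSIX quoting rules, so Windows paths would need different handling.

```python
import shlex
import subprocess


def run_eval(eval_cmd: str, workdir: str = ".", timeout: int = 300) -> subprocess.CompletedProcess:
    """Run an evaluator command as an argv list, with no shell in between."""
    # shlex.split produces an argument list, so ';', '&&', '|' and '$(...)'
    # reach the program as literal arguments instead of being interpreted.
    argv = shlex.split(eval_cmd)
    if not argv:
        raise ValueError("eval command is empty")
    return subprocess.run(
        argv,
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

This removes shell injection through the command string, but the evaluator program itself still runs with the user's privileges, so it must remain trusted code.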

What this means

Unrelated work, local config, or accidental sensitive files could be committed into git history and later propagated if the repo is pushed or shared.

Why it was flagged

The loop stages every change in the working directory, not just the mutable file. That can pull unrelated files or generated artifacts into experiment commits.

Skill content

subprocess.run(["git", "add", "-A"], cwd=workdir, capture_output=True)

Recommendation

Use a clean sandbox repo or branch, ensure secrets are ignored, and change the implementation to stage only the intended mutable file and log files.
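
A possible replacement for the `git add -A` call is to stage an explicit list of paths. The following is a sketch under assumptions: `staging_command` and `stage_only` are hypothetical helpers, and the file names in the usage note are placeholders for the mutable file and log file.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def staging_command(workdir: str, paths: list[str]) -> list[str]:
    """Build a `git add` argv limited to explicit paths inside workdir."""
    if not paths:
        raise ValueError("no paths to stage")
    root = Path(workdir).resolve()
    for p in paths:
        target = (root / p).resolve()
        # Reject absolute paths, '..' escapes, and '.' (which would stage everything).
        if root not in target.parents:
            raise ValueError(f"{p!r} is not a path inside {workdir!r}")
    # '--' stops git from reading a path that starts with '-' as an option.
    return ["git", "add", "--", *paths]


def stage_only(workdir: str, paths: list[str]) -> None:
    """Stage the listed paths and nothing else."""
    subprocess.run(
        staging_command(workdir, paths),
        cwd=workdir,
        check=True,
        capture_output=True,
    )
```

For example, `stage_only(workdir, ["SKILL.md", "results.tsv"])` would stage only those two files, leaving unrelated edits and untracked files out of the experiment commit.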

What this means

A skill may become optimized for a narrow score while losing safety, correctness, or user-intent constraints in future runs.

Why it was flagged

The skill explicitly supports mutating SKILL.md prompt instructions, which are persistent agent instructions reused in future tasks. A weak, biased, or poisoned eval could cause unsafe prompt changes to be kept.

Skill content

The **mutable file** is the thing you're optimizing. It can be:
- A SKILL.md prompt/instructions

Recommendation

Keep immutable safety requirements outside the mutation target, add safety/regression tests to the eval suite, and manually review any SKILL.md changes before installing or publishing them.
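
A safety regression test of the kind recommended above could be as simple as checking that fixed invariant lines survive every mutation. This is a hedged sketch: the invariant strings, `missing_safety_lines`, and `gated_score` are all hypothetical, and it assumes the loop keeps a candidate only when its score is higher than the current best.

```python
# Hypothetical invariants; in practice they would live in a file the loop cannot edit.
REQUIRED_LINES = (
    "Never print or transmit secrets.",
    "Ask the user before deleting files.",
)


def missing_safety_lines(candidate_text, required=REQUIRED_LINES):
    """Return the required lines that no longer appear verbatim in the candidate."""
    present = {line.strip() for line in candidate_text.splitlines()}
    return [line for line in required if line not in present]


def gated_score(candidate_text, raw_score):
    """Return -inf when any invariant is missing, so the candidate can never win."""
    if missing_safety_lines(candidate_text):
        return float("-inf")
    return raw_score
```

A verbatim line check only catches deletion or rewording of the listed lines. It does not detect new instructions that contradict them, so manual review of SKILL.md diffs is still needed.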

What this means

Private prompts, examples, or business data used in evals could leave the local environment depending on the chosen LLM provider.

Why it was flagged

The documentation suggests an optional LLM-judge evaluation flow. If users adopt it, prompts, outputs, or test cases may be sent to an external model provider.

Skill content

**LLM-as-judge**: Send output to an LLM, ask it to score 1-100

Recommendation

Use a trusted or local evaluator for sensitive material, and keep that material clearly separate from any data that may be sent to external model providers.