Install
openclaw skills install autoresearch-karpathy
Autonomous AI research skill for running automated neural network experiments. Based on Andrej Karpathy's autoresearch project, it enables AI agents to autonomously modify training code, run experiments, evaluate results, and iteratively improve models. Use when: (1) setting up autonomous research experiments, (2) running automated neural network training, (3) conducting AI-driven research optimization, (4) experimenting with model architectures and hyperparameters, (5) implementing autonomous research loops, or (6) the user mentions "autonomous research", "AI experiments", "automated training", "neural network optimization", or "autoresearch".
This skill enables autonomous AI research experiments based on Andrej Karpathy's autoresearch project. It allows AI agents to autonomously modify neural network training code, run experiments, evaluate results, and iteratively improve models.
The idea: give an AI agent a small but real LLM training setup and let it experiment autonomously. The agent modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You can leave it running overnight and wake up to a log of experiments and (hopefully) a better model.
The project has three core files:
prepare.py: Fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Not modified.
train.py: The single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. This file is edited and iterated on by the agent.
program.md: Baseline instructions for the agent. This file is edited and iterated on by the human.
Clone the repository (if not already done):
git clone https://github.com/karpathy/autoresearch.git
cd autoresearch
Install dependencies:
uv sync
Prepare data (one-time setup):
uv run prepare.py
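Create an experiment branch, where <tag> is a run tag (e.g. mar20):
git checkout -b autoresearch/<tag>
Initialize results.tsv with a header row:
echo -e "commit\tval_bpb\tmemory_gb\tstatus\tdescription" > results.tsv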
The agent follows this loop indefinitely:
LOOP FOREVER:
1. Look at current git state
2. Modify train.py with experimental idea
3. git commit
4. Run experiment: uv run train.py > run.log 2>&1
5. Extract results: grep "^val_bpb:\|^peak_vram_mb:" run.log
6. If crash → analyze logs and fix or mark as crash
7. Record results in results.tsv
8. If improved → keep commit
9. If not improved → git reset
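The keep-or-discard decision in steps 6 to 9 can be sketched in a few lines of Python. This is only an illustration, not code from the autoresearch repository: the function name `decide_status` is hypothetical, and it assumes that a lower val_bpb is better and that a run with no val_bpb line in run.log counts as a crash.

```python
from typing import Optional


def decide_status(val_bpb: Optional[float], best_val_bpb: float) -> str:
    """Return 'crash', 'keep', or 'discard' for one finished experiment.

    val_bpb is None when the run produced no val_bpb line (treated as a crash).
    Assumption: lower val_bpb is better.
    """
    if val_bpb is None:
        return "crash"
    if val_bpb < best_val_bpb:
        return "keep"
    return "discard"


if __name__ == "__main__":
    best = 0.997900  # starting point for the comparison
    for candidate in (0.993200, 1.005000, None):
        status = decide_status(candidate, best)
        if status == "keep":
            best = candidate  # the kept commit becomes the new reference
        print(candidate, status)
```

A tie is treated as "discard" here, so a change has to strictly improve val_bpb to stay on the branch; that threshold is a design choice of this sketch.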
Status values: keep, discard, or crash
Editable: train.py (architecture, optimizer, hyperparameters, training loop, etc.)
Not editable: prepare.py (read-only)
Each experiment produces a summary:
---
val_bpb: 0.997900
training_seconds: 300.1
total_seconds: 325.9
peak_vram_mb: 45060.2
mfu_percent: 39.80
total_tokens_M: 499.6
num_steps: 953
num_params_M: 50.3
depth: 8
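As an alternative to grep, the summary can be read into a dictionary. The following is a minimal sketch that assumes each metric appears on its own line as `name: number`, as in the summary above; `parse_summary` is a hypothetical helper, not part of the project.

```python
import re
from typing import Dict

# Matches lines such as "val_bpb: 0.997900" or "depth: 8".
SUMMARY_LINE = re.compile(r"^([A-Za-z_]+):\s+(-?\d+(?:\.\d+)?)\s*$")


def parse_summary(log_text: str) -> Dict[str, float]:
    """Collect every 'name: number' line of a run log into a dict of floats."""
    metrics: Dict[str, float] = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


if __name__ == "__main__":
    with open("run.log", encoding="utf-8") as handle:
        summary = parse_summary(handle.read())
    # A missing val_bpb key suggests the run did not finish.
    print(summary.get("val_bpb"), summary.get("peak_vram_mb"))
```

Lines that do not fit the pattern, such as the `---` separator or a Python traceback, are skipped, so an empty result is a hint that the run crashed.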
Results are logged to results.tsv (tab-separated):
commit val_bpb memory_gb status description
a1b2c3d 0.997900 44.0 keep baseline
b2c3d4e 0.993200 44.2 keep increase LR to 0.04
c3d4e5f 1.005000 44.0 discard switch to GeLU activation
d4e5f6g 0.000000 0.0 crash double model width (OOM)
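A row in this format could be built as follows. This is a sketch under stated assumptions: `format_result_row` is a hypothetical helper, and memory_gb is assumed to be peak_vram_mb divided by 1024 and rounded to one decimal, which is consistent with the example values (45060.2 MB gives 44.0 GB).

```python
def format_result_row(commit: str, val_bpb: float, peak_vram_mb: float,
                      status: str, description: str) -> str:
    """Build one tab-separated results.tsv row (without a trailing newline)."""
    memory_gb = peak_vram_mb / 1024  # assumption: MB to GB using 1024
    return f"{commit}\t{val_bpb:.6f}\t{memory_gb:.1f}\t{status}\t{description}"


def append_result(path: str, row: str) -> None:
    """Append a row to the results file, one experiment per line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(row + "\n")


if __name__ == "__main__":
    row = format_result_row("a1b2c3d", 0.997900, 45060.2, "keep", "baseline")
    append_result("results.tsv", row)
```

For a crashed run, passing 0.0 for both val_bpb and peak_vram_mb yields the `0.000000` and `0.0` placeholders shown in the last example row.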
CRITICAL: Once the experiment loop begins, the agent operates autonomously:
uv run prepare.py
crash status
tail -n 50 run.log
This skill can be combined with the agent-teams-playbook skill for: