Autoresearch Pilot

v1.0.0

Guide for setting up and running Karpathy's autoresearch — autonomous AI-driven LLM training experiments. Helps write program.md, interpret results, and opti...

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the instructions: the skill is a textual guide for setting up and running autoresearch. It does not request unrelated credentials, binaries, or config paths, so the capability footprint is proportionate to the stated purpose.
Instruction Scope
SKILL.md gives step-by-step guidance (clone the repo, run commands locally, edit program.md) and explicitly says the skill itself will not execute commands or modify files. It instructs the user or agent to run commands locally, but does not direct reading of unrelated system files or exfiltration of data.
Install Mechanism
No install spec and no code files — the skill is instruction-only, which minimizes risk from installation or on-disk code.
Credentials
The skill declares no required environment variables or credentials. It sensibly lists local prerequisites (Python, GPU) in prose only — there are no disproportionate secret or config requests.
Persistence & Privilege
The 'always' setting is false and the skill is user-invocable. It does not request persistent privileges or modify other skills or system-wide settings.
Assessment
This skill is a textual co-pilot and does not install or run code by itself, which is good. Before following its instructions:

  1. Verify the GitHub repository URL and review the repo code (especially scripts like prepare.py and train.py) before running them.
  2. Confirm what the 'uv' package manager is and inspect any packages it installs.
  3. Be aware that training jobs can consume significant GPU resources and time, and may use or generate datasets that you should check for licensing and privacy.
  4. Do not grant the agent remote execution rights or secrets; let it propose changes, and run commands only when you explicitly approve and understand them.

Overall the skill is coherent and advisory, but exercise normal caution when cloning and running third-party training code.

Like a lobster shell, security has layers — review code before you run it.


SKILL.md

Autoresearch Pilot v1.0

Install: clawhub install autoresearch-pilot

Your co-pilot for Karpathy's autoresearch — autonomous AI-driven LLM training experiments on a single GPU.

Language

Respond in the language of the user's message. Default: English.

How It Works

Autoresearch lets an AI agent modify train.py, run a 5-minute experiment, check whether val_bpb improved, and repeat. This skill helps you set it up, write an effective program.md, and interpret results.
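
The loop described above can be pictured as a simple keep-or-revert search. The sketch below is only an illustration and is not taken from the autoresearch code: `propose_change` and `run_experiment` are hypothetical stand-ins for the agent editing train.py and for a 5-minute training run that reports val_bpb.

```python
import random
from typing import Callable, Tuple


def keep_or_revert_search(
    config: dict,
    propose_change: Callable[[dict], dict],
    run_experiment: Callable[[dict], float],
    n_experiments: int,
) -> Tuple[dict, float]:
    """Greedy search: keep a change only if it lowers val_bpb."""
    best_config = dict(config)
    best_bpb = run_experiment(best_config)       # baseline run
    for _ in range(n_experiments):
        candidate = propose_change(best_config)  # stand-in for the agent's edit
        bpb = run_experiment(candidate)          # stand-in for a 5-minute run
        if bpb < best_bpb:                       # lower val_bpb is better
            best_config, best_bpb = candidate, bpb
        # otherwise the candidate is dropped, i.e. the edit is reverted
    return best_config, best_bpb


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy objective invented for this demo: val_bpb is lowest at lr = 0.02.
    def toy_run(cfg: dict) -> float:
        return 1.0 + abs(cfg["lr"] - 0.02)

    def toy_propose(cfg: dict) -> dict:
        return {**cfg, "lr": cfg["lr"] * rng.choice([0.5, 2.0])}

    best, bpb = keep_or_revert_search({"lr": 0.16}, toy_propose, toy_run, 20)
    print(best, round(bpb, 4))
```

In the real workflow the "proposal" is whatever the agent decides to try based on program.md, so the quality of those instructions largely determines how useful each 5-minute run is.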

The Three Files

File | Role | Modified by
prepare.py | Data prep, tokenizer, utilities | Never (fixed)
train.py | Model, optimizer, training loop | The AI agent
program.md | Instructions for the AI agent | You (the human)

Key Concepts

  • val_bpb — Validation bits per byte. Lower = better. Vocab-size-independent metric.
  • Time budget — Each experiment runs exactly 5 minutes (wall clock). ~100 experiments per night.
  • Muon optimizer — Included. Often outperforms AdamW for small models.
  • DEPTH — Primary model complexity knob (default 8). Lower for smaller GPUs.
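
To make val_bpb concrete, here is a minimal sketch of how a bits-per-byte figure can be derived from a cross-entropy loss. It assumes the loss is the usual mean cross-entropy in nats per token; the function names are illustrative, and the evaluation code in the repo may differ in detail. Because the total is divided by bytes of text rather than by token count, the result does not depend on how the tokenizer splits the text.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss (in nats) into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_loss_nats / (math.log(2) * total_bytes)


def bpb_from_mean_loss(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Same conversion, starting from a mean per-token loss."""
    return bits_per_byte(mean_loss_nats * n_tokens, n_bytes)


if __name__ == "__main__":
    # A mean loss of ln(8) nats is 3 bits per token; with 4 bytes per token
    # on average, that works out to 3 / 4 bits per byte.
    print(bpb_from_mean_loss(math.log(8), n_tokens=1000, n_bytes=4000))
```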

Setup Guide

Walk the user through these steps when they want to start:

  1. Prerequisites: Python 3.10+, NVIDIA GPU (H100 recommended), uv package manager
  2. Clone repo: git clone https://github.com/karpathy/autoresearch
  3. Install: uv sync inside the repo
  4. Prepare data: uv run prepare.py (one-time, ~2 min)
  5. Test run: uv run train.py (should complete in ~5 min)
  6. Point your AI agent at program.md and let it experiment
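
Before starting, the user may want to confirm that the prerequisites from step 1 are in place. The script below is a sketch the user could run themselves; it only checks whether the tools are on PATH and does not verify GPU model or driver health. Treating the presence of `nvidia-smi` as a proxy for an NVIDIA setup is an assumption of this example.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)) -> dict:
    """Return a mapping of prerequisite name -> whether it looks available."""
    return {
        "python": sys.version_info[:2] >= min_python,
        "git": shutil.which("git") is not None,
        "uv": shutil.which("uv") is not None,
        # nvidia-smi ships with the NVIDIA driver, so finding it on PATH
        # suggests (but does not prove) a usable NVIDIA GPU.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```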

Small GPU Tips (RTX 3090, MacBook, etc.)

When the user has a smaller GPU, suggest these changes to prepare.py and train.py:

  • Use TinyStories dataset (lower entropy, works with small models)
  • Lower vocab_size to 4096 or 2048 (or 256 for byte-level)
  • Lower MAX_SEQ_LEN to 256
  • Lower DEPTH to 4 in train.py
  • Use WINDOW_PATTERN of "L" only
  • Lower TOTAL_BATCH_SIZE to 2**14
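
The suggestions above can be collected into one set of starting values. This is a summary for discussion, not a copy of either file: which file holds each constant should be confirmed against the repo, and `sequences_per_step` is a hypothetical helper that assumes TOTAL_BATCH_SIZE is counted in tokens.

```python
# Small-GPU starting values collected from the list above.
SMALL_GPU_OVERRIDES = {
    "vocab_size": 4096,         # or 2048; 256 for byte-level
    "MAX_SEQ_LEN": 256,
    "DEPTH": 4,                 # the default is 8
    "WINDOW_PATTERN": "L",
    "TOTAL_BATCH_SIZE": 2**14,  # 16384
}


def sequences_per_step(total_batch_size: int, max_seq_len: int) -> int:
    """Sequences per optimizer step, assuming the batch size is counted in tokens."""
    if total_batch_size % max_seq_len != 0:
        raise ValueError("batch size should be a multiple of the sequence length")
    return total_batch_size // max_seq_len


if __name__ == "__main__":
    for name, value in SMALL_GPU_OVERRIDES.items():
        print(f"{name} = {value!r}")
    print(
        "sequences per step:",
        sequences_per_step(
            SMALL_GPU_OVERRIDES["TOTAL_BATCH_SIZE"],
            SMALL_GPU_OVERRIDES["MAX_SEQ_LEN"],
        ),
    )
```

Keeping the batch size a multiple of the sequence length avoids a ragged final sequence in each step, which is why the helper rejects other combinations.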

Writing program.md

When the user asks for help with program.md, help them define:

  1. Research goal — What to optimize for (speed, quality, efficiency)
  2. Experiment strategy — What to try first, what to vary
  3. Success criteria — Target val_bpb or improvement threshold
  4. Safety guardrails — What the agent should NOT change

Example structure for program.md:

  • State the goal clearly
  • List allowed modifications (architecture, hyperparams, optimizer)
  • Define experiment logging format
  • Set a stopping condition (e.g., "stop after 50 experiments with no improvement")
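
One way to turn that structure into a draft is a small template function. The section headings below are hypothetical choices for this example, not a format that autoresearch requires, and the function only returns a string so the user decides whether and where to save it.

```python
from typing import Sequence


def render_program_md(
    goal: str,
    allowed: Sequence[str],
    forbidden: Sequence[str],
    log_format: str,
    stop_condition: str,
) -> str:
    """Build a program.md draft as a string (nothing is written to disk)."""

    def bullets(items: Sequence[str]) -> str:
        return "\n".join(f"- {item}" for item in items)

    sections = [
        ("Goal", goal),
        ("Allowed modifications", bullets(allowed)),
        ("Do not change", bullets(forbidden)),
        ("Logging format", log_format),
        ("Stopping condition", stop_condition),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections) + "\n"


if __name__ == "__main__":
    print(
        render_program_md(
            goal="Minimize val_bpb within the 5-minute time budget.",
            allowed=["architecture", "hyperparameters", "optimizer"],
            forbidden=["prepare.py"],
            log_format="One line per experiment: change made, val_bpb, kept or reverted.",
            stop_condition="Stop after 50 experiments with no improvement.",
        )
    )
```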

Interpreting Results

When the user shares experiment logs:

Observation | Likely meaning | Suggested response
val_bpb decreasing | Model is learning | Keep going; if it is not decreasing, check for bugs
val_bpb plateaued | Normal for small models | May need an architecture change
Training loss << val loss | Overfitting | Increase regularization
NaN loss | Learning rate too high or instability | Lower LR, check gradients
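
The table can also be expressed as a small triage function. This is a sketch under stated assumptions: it takes plain numbers rather than parsing any particular log format, and the thresholds `plateau_eps` and `overfit_gap` are arbitrary illustrative values that should be tuned to the runs being compared.

```python
import math
from typing import Optional, Sequence

ADVICE = {
    "nan": "Lower LR, check gradients",
    "overfitting": "Increase regularization",
    "decreasing": "Model is learning",
    "plateaued": "Normal for small models; may need an architecture change",
    "increasing": "Check for bugs",
    "insufficient-data": "Collect more experiments",
}


def triage(
    val_bpb: Sequence[float],
    train_loss: Optional[float] = None,
    val_loss: Optional[float] = None,
    plateau_eps: float = 0.001,
    overfit_gap: float = 0.5,
) -> str:
    """Classify a series of val_bpb values (oldest first) into one label."""
    losses = [x for x in (train_loss, val_loss) if x is not None]
    if any(math.isnan(x) for x in [*val_bpb, *losses]):
        return "nan"
    if len(losses) == 2 and val_loss - train_loss > overfit_gap:
        return "overfitting"
    if len(val_bpb) < 2:
        return "insufficient-data"
    change = val_bpb[-1] - val_bpb[0]  # negative means improvement
    if change < -plateau_eps:
        return "decreasing"
    if change <= plateau_eps:
        return "plateaued"
    return "increasing"


if __name__ == "__main__":
    label = triage([1.20, 1.15, 1.10])
    print(label, "->", ADVICE[label])
```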

Quick Commands

User says | Action
"set up autoresearch" | Walk through setup steps
"help me write program.md" | Draft research instructions
"my val_bpb is X" | Evaluate and suggest next steps
"optimize for small GPU" | Suggest parameter changes
"what should I try next" | Analyze recent experiments, propose new direction

Guidelines for Agent

  1. Read-only guidance — suggest changes, let the user apply them
  2. Check GPU capability — ask what GPU they have before recommending parameters
  3. Start simple — recommend TinyStories + DEPTH 4 for first-time users
  4. Explain val_bpb — many users are new to this metric
  5. Refer to autoresearch repo — it's the source of truth for all defaults
  6. No exec — guide only, never run training commands

What This Skill Does NOT Do

  • Does NOT run training commands or experiments
  • Does NOT modify train.py or prepare.py directly
  • Does NOT require an NVIDIA GPU (guidance works for any platform)
  • Does NOT access credentials or private data
  • Does NOT write any files — pure advisory

More by TommoT2

  • setup-doctor — Diagnose and fix OpenClaw setup issues
  • context-brief — Persistent context survival across sessions
  • model-pilot — Intelligent model routing and cost optimization

Install the full suite:

clawhub install autoresearch-pilot setup-doctor context-brief model-pilot
