nanogpt-training

v0.1.0

Train GPT-2 scale models (~124M parameters) efficiently on a single GPU. Covers GPT-124M architecture, tokenized dataset loading (e.g., HuggingFace Hub shard...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lnj22/mhc-layer-impl-nanogpt-training.

Previewing Install & Setup.
Prompt PreviewInstall & Setup
Install the skill "nanogpt-training" (lnj22/mhc-layer-impl-nanogpt-training) from ClawHub.
Skill page: https://clawhub.ai/lnj22/mhc-layer-impl-nanogpt-training
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install mhc-layer-impl-nanogpt-training

ClawHub CLI


npx clawhub@latest install mhc-layer-impl-nanogpt-training
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description (nanogpt training) match the contents: model architecture, tokenized dataset loading, optimizers, and a training loop. Required tooling referenced in SKILL.md (torch, huggingface_hub, einops, numpy) is exactly what you'd expect for this task.
Instruction Scope
Runtime instructions stay on-topic: they show how to download public HF token shards, build datasets via memmap, construct the model, and run mixed-precision training. There are no instructions to read unrelated system files, harvest environment variables, or call endpoints outside the expected external services (HuggingFace/GitHub).
Install Mechanism
This is instruction-only (no install spec). The SKILL.md suggests pip installing common ML packages; that's appropriate and proportional. No archives or remote executables are fetched beyond public Python packages and dataset files from HuggingFace.
Credentials
No environment variables, credentials, or config paths are required. The dataset downloads reference public repos (no auth). If you later point it at a private HF repo, HF credentials would be needed — but the skill itself does not request them.
Persistence & Privilege
The always flag is false, and the skill does not request any special persistent privileges or modifications to other skills. Autonomous invocation is allowed (platform default) but not combined with problematic privileges.
Assessment
This skill is a coherent, textual training guide that appears safe to inspect and use. Before running: (1) review dataset licenses (downloading large token shards can have legal/ethical implications); (2) run initial experiments on a tiny subset to validate code and resource usage; (3) be aware of resource/cost implications when using GPU clouds (Modal examples request A100); (4) only provide HF/GitHub credentials if you intentionally access private repos; and (5) if you plan to execute code from untrusted sources, do so in isolated environments (containers) and inspect code snippets carefully for any modifications before running.

Like a lobster shell, security has layers — review code before you run it.

Latest: vk97fgpw8ap5jgqmv349mabmfph84tgr9
76 downloads
0 stars
1 version
Updated 1w ago
v0.1.0
MIT-0 license

NanoGPT Training

Overview

This skill trains GPT-2 scale models (~124M parameters) efficiently on a single GPU. It provides:

  • GPT-124M Architecture: Standard transformer with RoPE and modern optimizations
  • Tokenized Datasets: Loading pre-tokenized shards from HuggingFace Hub or local files (see the loading sketch after this list)
  • Modern Optimizers: Muon optimizer with Newton-Schulz orthogonalization (see the orthogonalization sketch after this list)
  • Mixed Precision: bfloat16 training on A100 for 2x speedup
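
A minimal sketch of the tokenized-shard loading mentioned above, assuming shards are flat arrays of uint16 token ids as in typical nanoGPT-style pipelines; the repo id and file name in the commented usage are hypothetical placeholders.

import numpy as np
import torch
from huggingface_hub import hf_hub_download

def load_shard(path):
    # memory-map a shard of pre-tokenized data instead of reading it all into RAM
    return np.memmap(path, dtype=np.uint16, mode="r")

def get_batch(tokens, batch_size, block_size, device="cuda"):
    # sample random contiguous windows; targets are the inputs shifted by one token
    ix = np.random.randint(0, len(tokens) - block_size - 1, size=batch_size)
    x = torch.stack([torch.from_numpy(tokens[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(tokens[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x.to(device), y.to(device)

# hypothetical usage: fetch one public shard from the Hub, then sample a batch
# path = hf_hub_download(repo_id="<user>/<tokenized-dataset>", filename="shard_000.bin", repo_type="dataset")
# tokens = load_shard(path)
# x, y = get_batch(tokens, batch_size=8, block_size=1024)

The Muon bullet refers to Newton-Schulz orthogonalization; a minimal sketch of that step follows, using the quintic coefficients commonly seen in public Muon implementations (the skill's reference code may differ).

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    # approximately orthogonalize a 2D update matrix via a quintic Newton-Schulz iteration
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)        # normalize so the iteration converges
    transposed = X.size(0) > X.size(1)
    if transposed:                  # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X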

Training options:

  • Baseline GPT: Standard residual connections
  • Experimental residual variants: Optional alternative residual schemes for stability/efficiency (a generic illustration follows this list)
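
One common flavor of such residual variants is a learnable per-channel gain on the branch output; the class below is a generic illustration of that idea, not necessarily the variant this skill implements.

import torch
import torch.nn as nn

class ScaledResidual(nn.Module):
    """Residual connection with a learnable per-channel gain on the branch output."""

    def __init__(self, dim, init=1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.full((dim,), float(init)))

    def forward(self, x, branch_out):
        # x: residual stream, branch_out: attention/MLP output of matching shape
        return x + self.scale * branch_out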

Quick Reference

Topic               Reference
Model Architecture  GPT Architecture
Data Loading        Tokenized Data
Optimizers          Optimizers
Training Loop       Training Loop
Hyperparameters     Hyperparameters

Installation

pip install torch einops numpy huggingface_hub
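
A quick check that the install picked up a CUDA-enabled torch build with bfloat16 support (expected values in the comments assume an A100 machine):

import torch
print(torch.__version__)
print(torch.cuda.is_available())       # True on a working GPU setup
print(torch.cuda.is_bf16_supported())  # True on A100; needed for bfloat16 training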

Minimal Example

import modal

app = modal.App("gpt-training")

image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch", "einops", "numpy", "huggingface_hub"
)

@app.function(gpu="A100", image=image, timeout=3600)
def train():
    import torch
    from dataclasses import dataclass

    @dataclass
    class GPTConfig:
        block_size: int = 1024
        vocab_size: int = 50257
        n_layer: int = 12
        n_head: int = 12
        n_embd: int = 768
        dropout: float = 0.0
        bias: bool = False

    config = GPTConfig()

    # Download data, build model, train
    # ... (see references for full implementation)
    final_loss = None  # placeholder; the full training loop sets this to the real value

    return {"final_loss": final_loss}

@app.local_entrypoint()
def main():
    results = train.remote()
    print(results)
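
To launch the example remotely, install and authenticate the Modal client once, then run the script with the Modal CLI; the file name below is illustrative.

pip install modal
modal setup
modal run train_gpt.py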

Common Imports

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.cuda.amp import autocast, GradScaler
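# note: newer PyTorch also exposes these under torch.amp (torch.amp.autocast / torch.amp.GradScaler);
# GradScaler is mainly needed for float16, so plain bfloat16 training usually runs without it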
from dataclasses import dataclass
from einops import rearrange, repeat, reduce
import numpy as np
import math

When to Use What

Scenario                Approach
Standard GPT training   Use baseline model with standard residuals
Stability experiments   Try alternative residual variants or extra streams
Small experiments       Use T4/A10G GPU
Full training           Use A100 with bfloat16
Custom data             Modify the dataset loader class
Different model size    Adjust GPTConfig parameters (example below)
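
For the "different model size" row, a sketch reusing the GPTConfig dataclass from the minimal example; the values below are illustrative starting points, not tuned settings.

small = GPTConfig(block_size=512, n_layer=6, n_head=6, n_embd=384)       # small config for quick experiments
medium = GPTConfig(block_size=1024, n_layer=24, n_head=16, n_embd=1024)  # roughly GPT-2 medium scale (~350M params)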

Metrics to Monitor

Metric              Typical Signal            Notes
Validation loss     Steady decrease           Absolute value depends on dataset/tokenizer
Grad norm           Moderate, stable range    Large spikes indicate instability (see the step sketch below)
Training stability  Smooth curves             Frequent spikes suggest LR/batch issues
Throughput          Consistent tokens/sec     Use for comparing configs
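
A sketch of one bfloat16 training step that produces these signals; model, optimizer, and a batch function are assumed to already exist (for example, from the loading sketch in the Overview), and the logging is deliberately minimal.

import time
import torch
import torch.nn.functional as F

def train_step(model, optimizer, batch_fn, batch_size=8, block_size=1024, grad_clip=1.0):
    t0 = time.time()
    x, y = batch_fn(batch_size, block_size)    # assumed to return (inputs, targets) already on the GPU
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(x)                      # assumed shape: (batch, block, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    loss.backward()
    # clip_grad_norm_ returns the pre-clipping total norm, which is the value worth logging
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    torch.cuda.synchronize()                   # so the timing reflects finished GPU work
    tokens_per_sec = batch_size * block_size / (time.time() - t0)
    return loss.item(), grad_norm.item(), tokens_per_sec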

External Resources
