Weights & Biases Monitor

v1.0.0

Monitor and analyze Weights & Biases training runs. Use when checking training status, detecting failures, analyzing loss curves, comparing runs, or monitoring experiments. Triggers on "wandb", "training runs", "how's training", "did my run finish", "any failures", "check experiments", "loss curve", "gradient norm", "compare runs".


by @chrisvoncsefalvay

Security Scan

VirusTotal: Benign


OpenClaw: Suspicious (medium confidence)


Purpose & Capability

Name/description match the contained scripts: all scripts use the Weights & Biases Python API to list runs, fetch history, compare metrics, and print health reports. Asking for access to W&B data is consistent with the stated purpose.

Instruction Scope

SKILL.md and the scripts instruct running Python from a hard-coded venv path (~/clawd/venv/bin/python3) and tell the user to set WANDB_API_KEY or run 'wandb login'. The skill metadata declares no required environment variables or binaries, so the runtime instructions access credentials and environment that the manifest does not reflect. SKILL.md also assumes absolute paths that may not exist on another host, and watch_runs.py contains a hard-coded default entity and project list (author-specific defaults) that could produce surprising queries.


Install Mechanism

This is instruction-only (no install spec), which lowers the risk of unexpected downloads. However, the scripts rely on the wandb Python package and a Python virtualenv that the skill neither declares nor installs, so the agent or user must ensure wandb is installed. No external downloads or obscure URLs are present.

Credentials

The SKILL.md tells users to run 'wandb login' or set WANDB_API_KEY, but the skill's requires.env and primary credential fields are empty. Requesting the W&B API key is reasonable for this tool's function, but the lack of declaration is an inconsistency that could lead to accidental credential exposure or confusion about what secrets are needed.


Persistence & Privilege

The skill does not request permanent presence (always:false) and does not attempt to modify other skills or system-wide agent settings. It only uses the wandb API at runtime and prints reports; no privileged persistence is requested.

What to consider before installing

This skill appears to implement exactly what it claims (W&B run monitoring), but there are mismatches you should address before use:

1) The scripts require the wandb Python package and a Python interpreter/virtualenv, but the skill manifest doesn't declare this. Install wandb in an isolated venv before running.
2) Provide a WANDB_API_KEY (or run 'wandb login') when running headless; treat that key like any secret (store it in a secure vault, or in the environment only for the session).
3) Edit the SKILL.md or invocation commands to remove the hard-coded path (~/clawd/venv/...), or run the scripts with your own Python path, to avoid surprises.
4) The default entity/projects in watch_runs.py are author-specific. Change them to your org or pass explicit arguments to avoid querying someone else's projects.
5) The code excerpts in the distribution you provided are truncated in places; to raise confidence, inspect the full files for any network calls outside wandb.Api (e.g., requests, socket) or obfuscated code. If you want, provide the complete untruncated files and I can re-check for any hidden endpoints or questionable behavior.

Running these scripts in a disposable environment (isolated container or dedicated VM) first is recommended so you can verify behavior and confirm that no unexpected data exfiltration occurs.
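Item 5 can be started with a quick static pass. The sketch below uses Python's standard `ast` module to flag imports of common networking modules and calls to `exec`, `eval`, or `__import__` in a script's source. The module list and the `suspicious_names` helper are assumptions made for illustration, and a clean result does not prove the absence of network access; it only tells you where to look first.

```python
import ast

# Assumed list of modules worth a closer look; extend as needed
SUSPECT_MODULES = {"requests", "httpx", "aiohttp", "urllib", "http", "socket", "ftplib", "smtplib"}
# Built-ins often used to hide or generate code at runtime
DYNAMIC_CALLS = {"exec", "eval", "__import__"}


def suspicious_names(source):
    """Return sorted suspicious module names and dynamic calls found in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        else:
            modules = []
        # Compare only the top-level package, e.g. "urllib" for "urllib.request"
        found.update(m.split(".")[0] for m in modules if m.split(".")[0] in SUSPECT_MODULES)
        # Flag direct calls such as exec("...") by bare name
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in DYNAMIC_CALLS):
            found.add(node.func.id)
    return sorted(found)
```

To use it, pass the text of each file in the skill's scripts directory, for example `suspicious_names(Path(p).read_text())`.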



1.8k downloads · 1 star · 1 version · Updated 1mo ago


MIT-0

Weights & Biases

Monitor, analyze, and compare W&B training runs.

Setup

wandb login
# Or set WANDB_API_KEY in environment
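Before running the scripts headless, it can help to confirm that a key is actually available. The sketch below checks WANDB_API_KEY first and then a netrc file. It assumes that `wandb login` saves the key to `~/.netrc` under the host `api.wandb.ai`, which is the usual behaviour for the hosted service but worth verifying on your install; `has_wandb_credentials` is a hypothetical helper, not part of the skill or the wandb package.

```python
import netrc
import os
from pathlib import Path

# Assumed host under which `wandb login` stores the key in ~/.netrc
WANDB_HOST = "api.wandb.ai"


def has_wandb_credentials(netrc_path=None):
    """Return True if a W&B API key appears to be available.

    Checks the WANDB_API_KEY environment variable first, then looks for an
    entry for WANDB_HOST in a netrc file (default: ~/.netrc).
    """
    if os.environ.get("WANDB_API_KEY"):
        return True
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    try:
        auth = netrc.netrc(str(path)).authenticators(WANDB_HOST)
    except (OSError, netrc.NetrcParseError):
        # Missing or unparsable netrc file: treat as "no credentials"
        return False
    # authenticators() returns (login, account, password) or None
    return auth is not None and bool(auth[2])
```

The function only reports whether a key exists; it never prints or returns the key itself.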

Scripts

Characterize a Run (Full Health Analysis)

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/characterize_run.py ENTITY/PROJECT/RUN_ID

Analyzes:

Loss curve trend (start → current, % change, direction)
Gradient norm health (exploding/vanishing detection)
Eval metrics (if present)
Stall detection (heartbeat age)
Progress & ETA estimate
Config highlights
Overall health verdict

Options: --json for machine-readable output.
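The script's internals are not shown here, so the following is only a minimal sketch of how the loss-trend and ETA items above could be computed from logged values. `loss_trend`, `eta_hours`, and the 1% "flat" tolerance are hypothetical; `total_steps` is assumed to come from the run config, and the ETA is a naive linear extrapolation.

```python
from dataclasses import dataclass


@dataclass
class LossTrend:
    start: float
    current: float
    pct_change: float  # relative change from start to current, in percent
    direction: str     # "decreasing", "increasing" or "flat"


def loss_trend(values, flat_tolerance_pct=1.0):
    """Summarize an ordered loss series as start -> current, % change, direction.

    None entries (steps where the metric was not logged) are skipped.
    """
    clean = [v for v in values if v is not None]
    if len(clean) < 2:
        raise ValueError("need at least two logged loss values")
    start, current = clean[0], clean[-1]
    if start == 0:
        raise ValueError("starting loss is zero; percent change is undefined")
    pct = (current - start) / abs(start) * 100
    if abs(pct) <= flat_tolerance_pct:
        direction = "flat"
    elif pct < 0:
        direction = "decreasing"
    else:
        direction = "increasing"
    return LossTrend(start, current, pct, direction)


def eta_hours(current_step, total_steps, elapsed_hours):
    """Naive linear ETA: remaining steps divided by the average rate so far.

    Returns None when no rate can be computed yet.
    """
    if current_step <= 0 or elapsed_hours <= 0:
        return None
    steps_per_hour = current_step / elapsed_hours
    return max(total_steps - current_step, 0) / steps_per_hour
```

For example, `loss_trend([2.0, 1.5, None, 1.0])` reports a 50% decrease, and a run at step 500 of 1000 after 2 hours gets an ETA of 2 more hours.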

Watch All Running Jobs

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/watch_runs.py ENTITY [--projects p1,p2]

Quick health summary of all running jobs plus recent failures/completions. Ideal for morning briefings.

Options:

--projects p1,p2 — Specific projects to check
--all-projects — Check all projects
--hours N — Hours to look back for finished runs (default: 24)
--json — Machine-readable output
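A rough sketch of the grouping this command describes (running jobs plus runs that ended within the `--hours` window) is shown below. It works on plain dicts so it runs without network access. The dict shape and `summarize_runs` are hypothetical; the field names mirror the `state` and `heartbeat_at` run attributes, and the last heartbeat is assumed to be a reasonable proxy for when a run ended.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    """Parse an ISO-8601 timestamp; naive values are assumed to be UTC."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def summarize_runs(runs, hours=24, now=None):
    """Split runs into running jobs and runs that ended in the last `hours`.

    Each run is a dict with "name", "state" and "heartbeat_at" keys.
    Crashed runs are grouped with failed ones.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    report = {"running": [], "failed": [], "finished": []}
    for run in runs:
        state = run["state"]
        if state == "running":
            report["running"].append(run["name"])
        elif state in ("failed", "crashed", "finished"):
            # Only report terminal runs whose last heartbeat is inside the window
            if parse_ts(run["heartbeat_at"]) >= cutoff:
                key = "finished" if state == "finished" else "failed"
                report[key].append(run["name"])
    return report
```

The returned dict serializes directly with `json.dumps`, which is one way a `--json` mode could be produced.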

Compare Two Runs

~/clawd/venv/bin/python3 ~/clawd/skills/wandb/scripts/compare_runs.py ENTITY/PROJECT/RUN_A ENTITY/PROJECT/RUN_B

Side-by-side comparison:

Config differences (highlights important params)
Loss curves at same steps
Gradient norm comparison
Eval metrics
Performance (tokens/sec, steps/hour)
Winner verdict
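The config-diff and same-step loss comparison could look roughly like this. `config_diff`, `losses_at_common_steps`, and the `<missing>` marker are hypothetical; the history rows are assumed to be dicts like those yielded by `run.scan_history()`, keyed by W&B's built-in `_step` counter.

```python
# Marker for a key that exists in only one of the two configs
MISSING = "<missing>"


def config_diff(config_a, config_b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    diff = {}
    for key in sorted(set(config_a) | set(config_b)):
        a = config_a.get(key, MISSING)
        b = config_b.get(key, MISSING)
        if a != b:
            diff[key] = (a, b)
    return diff


def _series(history, loss_key, step_key):
    """Map step -> loss, skipping rows where either value was not logged."""
    return {
        row[step_key]: row[loss_key]
        for row in history
        if row.get(step_key) is not None and row.get(loss_key) is not None
    }


def losses_at_common_steps(history_a, history_b, loss_key="train/loss", step_key="_step"):
    """Return [(step, loss_a, loss_b)] for steps that both runs logged."""
    a = _series(history_a, loss_key, step_key)
    b = _series(history_b, loss_key, step_key)
    return [(step, a[step], b[step]) for step in sorted(a.keys() & b.keys())]
```

Comparing only shared steps keeps the comparison fair when one run has progressed further than the other.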

Python API Quick Reference

import wandb
api = wandb.Api()  # uses WANDB_API_KEY or the key saved by `wandb login`

# Get runs (filters use MongoDB-style query syntax)
runs = api.runs("entity/project", {"state": "running"})

# Get a single run by its path
run = api.run("entity/project/run_id")

# Run properties
run.state         # running | finished | failed | crashed | canceled
run.name          # display name
run.id            # unique identifier
run.summary       # final/current metrics
run.config        # hyperparameters
run.heartbeat_at  # stall detection

# Get history
history = list(run.scan_history(keys=["train/loss", "train/grad_norm"]))

Metric Key Variations

Scripts handle these automatically:

Loss: train/loss, loss, train_loss, training_loss
Gradients: train/grad_norm, grad_norm, gradient_norm
Steps: train/global_step, global_step, step, _step
Eval: eval/loss, eval_loss, eval/accuracy, eval_acc
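One way to implement this fallback is to try each candidate in order and keep the first key that is present, for example in a history row. `METRIC_KEYS`, `resolve_key`, and `resolve_all` are hypothetical names; the eval variants are split into separate loss and accuracy lists on the assumption that they name different metrics.

```python
# Candidate keys in priority order, taken from the variations listed above.
# Splitting eval into loss and accuracy is an assumption.
METRIC_KEYS = {
    "loss": ["train/loss", "loss", "train_loss", "training_loss"],
    "grad_norm": ["train/grad_norm", "grad_norm", "gradient_norm"],
    "step": ["train/global_step", "global_step", "step", "_step"],
    "eval_loss": ["eval/loss", "eval_loss"],
    "eval_accuracy": ["eval/accuracy", "eval_acc"],
}


def resolve_key(available, candidates):
    """Return the first candidate present in `available` (any container of keys), or None."""
    for key in candidates:
        if key in available:
            return key
    return None


def resolve_all(available):
    """Map each logical metric name to the concrete key found, or None."""
    return {name: resolve_key(available, keys) for name, keys in METRIC_KEYS.items()}
```

Resolving keys once per run, then reusing them for every history row, avoids repeating the lookup on each step.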

Health Thresholds

Gradients > 10: Exploding (critical)
Gradients > 5: Spiky (warning)
Gradients < 0.0001: Vanishing (warning)
Heartbeat > 30min: Stalled (critical)
Heartbeat > 10min: Slow (warning)
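These thresholds translate directly into comparisons. The sketch below is an assumption about how the checks might be combined into an overall verdict (the worst severity wins); the function names and the "ok"/"healthy"/"alive" labels are hypothetical, and naive heartbeat timestamps are treated as UTC.

```python
from datetime import datetime, timezone

SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def grad_norm_status(grad_norm):
    """Classify a gradient norm using the thresholds above."""
    if grad_norm > 10:
        return "critical", "exploding"
    if grad_norm > 5:
        return "warning", "spiky"
    if grad_norm < 0.0001:
        return "warning", "vanishing"
    return "ok", "healthy"


def heartbeat_status(heartbeat_at, now=None):
    """Classify the age of an ISO-8601 heartbeat timestamp."""
    now = now or datetime.now(timezone.utc)
    last = datetime.fromisoformat(heartbeat_at.replace("Z", "+00:00"))
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)  # assume naive timestamps are UTC
    age_minutes = (now - last).total_seconds() / 60
    if age_minutes > 30:
        return "critical", "stalled"
    if age_minutes > 10:
        return "warning", "slow"
    return "ok", "alive"


def overall_verdict(*statuses):
    """Return the worst severity among (severity, label) pairs."""
    return max((severity for severity, _ in statuses), key=SEVERITY.__getitem__)
```

For example, a gradient norm of 1.0 combined with a 15-minute-old heartbeat yields an overall verdict of "warning".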

Integration Notes

For morning briefings, use watch_runs.py --json and parse the output.
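Because the JSON schema is not documented here, the sketch below only runs a script with `--json` and parses whatever it prints, without assuming any keys. `run_json_report` is hypothetical; it defaults to the current interpreter rather than the hard-coded venv path, and you can override that with `python=`.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_json_report(script, *args, python=sys.executable):
    """Run `script` with `args` plus --json and return the parsed stdout.

    Raises subprocess.CalledProcessError if the script exits non-zero and
    json.JSONDecodeError if its output is not valid JSON.
    """
    cmd = [python, str(Path(script).expanduser()), *args, "--json"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

A call might look like `run_json_report("~/clawd/skills/wandb/scripts/watch_runs.py", "ENTITY", "--projects", "p1,p2")`, assuming the script accepts `--json` after its other arguments.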

For detailed analysis of a specific run, use characterize_run.py.

For A/B testing or hyperparameter comparisons, use compare_runs.py.
