Install
```bash
openclaw skills install expflow-pipeline-hpo
```

PDEBench competition workflow orchestration with expflow: three pipeline modes (full/fast/skip), distributed HPO, pruner integration, and a native ClearML HyperParameterOptimizer mode.

Orchestrate experiment workflows for the AI4S PDE competition using expflow. The three modes map to three competition phases.

```bash
pip install "expflow-pde[pipeline]"
```
Three pipeline modes, each mapped to a CLI command:
Full mode (`expflow pipeline submit-full`) is for the exploration phase of a competition task. Optuna finds the best parameters via distributed clearml-agent trials, the pipeline trains with those parameters, and then it evaluates the result.
```bash
expflow pipeline submit-full train_task1.py \
  --queue default \
  --trials 50 --parallel 4 \
  --eval-script eval_task1.py \
  --metric seg_total --direction maximize
```
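Local mode reads the objective from the script's stdout. `--metric seg_total` selects lines of the form `METRIC:seg_total=<value>`, and `--direction maximize` means larger values win. The sketch below illustrates that contract under two assumptions: if a script prints the metric more than once, the last value counts, and a trial that prints nothing ranks last. `parse_metric` and `best_trial` are hypothetical helpers, not expflow functions.

```python
import math
import re
from typing import Dict, Optional

# Hypothetical pattern for lines such as "METRIC:seg_total=0.8123".
_METRIC_RE = re.compile(r"^METRIC:(?P<name>[^=\s]+)=(?P<value>\S+)\s*$")


def parse_metric(stdout: str, name: str) -> Optional[float]:
    """Return the last value printed as METRIC:<name>=<value>, or None."""
    value = None
    for line in stdout.splitlines():
        match = _METRIC_RE.match(line.strip())
        if match and match.group("name") == name:
            value = float(match.group("value"))  # assumption: last one wins
    return value


def best_trial(results: Dict[str, Optional[float]], direction: str) -> str:
    """Pick the trial id with the best objective; missing metrics rank last."""
    if direction not in ("maximize", "minimize"):
        raise ValueError(f"unknown direction: {direction}")
    sign = 1.0 if direction == "maximize" else -1.0

    def score(trial_id: str) -> float:
        value = results[trial_id]
        # Negating for "minimize" lets a single max() handle both directions.
        return -math.inf if value is None else sign * value

    return max(results, key=score)
```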
Flags:

- `--trials N`: total number of HPO trials
- `--parallel M`: maximum number of concurrent trials (use the GPU node count)
- `--metric`: objective metric name, which the script prints to stdout as `METRIC:<name>=<value>`
- `--pruner hyperband|median|percentile`: early-stop poor trials
- `--study-name`: Optuna study name (auto-generated if omitted; persisted to SQLite)
- `--skip hpo --skip eval`: run only the train step within the full-mode skeleton

Fast mode (`expflow pipeline submit`) is for the competition sprint phase, when you already know the best parameters. It skips HPO and runs directly with fixed arguments.
```bash
expflow pipeline submit train_task1.py \
  --queue default \
  --train-param lr=0.001 --train-param epochs=80 \
  --eval-script eval_task1.py \
  --eval-param sub_step=5
```
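Each `--train-param key=val` (and `--eval-param key=val`) pair reaches its script as a `--key=val` argument. Below is a minimal sketch of that translation. It assumes the value is passed through verbatim as a string and that only the first `=` separates key from value. `to_cli_args` is a hypothetical name, not part of expflow.

```python
from typing import List, Sequence


def to_cli_args(params: Sequence[str]) -> List[str]:
    """Translate ["lr=0.001", "epochs=80"] into ["--lr=0.001", "--epochs=80"]."""
    args = []
    for item in params:
        # partition splits on the first "=" only, so values may contain "=".
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=val, got {item!r}")
        args.append(f"--{key}={value}")
    return args


if __name__ == "__main__":
    cmd = ["python", "train_task1.py", *to_cli_args(["lr=0.001", "epochs=80"])]
    print(" ".join(cmd))  # python train_task1.py --lr=0.001 --epochs=80
```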
Flags:

- `--skip eval`: train only (just submit the checkpoint)
- `--train-param key=val`: injected into the training script as `--key=val`
- `--eval-param key=val`: injected into the eval script as `--key=val`

To override step inclusion in either mode, use `--skip`:
```bash
# Train only
expflow pipeline submit-full train_task1.py \
  --skip hpo --skip eval

# HPO only
expflow pipeline submit-full train_task1.py \
  --skip train --skip eval
```
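One way to picture step overrides is as a filter over a fixed HPO, train, eval sequence. The sketch below assumes exactly those three step names and treats an unknown name, or skipping every step, as an error. `select_steps` is hypothetical and does not reflect how expflow actually validates `--skip`.

```python
from typing import Iterable, List, Sequence

# Assumed order of the full-mode skeleton: HPO, then training, then evaluation.
FULL_STEPS = ("hpo", "train", "eval")


def select_steps(skip: Iterable[str], steps: Sequence[str] = FULL_STEPS) -> List[str]:
    """Return the steps left after applying --skip values, in pipeline order."""
    skip_set = set(skip)
    unknown = skip_set - set(steps)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    remaining = [step for step in steps if step not in skip_set]
    if not remaining:
        raise ValueError("every step was skipped; nothing to run")
    return remaining
```

The sketch does not check whether the remaining steps make sense together, for example eval without a fresh train.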
HPO (`expflow optuna run`) has three backends:

| Mode | Flag | Description | Best for |
|---|---|---|---|
| Local | (default) | Serial subprocesses on CPU | ≤20 trials, quick tests |
| Distributed | `--distributed` | Optuna ask/tell + ClearML `Task` clone | Multi-GPU, custom control |
| Optimizer | `--optimizer` / `-O` | ClearML `HyperParameterOptimizer` | Production, 50-200+ trials |
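In the distributed backend, `--trials N --parallel M` means N trials in total, with at most M in flight at any time. The sketch below models only that scheduling constraint. A thread pool stands in for clearml-agent workers, and `fake_trial` is a hypothetical placeholder for cloning and enqueuing a ClearML task. Nothing here calls ClearML or Optuna.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_capped(n_trials: int, parallel: int,
               run_trial: Callable[[int], float]) -> Dict[int, float]:
    """Run trials 0..n_trials-1 with at most `parallel` executing at once."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {i: pool.submit(run_trial, i) for i in range(n_trials)}
        # result() blocks until each trial finishes and re-raises its errors.
        return {i: fut.result() for i, fut in futures.items()}


class ConcurrencyProbe:
    """Wraps a trial function and records the peak number of concurrent calls."""

    def __init__(self, fn: Callable[[int], float]) -> None:
        self.fn = fn
        self.active = 0
        self.peak = 0
        self._lock = threading.Lock()

    def __call__(self, trial_id: int) -> float:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        try:
            return self.fn(trial_id)
        finally:
            with self._lock:
                self.active -= 1


def fake_trial(trial_id: int) -> float:
    """Hypothetical stand-in for a remote trial: sleeps, returns a dummy score."""
    time.sleep(0.01)
    return float(trial_id)


if __name__ == "__main__":
    probe = ConcurrencyProbe(fake_trial)
    scores = run_capped(n_trials=12, parallel=4, run_trial=probe)
    print(f"{len(scores)} trials, peak concurrency {probe.peak}")
```

In a real Optuna ask/tell loop, each slot would get its parameters from `study.ask()` and report the result with `study.tell(trial, value)`. How expflow wires this to ClearML is not shown here.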
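Flags:

- `--pruner hyperband|median|percentile|none`: early-stops poor trials (the ASHA pruner saves ~40% GPU time)
- `--metric <name>`: reads `METRIC:<name>=<value>` from script stdout
- `--direction maximize|minimize`: whether higher or lower metric values are better
- `--timeout <min>`: safety cutoff, in minutes

The training/eval script must: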
- Accept `--key=value` CLI arguments
- Print `METRIC:<name>=<value>` to stdout for objective capture (local mode)
- Report the objective to ClearML, e.g. `Task.current_task().get_logger().report_scalar("Score", "seg_total", value, iteration=epoch)`
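Below is a minimal training-script skeleton that meets these requirements. It accepts `--key=value` arguments, emits an intermediate value every few epochs, and prints the final objective as `METRIC:seg_total=<value>`. The hyperparameters (`lr`, `epochs`), the made-up score curve, and the `--report_every` option are illustrative assumptions. The pruner and ClearML calls appear only as comments, so the sketch runs without Optuna or ClearML installed.

```python
import argparse
from typing import Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse --key=value arguments such as those injected by --train-param."""
    parser = argparse.ArgumentParser(description="Toy training script")
    # argparse accepts both "--lr 0.001" and "--lr=0.001".
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--epochs", type=int, default=20)
    # Hypothetical knob: emit an intermediate value every N epochs (keep N <= 10).
    parser.add_argument("--report_every", type=int, default=5)
    return parser.parse_args(argv)


def format_metric(name: str, value: float) -> str:
    """Build the stdout line that local-mode HPO parses."""
    return f"METRIC:{name}={value:.6f}"


def train(args: argparse.Namespace,
          report: Optional[Callable[[float, int], None]] = None) -> float:
    """Fake training loop standing in for real PDE training."""
    score = 0.0
    for epoch in range(1, args.epochs + 1):
        # Made-up learning curve: the score creeps toward 1.0.
        score += (1.0 - score) * min(args.lr * 100, 0.5)
        if report is not None and epoch % args.report_every == 0:
            # A pruner-aware script would call e.g. trial.report(score, epoch) here,
            # and a ClearML run might also log:
            # Task.current_task().get_logger().report_scalar(
            #     "Score", "seg_total", score, iteration=epoch)
            report(score, epoch)
    return score


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    final = train(args, report=lambda v, step: print(f"epoch {step}: seg_total={v:.4f}"))
    print(format_metric("seg_total", final))
    return 0


if __name__ == "__main__":
    main()
```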
Common pitfalls:

- The pruner needs `trial.report()` calls during training. If the script reports only at the end, the pruner has nothing to prune on. Call `trial.report(val_loss, epoch)` at least every 10 epochs.
- ClearML metrics use the Title/Series format. A bare metric name such as `seg_total` becomes title=`seg_total`, series=`seg_total`. If your script calls `report_scalar("Score", "seg_total", v, iteration=epoch)`, pass `--metric Score/seg_total`.
- To check which clearml-agent workers are available, run `expflow clearml workers` or check the Web UI.
- `_collect_one_trial` polls every 5 s and waits up to 60 min per trial. If trials are expected to run longer, increase `timeout_minutes`.

Key files in `expflow_pde/`:
- `hpo.py`: 3-mode HPO runner (local/distributed/optimizer)
- `pipeline.py`: `ExperimentPipeline` class (fast/full modes)
- `cli_pipeline.py`: `pipeline submit` + `pipeline submit-full`
- `cli_optuna.py`: `optuna run` with all three backends

Related skills:

- `experiment-lifecycle-governance`: PIN, metrics registry, compare-scores, competition rules audit
- `pde-experiment-hyperparameters`: PDEBench-specific hyperparameter reference
- `multi-agent-distributed-experiment-workflow`: Hermes → OpenCode → ClearML
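The pruning pitfall is easier to see with a toy version of the median stopping rule. This is the idea behind Optuna's `MedianPruner`, which `--pruner median` presumably maps to. A trial is stopped at step `s` if its best value so far is worse than the median of what earlier trials reported at that same step. A script that reports only at the end gives the rule nothing to compare until training is over. This is a simplified illustration, not Optuna's or expflow's implementation: Optuna's startup-trial and warm-up settings are omitted, and `should_prune` and its data layout are assumptions.

```python
import statistics
from typing import Dict, List


def should_prune(history: List[Dict[int, float]], current: Dict[int, float],
                 step: int, direction: str = "maximize") -> bool:
    """Simplified median stopping rule.

    history: intermediate values of earlier trials, keyed by step.
    current: intermediate values the running trial has reported so far.
    """
    if direction not in ("maximize", "minimize"):
        raise ValueError(f"unknown direction: {direction}")
    if step not in current:
        return False  # nothing reported at this step: nothing to compare
    peers = [trial[step] for trial in history if step in trial]
    if not peers:
        return False  # no earlier trial reported at this step
    median = statistics.median(peers)
    so_far = [value for s, value in current.items() if s <= step]
    if direction == "maximize":
        return max(so_far) < median
    return min(so_far) > median


if __name__ == "__main__":
    history = [{10: 0.60, 20: 0.70}, {10: 0.50, 20: 0.65}, {10: 0.55, 20: 0.72}]
    print(should_prune(history, {10: 0.40}, step=10))  # True: 0.40 < median 0.55
    print(should_prune(history, {}, step=10))  # False: end-only reporting
```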