Install
openclaw skills install experiment-lifecycle-governance
Governance layer for experiment workflows: PIN-protected destructive operations, a standardized metrics registry with thresholds, compare-scores ranking with gating, and an audit against competition rules. Builds on clearml-agent-dispatch and fysom-fsm-integration.
Three sub-systems:
pip install expflow-pde
~/.expflow/pin.hash # SHA-256 hash of 4-digit PIN (never plaintext)
~/.expflow/experiments.jsonl # Experiment registry (each line = JSON record)
# pin.py — 4 components:
# 1. init_pin(pin: str) -> hash # Validate + hash + write
# 2. verify_pin(pin: str) -> bool # Hash comparison
# 3. pin_is_set() -> bool # Check if PIN configured
# 4. guard(action_description) -> bool # Interactive prompt
import hashlib

# SHA-256 hash; never store the raw PIN
def _hash_pin(pin: str) -> str:
    return hashlib.sha256(pin.encode()).hexdigest()

# Validate exactly 4 ASCII digits (str.isdigit() alone also accepts other Unicode digits)
def _validate_pin(pin: str) -> None:
    if not (pin.isascii() and pin.isdigit()) or len(pin) != 4:
        raise ValueError("PIN must be exactly 4 digits (0-9)")
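The four components listed for pin.py could fit together roughly as follows. This is a minimal sketch rather than the package's actual module: the `pin_file` parameter, the constant-time comparison via `hmac.compare_digest`, and the temporary demo directory are assumptions added for illustration.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def _hash_pin(pin: str) -> str:
    return hashlib.sha256(pin.encode()).hexdigest()


def _validate_pin(pin: str) -> None:
    if not (pin.isascii() and pin.isdigit()) or len(pin) != 4:
        raise ValueError("PIN must be exactly 4 digits (0-9)")


def init_pin(pin: str, pin_file: Path) -> str:
    """Validate the PIN, hash it, and write only the hash to disk."""
    _validate_pin(pin)
    digest = _hash_pin(pin)
    pin_file.parent.mkdir(parents=True, exist_ok=True)
    pin_file.write_text(digest)
    return digest


def pin_is_set(pin_file: Path) -> bool:
    """A PIN counts as configured when the hash file exists and is non-empty."""
    return pin_file.is_file() and bool(pin_file.read_text().strip())


def verify_pin(pin: str, pin_file: Path) -> bool:
    """Compare the stored hash with the hash of the candidate PIN."""
    if not pin_is_set(pin_file):
        return False
    stored = pin_file.read_text().strip()
    return hmac.compare_digest(stored, _hash_pin(pin))


# Demo against a throwaway directory instead of ~/.expflow (assumption)
demo_file = Path(tempfile.mkdtemp()) / "pin.hash"
init_pin("1234", demo_file)
ok = verify_pin("1234", demo_file)
bad = verify_pin("0000", demo_file)
```

Only the hex digest ever reaches the file, so a reader of `pin.hash` sees 64 hex characters and never the PIN itself.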
expflow pin init 1234 # Set PIN (SHA-256 stored)
expflow pin check # Interactive verify
expflow pin clear [--force] # Remove PIN
expflow pin status # Show if active
# Guarded commands (require PIN unless --force):
expflow run cancel <id> # Interactive PIN prompt
expflow run cancel <id> --force # Skip PIN
STANDARD_METRICS = {
    "seg_total": {
        "type": "scalar", "group": "Score",
        "higher_is_better": True,
        "description": "Total segment score (primary competition metric)",
    },
    "pde_mean": {
        "type": "scalar", "group": "PDE",
        "higher_is_better": False,
        "threshold": 18.09,  # Competition gate
    },
    "train_time_min": {
        "type": "scalar", "group": "Time",
        "higher_is_better": False,
        "threshold": 60,  # Competition limit
    },
    # ... 13 total metrics across Score/Loss/PDE/Time/Model/Training groups
}
from typing import Any

def report_standard(task: Any | None = None, **kwargs: float) -> dict[str, float]:
    # Validate each metric name against the registry, then report it to the task if one is given
    reported = {}
    for name, value in kwargs.items():
        info = STANDARD_METRICS.get(name)
        if info is None:
            raise ValueError(f"Unknown metric '{name}'...")
        reported[name] = float(value)
        if task is not None:
            task.report_scalar(title=info["group"], series=name, value=float(value), iteration=0)
    return reported
expflow clearml compare-scores \
--project PDEBench --tags task1 \
--sort-by pde_mean --ascending \
--gate pde_mean:lt:18.09 --gate train_time_min:lt:60
Gates use metric:op:value triplets:
- pde_mean:lt:18.09 (PDE mean < 18.09)
- train_time_min:le:60 (training time ≤ 60 min)
- seg_total:ge:50 (score ≥ 50)

Operators: lt, le, gt, ge.
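The metric:op:value triplets above can be parsed and applied with the standard `operator` module. This is a sketch under stated assumptions, not the expflow implementation: the names `parse_gate` and `passes_gates` are hypothetical, and treating a missing metric as a failed gate is a design choice made here for illustration.

```python
import operator
from typing import Callable

# The four operators named in the text, mapped to stdlib comparisons
OPS: dict[str, Callable[[float, float], bool]] = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def parse_gate(spec: str) -> tuple[str, str, float]:
    """Split 'metric:op:value' into its parts and validate the operator."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"Gate must be metric:op:value, got '{spec}'")
    metric, op, raw_value = parts
    if op not in OPS:
        raise ValueError(f"Unknown operator '{op}', expected one of {sorted(OPS)}")
    return metric, op, float(raw_value)


def passes_gates(metrics: dict[str, float], gates: list[str]) -> bool:
    """Return True only if every gate holds; a missing metric fails (assumption)."""
    for spec in gates:
        metric, op, threshold = parse_gate(spec)
        value = metrics.get(metric)
        if value is None or not OPS[op](value, threshold):
            return False
    return True


gates = ["pde_mean:lt:18.09", "train_time_min:lt:60"]
good_run = passes_gates({"pde_mean": 15.0, "train_time_min": 45.5}, gates)
slow_run = passes_gates({"pde_mean": 15.0, "train_time_min": 75.0}, gates)
```

Note the difference between `lt` and `le` at the boundary: a run with `pde_mean` exactly 18.09 fails `pde_mean:lt:18.09` but would pass `pde_mean:le:18.09`.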
expflow audit validate exp-001 --competition-rules --task-id abc123
from expflow_pde.audit import validate_competition_rules
result = validate_competition_rules(
    task_metrics={"seg_total": 57.09, "pde_mean": 15.0, "train_time_min": 45.5},
    task_params={"Args/--sub_step": "5"},
)
print(f"All pass: {result['all_pass']}")
| Check | Condition | Details |
|---|---|---|
| seg_total | Primary competition score (no gating) | Reported, not gated |
| pde_mean | Must be < 18.09 | Threshold from STANDARD_METRICS |
| train_time_min | Must be < 60 | Threshold from STANDARD_METRICS |
| sub_step parameter | Must exist and be > 0 | Searches case-insensitive |
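The checks in the table can be reproduced in a few lines, which may help when debugging a failing audit. This is a sketch only and not the code behind `validate_competition_rules`: the function names, the `checks` key in the result, and the substring match used for the case-insensitive sub_step search are assumptions. The thresholds and the `all_pass` key come from the text above.

```python
from typing import Mapping, Optional

PDE_MEAN_LIMIT = 18.09        # pde_mean must be strictly below this
TRAIN_TIME_LIMIT_MIN = 60.0   # train_time_min must be strictly below this


def find_sub_step(task_params: Mapping[str, str]) -> Optional[float]:
    """Return the first parameter whose key contains 'sub_step', ignoring case."""
    for key, raw in task_params.items():
        if "sub_step" in key.lower():
            try:
                return float(raw)
            except (TypeError, ValueError):
                return None
    return None


def check_rules_sketch(
    task_metrics: Mapping[str, float], task_params: Mapping[str, str]
) -> dict:
    """Evaluate the three gated checks; seg_total is reported but not gated."""
    sub_step = find_sub_step(task_params)
    pde = task_metrics.get("pde_mean")
    minutes = task_metrics.get("train_time_min")
    checks = {
        "pde_mean": pde is not None and pde < PDE_MEAN_LIMIT,
        "train_time_min": minutes is not None and minutes < TRAIN_TIME_LIMIT_MIN,
        "sub_step": sub_step is not None and sub_step > 0,
    }
    return {"checks": checks, "all_pass": all(checks.values())}


result = check_rules_sketch(
    {"seg_total": 57.09, "pde_mean": 15.0, "train_time_min": 45.5},
    {"Args/--sub_step": "5"},
)
print(f"All pass: {result['all_pass']}")
```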
PIN hash must NOT go into config.yaml (risk of git commit). Use ~/.expflow/pin.hash.
Precedence: pin.hash file > .env EXPFLOW_PIN_HASH > config.yaml pin.hash.
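The precedence order can be expressed as a short lookup chain. In this sketch the function name `resolve_pin_hash` is hypothetical, the environment and config are passed in as plain mappings instead of being loaded from `.env` and `config.yaml`, and reading "config.yaml pin.hash" as a nested `pin` → `hash` key is an assumption.

```python
import tempfile
from collections.abc import Mapping
from pathlib import Path
from typing import Any, Optional


def resolve_pin_hash(
    pin_file: Path, env: Mapping[str, str], config: Mapping[str, Any]
) -> Optional[str]:
    """Return the first hash found: file, then EXPFLOW_PIN_HASH, then config."""
    if pin_file.is_file():
        stored = pin_file.read_text().strip()
        if stored:
            return stored
    from_env = env.get("EXPFLOW_PIN_HASH", "").strip()
    if from_env:
        return from_env
    section = config.get("pin")
    from_config = section.get("hash") if isinstance(section, Mapping) else None
    return from_config or None


# Demo with a throwaway directory standing in for ~/.expflow (assumption)
demo_dir = Path(tempfile.mkdtemp())
hash_file = demo_dir / "pin.hash"
env = {"EXPFLOW_PIN_HASH": "env-hash"}
config = {"pin": {"hash": "cfg-hash"}}

env_wins = resolve_pin_hash(hash_file, env, config)   # no file yet
hash_file.write_text("file-hash\n")
file_wins = resolve_pin_hash(hash_file, env, config)  # file now takes priority
```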
get_last_scalar_metrics() ClearML API
Returns a nested dict: {"Score": {"seg_total": {"last": 57.09, ...}}, ...}. Flatten it to {"seg_total": 57.09} for compare_scores.
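The flattening step might look like the following. The helper name `flatten_last_scalars` is hypothetical, and the sample dict is hand-written to match the shape quoted above; no ClearML call is made here.

```python
from typing import Any, Mapping


def flatten_last_scalars(nested: Mapping[str, Mapping[str, Mapping[str, Any]]]) -> dict[str, float]:
    """Collapse {group: {series: {"last": v, ...}}} into {series: v}.

    If the same series name appears under two groups, the later one wins.
    """
    flat: dict[str, float] = {}
    for series_map in nested.values():
        for series, stats in series_map.items():
            if "last" in stats:
                flat[series] = float(stats["last"])
    return flat


# Hand-written sample in the nested shape described in the text
sample = {
    "Score": {"seg_total": {"last": 57.09, "min": 12.0, "max": 57.09}},
    "PDE": {"pde_mean": {"last": 15.0, "min": 15.0, "max": 40.2}},
}
flat = flatten_last_scalars(sample)
```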
--force Flag for Script Calls
Always pass --force / -f on PIN-guarded commands in CI and automation.
getpass vs Non-Interactive
getpass.getpass() works in a terminal but fails in piped commands, CI, or subagent calls. Always provide --pin or --force as an alternative path.
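One way to structure a guard so that it never blocks on a prompt outside a terminal is sketched below. The signature differs from the `guard(action_description)` listed earlier: the `verify`, `pin`, `force`, and `interactive` parameters are assumptions introduced for illustration, and refusing (instead of prompting) when no terminal is attached is a design choice of this sketch.

```python
import getpass
import sys
from typing import Callable, Optional


def guard_sketch(
    action_description: str,
    verify: Callable[[str], bool],
    pin: Optional[str] = None,
    force: bool = False,
    interactive: Optional[bool] = None,
) -> bool:
    """Decide whether a destructive action may proceed."""
    if force:
        return True                      # --force path: skip the PIN entirely
    if pin is not None:
        return verify(pin)               # --pin path: no prompt needed
    if interactive is None:
        interactive = sys.stdin is not None and sys.stdin.isatty()
    if not interactive:
        return False                     # no terminal: refuse instead of hanging
    entered = getpass.getpass(f"PIN required to {action_description}: ")
    return verify(entered)


def check(candidate: str) -> bool:
    # Stand-in verifier for the demo; a real one would compare hashes
    return candidate == "1234"


forced = guard_sketch("cancel run exp-001", check, force=True)
with_pin = guard_sketch("cancel run exp-001", check, pin="1234")
piped = guard_sketch("cancel run exp-001", check, interactive=False)
```

The prompt is reached only when neither flag is given and a terminal is attached, so CI jobs and subagent calls get a deterministic answer.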