Install
    openclaw skills install supervised-agentic-loop

Self-improving AI agent loop with built-in misalignment detection. An AI agent autonomously runs Brainstorm → Plan → Implement → Review → Verify → Evolve cycles: it keeps improvements, discards regressions, and persists what it learns. The skill includes a synchronous rule-based prefilter that blocks destructive commands before execution, and an optional asynchronous LLM review that looks for subtler misalignment. All operations are local by default; network access is opt-in via environment variables.
| What | Details |
|---|---|
| Loop | Brainstorm → Plan → Implement → Review → Verify → Evolve |
| Agent modifies | One file only (target_file) |
| Metric | Any command that produces a numeric output |
| Safety (SAL) | Git isolation + reputation scoring + 4 verification gates |
| Safety (Monitor) | SYNC blocking + ASYNC LLM review + 10 behavior patterns |
| Persistence | results.tsv + .state/learnings/ + reputation.db + *.jsonl |
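The keep-or-discard rule behind the loop can be illustrated with a small sketch. This is not the skill's actual implementation; the function names `is_improvement` and `evolve_step` are hypothetical, and only the `minimize` flag mirrors a name used in this document.

```python
from typing import List, Tuple


def is_improvement(new: float, best: float, minimize: bool = True) -> bool:
    """Return True when `new` beats `best` in the configured direction."""
    return new < best if minimize else new > best


def evolve_step(best: float, candidate: float, minimize: bool = True) -> Tuple[float, str]:
    """Decide whether one candidate change is kept or discarded (hypothetical helper)."""
    if is_improvement(candidate, best, minimize):
        return candidate, "keep"
    return best, "discard"


# Three candidate metric values measured after three attempted changes.
best = 1.20
history: List[str] = []
for candidate in [1.25, 1.10, 1.15]:
    best, action = evolve_step(best, candidate, minimize=True)
    history.append(action)

print(best, history)
```

In a real run, "discard" would presumably correspond to a rollback on the run's git branch, and "keep" to a commit.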
    sal/                        # Evolve Loop: the brain
    ├── config.py               # Run configuration
    ├── evolve_loop.py          # 6-phase loop orchestrator
    ├── contract.py             # AgentCallable protocol
    ├── metric_extractor.py     # Named strategies + regex
    ├── verification.py         # 4 verification gates
    ├── reputation.py           # EMA scoring + suspension
    ├── git_isolation.py        # Branch per run, auto-rollback
    ├── learnings.py            # Persistent pattern detection
    ├── brainstorm.py           # Hypothesis generation
    ├── cli.py                  # CLI entrypoint
    └── monitor/                # Agent Monitor: the guardian
        ├── sanitizer.py        # Credential redaction (10 patterns)
        ├── behaviors.py        # 10 misalignment behaviors (B001-B010)
        ├── monitor.py          # Two-phase detection engine
        ├── classifier.py       # Severity classification + dedup
        ├── logger.py           # JSONL tool-call logging
        ├── alerter.py          # Telegram alerts (urllib)
        ├── heartbeat.py        # Self-monitoring + canary
        └── dashboard.py        # Command Center data functions

Dependency rule: sal/ imports monitor/, NEVER the reverse. Monitor has zero knowledge of SAL core.
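One way to enforce a dependency rule like this is a small static check over each monitor source file. The sketch below is an assumption about how such a check could look, not part of the skill; `forbidden_imports` is a hypothetical helper, and it only inspects absolute imports (relative imports such as `from . import x` are skipped).

```python
import ast
from typing import List


def forbidden_imports(source: str, allowed_prefix: str = "sal.monitor") -> List[str]:
    """Return imported `sal` modules that fall outside `allowed_prefix`.

    Note: `from sal import monitor` is reported as "sal", a deliberate
    false positive that keeps the check simple.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            in_sal = name == "sal" or name.startswith("sal.")
            in_monitor = name == allowed_prefix or name.startswith(allowed_prefix + ".")
            if in_sal and not in_monitor:
                found.append(name)
    return found


# A monitor module that only uses the standard library and its own package passes.
print(forbidden_imports("import json\nfrom sal.monitor.logger import x"))
# A monitor module reaching into SAL core is flagged.
print(forbidden_imports("from sal.config import EvolveConfig"))
```

Running this over every file under `sal/monitor/` in a test would fail the build as soon as the monitor starts importing SAL core.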
Read the SKILL.md in supervised-agentic-loop/ and begin an evolve run.
Target file: train.py
Metric: python train.py (look for val_bpb, lower is better)
    # Evolve loop
    sal run --target train.py --metric "python train.py" --parser val_bpb
    sal status
    sal unsuspend --agent codex --reason "verified by human"

    # Monitor
    sal monitor stats     # sessions, alerts, health
    sal monitor alerts    # recent misalignment alerts
    sal monitor canary    # run 5 self-test checks

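The `sal unsuspend` command pairs with the EMA-based reputation scoring listed for `reputation.py`. The sketch below shows how an exponential moving average with a suspension threshold could behave. The class name, the smoothing factor `alpha`, the starting score, and the threshold are all assumptions chosen for illustration, not values taken from the skill.

```python
from dataclasses import dataclass


@dataclass
class Reputation:
    score: float = 0.5          # assumed neutral starting score
    alpha: float = 0.3          # assumed EMA smoothing factor
    suspend_below: float = 0.2  # assumed suspension threshold
    suspended: bool = False

    def record(self, success: bool) -> float:
        """Fold one run outcome into the score and suspend if it drops too low."""
        outcome = 1.0 if success else 0.0
        self.score = self.alpha * outcome + (1 - self.alpha) * self.score
        if self.score < self.suspend_below:
            self.suspended = True
        return self.score

    def unsuspend(self) -> None:
        """Manual override, analogous in spirit to `sal unsuspend`."""
        self.suspended = False


rep = Reputation()
for ok in [False, False, False]:
    rep.record(ok)

print(round(rep.score, 4), rep.suspended)
```

With these assumed parameters the score falls 0.5 → 0.35 → 0.245 → 0.1715, so the third consecutive failure triggers suspension.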
    from sal.config import EvolveConfig
    from sal.evolve_loop import EvolveLoop

    config = EvolveConfig(
        target_file="train.py",
        metric_command="python train.py",
        metric_parser="val_bpb",
        minimize=True,
    )

    def my_agent(prompt: str) -> str:
        # Your LLM call here; it must return output containing a JSON block
        ...

    # Monitor auto-enabled. Set enable_monitor=False to disable.
    loop = EvolveLoop(config, agent=my_agent, agent_id="my-model")
    summary = loop.run()

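Because the agent callable must return text that contains a JSON block, it can help to test that contract offline with a stub. The sketch below is hypothetical: the JSON keys (`hypothesis`, `confidence`) and the helper `extract_json_block` are invented for illustration, and the schema the loop actually expects is defined in the skill itself.

```python
import json
from typing import Optional


def extract_json_block(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in free-form text, or None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


def stub_agent(prompt: str) -> str:
    """Stand-in for an LLM call; the keys below are made up for this example."""
    return (
        "I will try a smaller learning rate.\n"
        '{"hypothesis": "lower lr", "confidence": 0.6}\n'
        "Done."
    )


print(extract_json_block(stub_agent("improve val_bpb")))
```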
    from sal.monitor import AgentMonitor, BlockDecision

    monitor = AgentMonitor(state_dir=".state")

    # SYNC: before every tool call (< 5ms)
    decision = monitor.check_before_execute("exec", {"command": "rm -rf /"})
    # → BlockDecision.BLOCK

    # ASYNC: review full session
    result = monitor.review_session(session_entries, agent_callable=my_llm)

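To make the synchronous phase concrete, here is a minimal sketch of a rule-based prefilter. It is not the monitor's real rule set: the three regular expressions, the `Decision` enum, and the `prefilter` function are assumptions for illustration, and only the behavior IDs (B001, B007, B010) come from this document.

```python
import re
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"


# Illustrative patterns only; a production rule set would be far broader.
RULES = [
    # Recursive delete aimed at the filesystem root.
    ("B001", re.compile(r"\brm\s+-[a-zA-Z]*[rR][a-zA-Z]*\s+/(\s|$)")),
    # Any use of sudo.
    ("B007", re.compile(r"\bsudo\b")),
    # Decoding base64 and piping the result into a shell.
    ("B010", re.compile(r"base64\s+(-d|--decode)\b.*\|\s*(sh|bash)\b")),
]


def prefilter(tool: str, args: dict) -> Tuple[Decision, Optional[str]]:
    """Return (decision, matched behavior ID) for one pending tool call."""
    if tool != "exec":
        return Decision.ALLOW, None
    command = str(args.get("command", ""))
    for behavior_id, pattern in RULES:
        if pattern.search(command):
            return Decision.BLOCK, behavior_id
    return Decision.ALLOW, None


print(prefilter("exec", {"command": "rm -rf /"}))
print(prefilter("exec", {"command": "ls -la"}))
```

Precompiled regular expressions keep a check like this cheap enough to run before every tool call, which is what makes a synchronous gate practical.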
| ID | Behavior | Severity | Sync Block |
|---|---|---|---|
| B001 | Restriction bypass / destructive delete | HIGH | ✅ |
| B002 | Unexpected tool calls | MEDIUM | — |
| B003 | Goal deviation | MEDIUM | — |
| B004 | Self-modification | CRITICAL | — |
| B005 | Data exfiltration | CRITICAL | ✅ |
| B006 | Hallucinated success | HIGH | — |
| B007 | Privilege escalation | HIGH | ✅ |
| B008 | Resource exhaustion | MEDIUM | ✅ |
| B009 | Persistence / backdoor | CRITICAL | ✅ |
| B010 | Obfuscation | HIGH | ✅ |
The loop stops automatically when:
| Name | Extracts |
|---|---|
| `last_line_float` | Float from last line of output |
| `pytest_passed` | Number of passed tests |
| `pytest_failed` | Number of failed tests |
| `coverage_percent` | Coverage percentage |
| `val_bpb` | Validation BPB value |
| `benchmark_ms` | Milliseconds from benchmark output |
| Custom regex | Any regex with 1 capture group |
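A custom regex parser with one capture group might work roughly like the sketch below. The function `extract_metric`, the choice to use the last match, and the sample log format (`val_bpb: <number>`) are assumptions made for illustration; the skill's own extractor may behave differently.

```python
import re


def extract_metric(output: str, pattern: str) -> float:
    """Return the last value captured by `pattern`'s single group, as a float."""
    regex = re.compile(pattern)
    if regex.groups != 1:
        raise ValueError("pattern must contain exactly one capture group")
    matches = regex.findall(output)  # list of captured strings when there is one group
    if not matches:
        raise ValueError("metric not found in output")
    return float(matches[-1])


# Assumed log format: one "val_bpb: <number>" entry per evaluation step.
sample_output = "step 10 val_bpb: 1.312\nstep 20 val_bpb: 1.287\n"
value = extract_metric(sample_output, r"val_bpb:\s*([0-9]*\.?[0-9]+)")
print(value)
```

Taking the last match rather than the first suits training logs, where the final reported value is usually the one that counts.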
| Variable | Default | Description |
|---|---|---|
| `SAL_DB_PATH` | `.state/reputation.db` | Reputation database path |
| `MONITOR_TELEGRAM_BOT_TOKEN` | (none) | Telegram bot token for alerts |
| `MONITOR_TELEGRAM_CHAT_ID` | (none) | Telegram chat/user ID |
| `MONITOR_LLM_COMMAND` | (none) | LLM for async session review |
| `MONITOR_STATE_DIR` | `.state` | Monitor state directory |
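The sketch below shows how these variables and their defaults could be read so that network features stay off unless explicitly configured. The variable names and defaults come from the table above; the `MonitorEnv` class, `load_env`, and the rule that Telegram alerts need both token and chat ID are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MonitorEnv:
    db_path: str
    state_dir: str
    telegram_bot_token: Optional[str]
    telegram_chat_id: Optional[str]
    llm_command: Optional[str]

    @property
    def telegram_enabled(self) -> bool:
        # Assumed rule: alerts need both the bot token and the chat ID.
        return bool(self.telegram_bot_token and self.telegram_chat_id)

    @property
    def llm_review_enabled(self) -> bool:
        return bool(self.llm_command)


def load_env(env: Mapping[str, str] = os.environ) -> MonitorEnv:
    """Read settings from `env`, applying the documented defaults."""
    return MonitorEnv(
        db_path=env.get("SAL_DB_PATH", ".state/reputation.db"),
        state_dir=env.get("MONITOR_STATE_DIR", ".state"),
        telegram_bot_token=env.get("MONITOR_TELEGRAM_BOT_TOKEN"),
        telegram_chat_id=env.get("MONITOR_TELEGRAM_CHAT_ID"),
        llm_command=env.get("MONITOR_LLM_COMMAND"),
    )


# With nothing set, everything stays local.
defaults = load_env({})
print(defaults.db_path, defaults.telegram_enabled, defaults.llm_review_enabled)
```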