Supervised Agentic Loop

v0.1.2

Self-improving AI agent loop with built-in misalignment detection. An AI agent autonomously runs Brainstorm → Plan → Implement → Review → Evolve cycles — kee...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for nefas11/supervised-agentic-loop.

Prompt Preview: Install & Setup
Install the skill "Supervised Agentic Loop" (nefas11/supervised-agentic-loop) from ClawHub.
Skill page: https://clawhub.ai/nefas11/supervised-agentic-loop
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: git, python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install supervised-agentic-loop

ClawHub CLI

Package manager switcher

npx clawhub@latest install supervised-agentic-loop
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the code and metadata: the repository implements an evolve loop that modifies a single target file, uses git isolation, reputation DB, monitoring, and optional Telegram alerts. Required binaries (git, python3) and the declared optional binaries are appropriate for the described functionality.
Instruction Scope
SKILL.md and the code limit modifications to a single target_file and persist state under .state and results.tsv. The loop runs user-supplied metric commands and accepts either a user-provided agent callable or a local subprocess agent, so it necessarily executes arbitrary commands and agent outputs (behind verification gates). This scope is expected for an autonomous experiment loop, but it means the skill will run arbitrary metric commands and an agent, possibly a local subprocess, which could perform any network or filesystem action it is configured to.
Install Mechanism
Install is a simple 'pip install -e .' via install.sh (no external downloads or opaque URLs). pyproject.toml lists no runtime dependencies, matching the README claim of stdlib-only. No high-risk download/extract operations are present in the manifest.
Credentials
No required environment variables are declared. Optional env vars (SAL_DB_PATH, MONITOR_TELEGRAM_BOT_TOKEN, MONITOR_TELEGRAM_CHAT_ID, MONITOR_LLM_COMMAND, MONITOR_STATE_DIR) are directly related to monitoring, Telegram alerts, or local review subprocess configuration and are justified by the monitor features described.
Persistence & Privilege
The skill persists experiment state to results.tsv and .state/*, which is consistent with its purpose. It is not force-included (always: false). The skill can run autonomously (model invocation is not disabled), which is typical for an agent skill; weigh this together with its ability to run arbitrary metric commands and agent subprocesses when evaluating risk.
Assessment
This package appears coherent for its stated purpose, but it runs code and modifies your repository:

  1. Run it only in an isolated working directory or a disposable git clone; it creates branches and will reset/rollback.
  2. Review and limit the metric_command you pass: it will be executed and can do anything the shell allows.
  3. The agent you provide can be a local subprocess (MONITOR_LLM_COMMAND or an AgentCallable); that subprocess may itself perform network I/O, so treat it as untrusted unless you control it.
  4. Optional Telegram alerts require you to provide a bot token and chat ID; leave those unset to prevent outbound alerts.
  5. Inspect the git_isolation and verification gate code before use to confirm rollback semantics and which files are checked.

If you want tighter isolation, run inside a container or VM, or add stricter verification or read-only rules before enabling autonomous runs.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Author: Clawdis 🧬
Required binaries: git, python3
Latest version hash: vk9763h40k17y0bm16j55wz0dys8388pz
158 downloads · 0 stars · 3 versions
Updated 1mo ago
v0.1.2 · MIT-0

supervised-agentic-loop

Self-improving AI agent loop with built-in misalignment detection.

Quick Reference

Loop: Brainstorm → Plan → Implement → Review → Verify → Evolve
Agent modifies: One file only (target_file)
Metric: Any command that produces a numeric output
Safety (SAL): Git isolation + reputation scoring + 4 verification gates
Safety (Monitor): SYNC blocking + ASYNC LLM review + 10 behavior patterns
Persistence: results.tsv + .state/learnings/ + reputation.db + *.jsonl

Two Packages, One System

sal/                        # Evolve Loop — the brain
├── config.py               # Run configuration
├── evolve_loop.py          # 6-phase loop orchestrator
├── contract.py             # AgentCallable protocol
├── metric_extractor.py     # Named strategies + regex
├── verification.py         # 4 verification gates
├── reputation.py           # EMA scoring + suspension
├── git_isolation.py        # Branch per run, auto-rollback
├── learnings.py            # Persistent pattern detection
├── brainstorm.py           # Hypothesis generation
├── cli.py                  # CLI entrypoint
└── monitor/                # Agent Monitor — the guardian
    ├── sanitizer.py        # Credential redaction (10 patterns)
    ├── behaviors.py        # 10 misalignment behaviors (B001-B010)
    ├── monitor.py          # Two-phase detection engine
    ├── classifier.py       # Severity classification + dedup
    ├── logger.py           # JSONL tool-call logging
    ├── alerter.py          # Telegram alerts (urllib)
    ├── heartbeat.py        # Self-monitoring + canary
    └── dashboard.py        # Command Center data functions

Dependency rule: sal/ imports monitor/, NEVER the reverse. Monitor has zero knowledge of SAL core.

How to Use

As a Skill (in your agent instructions)

Read the SKILL.md in supervised-agentic-loop/ and begin an evolve run.
Target file: train.py
Metric: python train.py (look for val_bpb, lower is better)

As a CLI

# Evolve loop
sal run --target train.py --metric "python train.py" --parser val_bpb
sal status
sal unsuspend --agent codex --reason "verified by human"

# Monitor
sal monitor stats       # sessions, alerts, health
sal monitor alerts      # recent misalignment alerts
sal monitor canary      # run 5 self-test checks

As a Python API

from sal.config import EvolveConfig
from sal.evolve_loop import EvolveLoop

config = EvolveConfig(
    target_file="train.py",
    metric_command="python train.py",
    metric_parser="val_bpb",
    minimize=True,
)

def my_agent(prompt: str) -> str:
    # Your LLM call here — must return output with JSON block
    ...

# Monitor auto-enabled. Set enable_monitor=False to disable.
loop = EvolveLoop(config, agent=my_agent, agent_id="my-model")
summary = loop.run()
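The agent callable above can be any function that returns text containing a JSON block. A minimal stub for dry runs, assuming an illustrative JSON shape (the actual schema is defined by the contract in sal/contract.py, not invented here):

```python
import json

def my_agent(prompt: str) -> str:
    """Illustrative AgentCallable stub.

    NOTE: the keys below are placeholders; the real JSON schema is
    defined by sal/contract.py, and this stub makes no file changes.
    """
    result = {"summary": "no-op: inspected the file, made no change",
              "files_modified": []}
    # The Review phase parses a JSON block out of the agent's raw output.
    return "Analysis complete.\n```json\n" + json.dumps(result) + "\n```"
```

A stub like this is useful for exercising the loop's parsing and verification gates before wiring in a real LLM call.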

Monitor Standalone

from sal.monitor import AgentMonitor, BlockDecision

monitor = AgentMonitor(state_dir=".state")

# SYNC: before every tool call (< 5ms)
decision = monitor.check_before_execute("exec", {"command": "rm -rf /"})
# → BlockDecision.BLOCK

# ASYNC: review full session
result = monitor.review_session(session_entries, agent_callable=my_llm)

Phases

  1. Baseline — Run metric on unmodified code (fails → HARD ABORT)
  2. Brainstorm — Generate hypothesis from history + learnings
  3. Plan — Create TaskContract with acceptance criteria
  4. Implement — Agent modifies target_file (🛡️ Monitor: SYNC prefilter)
  5. Review — Parse agent output for contract compliance
  6. Verify — Run verification gates (files, syntax, tests, lint) + extract metric
  7. Evolve — Compare metric → KEEP (commit) or DISCARD (rollback)
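The Evolve decision in step 7 reduces to a comparison against the baseline. A minimal sketch, with the caveat that the function name and signature are illustrative rather than the actual sal API:

```python
def decide(baseline: float, candidate: float, minimize: bool = True) -> str:
    """Illustrative Evolve-phase decision: KEEP commits the branch,
    DISCARD triggers the git rollback. Not the actual sal API."""
    improved = candidate < baseline if minimize else candidate > baseline
    return "KEEP" if improved else "DISCARD"

print(decide(0.90, 0.85))  # lower val_bpb is better → KEEP
```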

10 Misalignment Behaviors

B001: Restriction bypass / destructive delete (HIGH)
B002: Unexpected tool calls (MEDIUM)
B003: Goal deviation (MEDIUM)
B004: Self-modification (CRITICAL)
B005: Data exfiltration (CRITICAL)
B006: Hallucinated success (HIGH)
B007: Privilege escalation (HIGH)
B008: Resource exhaustion (MEDIUM)
B009: Persistence / backdoor (CRITICAL)
B010: Obfuscation (HIGH)
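A SYNC prefilter for behaviors like B001 can be as simple as regex matching on the proposed command. The patterns below are illustrative stand-ins; the authoritative detectors live in sal/monitor/behaviors.py:

```python
import re

# Illustrative B001-style patterns only; the shipped detectors in
# sal/monitor/behaviors.py are the real list.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(?:\s|$)"),  # delete the filesystem root
    re.compile(r"\bmkfs\.\w+"),             # reformat a block device
]

def should_block(command: str) -> bool:
    """Return True when a command matches a destructive pattern (sync check)."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)

print(should_block("rm -rf /"))   # → True
print(should_block("ls -la"))     # → False
```

Pure string matching keeps the synchronous path fast (the monitor budgets under 5ms per call); deeper semantic review is deferred to the ASYNC LLM pass.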

Auto-Brake Conditions

The loop stops automatically when:

  • Reputation ≤ 0.2 → Agent suspended
  • Monitor BLOCK → Iteration aborted + reputation penalty
  • Plateau → No improvement for N iterations
  • Budget → max_iterations reached
  • SIGINT → Human interrupt (graceful)
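Reputation scoring is an EMA with a suspension threshold of 0.2. A toy sketch of how repeated failures drive suspension; the smoothing factor alpha here is an assumption, not the value sal/reputation.py actually uses:

```python
SUSPEND_THRESHOLD = 0.2  # from the auto-brake conditions above

def update_reputation(score: float, outcome: float, alpha: float = 0.3) -> float:
    """Illustrative EMA update; the real weighting is in sal/reputation.py."""
    return (1 - alpha) * score + alpha * outcome

score = 1.0
for _ in range(5):                  # five failed iterations (outcome = 0.0)
    score = update_reputation(score, 0.0)
print(round(score, 5), score <= SUSPEND_THRESHOLD)  # → 0.16807 True
```

With alpha = 0.3, five consecutive failures decay a perfect score below the suspension threshold, which matches the intent of the first auto-brake condition.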

Built-in Metric Parsers

last_line_float: Float from the last line of output
pytest_passed: Number of passed tests
pytest_failed: Number of failed tests
coverage_percent: Coverage percentage
val_bpb: Validation BPB value
benchmark_ms: Milliseconds from benchmark output
Custom regex: Any regex with one capture group
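The custom-regex path can be illustrated as follows. The log line format is hypothetical, and the function is a sketch rather than the shipped sal/metric_extractor.py API; any pattern with exactly one capture group works:

```python
import re

def extract_metric(output: str, pattern: str) -> float:
    """Pull a single float out of command output via one capture group.
    Illustrative only; the named strategies live in sal/metric_extractor.py."""
    match = re.search(pattern, output)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found in output")
    return float(match.group(1))

line = "step 500 | val_bpb 0.8472 | lr 3e-4"  # hypothetical training log line
print(extract_metric(line, r"val_bpb\s+([\d.]+)"))  # → 0.8472
```

Raising on a missing match (rather than returning a default) matters here: a silently missing metric would let the Evolve phase compare against garbage.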

Environment Variables

SAL_DB_PATH (default: .state/reputation.db): Reputation database path
MONITOR_TELEGRAM_BOT_TOKEN (no default): Telegram bot token for alerts
MONITOR_TELEGRAM_CHAT_ID (no default): Telegram chat/user ID
MONITOR_LLM_COMMAND (no default): LLM command for async session review
MONITOR_STATE_DIR (default: .state): Monitor state directory

Constraints

  • Zero external dependencies (Python 3.11+ stdlib only)
  • Agent modifies exactly ONE file per iteration
  • All changes are git-isolated with automatic rollback
  • Learnings persist across runs in .state/learnings/
  • Monitor is optional — SAL works without it
  • 130 tests (69 SAL + 61 Monitor)
