Supervised Agentic Loop

v0.1.2

Self-improving AI agent loop with built-in misalignment detection. An AI agent autonomously runs Brainstorm → Plan → Implement → Review → Evolve cycles — kee...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for nefas11/supervised-agentic-loop.

Prompt Preview: Install & Setup
Install the skill "Supervised Agentic Loop" (nefas11/supervised-agentic-loop) from ClawHub.
Skill page: https://clawhub.ai/nefas11/supervised-agentic-loop
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: git, python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install supervised-agentic-loop

ClawHub CLI

Package manager switcher

npx clawhub@latest install supervised-agentic-loop
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the code and metadata: the repository implements an evolve loop that modifies a single target file, uses git isolation, reputation DB, monitoring, and optional Telegram alerts. Required binaries (git, python3) and the declared optional binaries are appropriate for the described functionality.
Instruction Scope
SKILL.md and the code limit modifications to a single target_file and persist state under .state and results.tsv. The loop runs user-supplied metric commands and accepts either a user-provided agent callable or a local subprocess agent, so it necessarily executes arbitrary commands and agent outputs (behind verification gates). This scope is expected for an autonomous experiment loop, but it means the skill will run arbitrary metric commands and an agent, possibly a local subprocess, which could perform any network or filesystem action it is configured to.
Install Mechanism
Install is a simple 'pip install -e .' via install.sh (no external downloads or opaque URLs). pyproject.toml lists no runtime dependencies, matching the README claim of stdlib-only. No high-risk download/extract operations are present in the manifest.
Credentials
No required environment variables are declared. Optional env vars (SAL_DB_PATH, MONITOR_TELEGRAM_BOT_TOKEN, MONITOR_TELEGRAM_CHAT_ID, MONITOR_LLM_COMMAND, MONITOR_STATE_DIR) are directly related to monitoring, Telegram alerts, or local review subprocess configuration and are justified by the monitor features described.
Persistence & Privilege
The skill persists experiment state to results.tsv and .state/*, which is consistent with its purpose. It is not force-included (always: false). The skill can run autonomously (model invocation is not disabled), which is typical for an agent skill; weigh this together with its ability to run arbitrary metric commands and agent subprocesses when evaluating risk.
Assessment
This package appears coherent for its stated purpose, but it runs code and modifies your repository:

  1. Run it only in an isolated working directory or a disposable git clone; it creates branches and will reset/rollback.
  2. Review and limit the metric_command you pass: it will be executed and can do anything the shell allows.
  3. The agent you provide can be a local subprocess (MONITOR_LLM_COMMAND or an AgentCallable); that subprocess may itself perform network I/O, so treat it as untrusted unless you control it.
  4. Optional Telegram alerts require you to provide a bot token and chat ID; leave those unset to prevent outbound alerts.
  5. Inspect the git_isolation and verification gate code before use to confirm rollback semantics and which files are checked.

If you want tighter isolation, run inside a container or VM, or add stricter verification or read-only rules before enabling autonomous runs.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Author: Clawdis 🧬
Required binaries: git, python3
Latest version hash: vk9763h40k17y0bm16j55wz0dys8388pz
158 downloads · 0 stars · 3 versions
Updated 1mo ago
v0.1.2 · MIT-0

supervised-agentic-loop

Self-improving AI agent loop with built-in misalignment detection.

Quick Reference

Loop: Brainstorm → Plan → Implement → Review → Verify → Evolve
Agent modifies: One file only (target_file)
Metric: Any command that produces a numeric output
Safety (SAL): Git isolation + reputation scoring + 4 verification gates
Safety (Monitor): SYNC blocking + ASYNC LLM review + 10 behavior patterns
Persistence: results.tsv + .state/learnings/ + reputation.db + *.jsonl

Two Packages, One System

sal/                        # Evolve Loop — the brain
├── config.py               # Run configuration
├── evolve_loop.py          # 6-phase loop orchestrator
├── contract.py             # AgentCallable protocol
├── metric_extractor.py     # Named strategies + regex
├── verification.py         # 4 verification gates
├── reputation.py           # EMA scoring + suspension
├── git_isolation.py        # Branch per run, auto-rollback
├── learnings.py            # Persistent pattern detection
├── brainstorm.py           # Hypothesis generation
├── cli.py                  # CLI entrypoint
└── monitor/                # Agent Monitor — the guardian
    ├── sanitizer.py        # Credential redaction (10 patterns)
    ├── behaviors.py        # 10 misalignment behaviors (B001-B010)
    ├── monitor.py          # Two-phase detection engine
    ├── classifier.py       # Severity classification + dedup
    ├── logger.py           # JSONL tool-call logging
    ├── alerter.py          # Telegram alerts (urllib)
    ├── heartbeat.py        # Self-monitoring + canary
    └── dashboard.py        # Command Center data functions

Dependency rule: sal/ imports monitor/, NEVER the reverse. Monitor has zero knowledge of SAL core.

How to Use

As a Skill (in your agent instructions)

Read the SKILL.md in supervised-agentic-loop/ and begin an evolve run.
Target file: train.py
Metric: python train.py (look for val_bpb, lower is better)

As a CLI

# Evolve loop
sal run --target train.py --metric "python train.py" --parser val_bpb
sal status
sal unsuspend --agent codex --reason "verified by human"

# Monitor
sal monitor stats       # sessions, alerts, health
sal monitor alerts      # recent misalignment alerts
sal monitor canary      # run 5 self-test checks

As a Python API

from sal.config import EvolveConfig
from sal.evolve_loop import EvolveLoop

config = EvolveConfig(
    target_file="train.py",
    metric_command="python train.py",
    metric_parser="val_bpb",
    minimize=True,
)

def my_agent(prompt: str) -> str:
    # Your LLM call here — must return output with JSON block
    ...

# Monitor auto-enabled. Set enable_monitor=False to disable.
loop = EvolveLoop(config, agent=my_agent, agent_id="my-model")
summary = loop.run()
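The agent callable above can be any function that returns text containing a JSON block. A minimal stub for dry runs, assuming an illustrative JSON shape (the actual schema is defined by the contract in sal/contract.py, not invented here):

```python
import json

def my_agent(prompt: str) -> str:
    """Illustrative AgentCallable stub.

    NOTE: the keys below are placeholders; the real JSON schema is
    defined by sal/contract.py, and this stub makes no file changes.
    """
    result = {"summary": "no-op: inspected the file, made no change",
              "files_modified": []}
    # The Review phase parses a JSON block out of the agent's raw output.
    return "Analysis complete.\n```json\n" + json.dumps(result) + "\n```"
```

A stub like this is useful for exercising the loop's parsing and verification gates before wiring in a real LLM call.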

Monitor Standalone

from sal.monitor import AgentMonitor, BlockDecision

monitor = AgentMonitor(state_dir=".state")

# SYNC: before every tool call (< 5ms)
decision = monitor.check_before_execute("exec", {"command": "rm -rf /"})
# → BlockDecision.BLOCK

# ASYNC: review full session
result = monitor.review_session(session_entries, agent_callable=my_llm)

Phases

  1. Baseline — Run metric on unmodified code (fails → HARD ABORT)
  2. Brainstorm — Generate hypothesis from history + learnings
  3. Plan — Create TaskContract with acceptance criteria
  4. Implement — Agent modifies target_file (🛡️ Monitor: SYNC prefilter)
  5. Review — Parse agent output for contract compliance
  6. Verify — Run verification gates (files, syntax, tests, lint) + extract metric
  7. Evolve — Compare metric → KEEP (commit) or DISCARD (rollback)
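The Evolve decision in step 7 reduces to a comparison against the baseline. A minimal sketch, with the caveat that the function name and signature are illustrative rather than the actual sal API:

```python
def decide(baseline: float, candidate: float, minimize: bool = True) -> str:
    """Illustrative Evolve-phase decision: KEEP commits the branch,
    DISCARD triggers the git rollback. Not the actual sal API."""
    improved = candidate < baseline if minimize else candidate > baseline
    return "KEEP" if improved else "DISCARD"

print(decide(0.90, 0.85))  # lower val_bpb is better → KEEP
```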

10 Misalignment Behaviors

B001: Restriction bypass / destructive delete (HIGH)
B002: Unexpected tool calls (MEDIUM)
B003: Goal deviation (MEDIUM)
B004: Self-modification (CRITICAL)
B005: Data exfiltration (CRITICAL)
B006: Hallucinated success (HIGH)
B007: Privilege escalation (HIGH)
B008: Resource exhaustion (MEDIUM)
B009: Persistence / backdoor (CRITICAL)
B010: Obfuscation (HIGH)
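A SYNC prefilter for behaviors like B001 can be as simple as regex matching on the proposed command. The patterns below are illustrative stand-ins; the authoritative detectors live in sal/monitor/behaviors.py:

```python
import re

# Illustrative B001-style patterns only; the shipped detectors in
# sal/monitor/behaviors.py are the real list.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(?:\s|$)"),  # delete the filesystem root
    re.compile(r"\bmkfs\.\w+"),             # reformat a block device
]

def should_block(command: str) -> bool:
    """Return True when a command matches a destructive pattern (sync check)."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)

print(should_block("rm -rf /"))   # → True
print(should_block("ls -la"))     # → False
```

Pure string matching keeps the synchronous path fast (the monitor budgets under 5ms per call); deeper semantic review is deferred to the ASYNC LLM pass.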

Auto-Brake Conditions

The loop stops automatically when:

  • Reputation ≤ 0.2 → Agent suspended
  • Monitor BLOCK → Iteration aborted + reputation penalty
  • Plateau → No improvement for N iterations
  • Budget → max_iterations reached
  • SIGINT → Human interrupt (graceful)
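Reputation scoring is an EMA with a suspension threshold of 0.2. A toy sketch of how repeated failures drive suspension; the smoothing factor alpha here is an assumption, not the value sal/reputation.py actually uses:

```python
SUSPEND_THRESHOLD = 0.2  # from the auto-brake conditions above

def update_reputation(score: float, outcome: float, alpha: float = 0.3) -> float:
    """Illustrative EMA update; the real weighting is in sal/reputation.py."""
    return (1 - alpha) * score + alpha * outcome

score = 1.0
for _ in range(5):                  # five failed iterations (outcome = 0.0)
    score = update_reputation(score, 0.0)
print(round(score, 5), score <= SUSPEND_THRESHOLD)  # → 0.16807 True
```

With alpha = 0.3, five consecutive failures decay a perfect score below the suspension threshold, which matches the intent of the first auto-brake condition.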

Built-in Metric Parsers

last_line_float: Float from the last line of output
pytest_passed: Number of passed tests
pytest_failed: Number of failed tests
coverage_percent: Coverage percentage
val_bpb: Validation BPB value
benchmark_ms: Milliseconds from benchmark output
Custom regex: Any regex with one capture group
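The custom-regex path can be illustrated as follows. The log line format is hypothetical, and the function is a sketch rather than the shipped sal/metric_extractor.py API; any pattern with exactly one capture group works:

```python
import re

def extract_metric(output: str, pattern: str) -> float:
    """Pull a single float out of command output via one capture group.
    Illustrative only; the named strategies live in sal/metric_extractor.py."""
    match = re.search(pattern, output)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found in output")
    return float(match.group(1))

line = "step 500 | val_bpb 0.8472 | lr 3e-4"  # hypothetical training log line
print(extract_metric(line, r"val_bpb\s+([\d.]+)"))  # → 0.8472
```

Raising on a missing match (rather than returning a default) matters here: a silently missing metric would let the Evolve phase compare against garbage.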

Environment Variables

SAL_DB_PATH (default: .state/reputation.db): Reputation database path
MONITOR_TELEGRAM_BOT_TOKEN (no default): Telegram bot token for alerts
MONITOR_TELEGRAM_CHAT_ID (no default): Telegram chat/user ID
MONITOR_LLM_COMMAND (no default): LLM command for async session review
MONITOR_STATE_DIR (default: .state): Monitor state directory

Constraints

  • Zero external dependencies (Python 3.11+ stdlib only)
  • Agent modifies exactly ONE file per iteration
  • All changes are git-isolated with automatic rollback
  • Learnings persist across runs in .state/learnings/
  • Monitor is optional — SAL works without it
  • 130 tests (69 SAL + 61 Monitor)
