Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Ship Loop

v5.0.0

Run a chained build→ship→verify→notify pipeline for multi-segment feature work. Use when implementing multiple features in sequence, each as a coding agent t...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for fernando-fernandez3/ship-loop.

Prompt preview: Install & Setup
Install the skill "Ship Loop" (fernando-fernandez3/ship-loop) from ClawHub.
Skill page: https://clawhub.ai/fernando-fernandez3/ship-loop
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: git, python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install ship-loop

ClawHub CLI

Package manager switcher

npx clawhub@latest install ship-loop
Security Scan

VirusTotal: Suspicious (View report →)
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name and description (Ship Loop) match the provided code, CLI, and docs. Required binaries (git, python3), the SQLite state backend, provider plugins (vercel/netlify/custom), and agent CLI integration are all consistent with a tool that runs build→ship→verify cycles and spawns repair/meta worktrees.
Instruction Scope
The SKILL.md explicitly requires running the coding agent, preflight build/lint/test commands, deploy verification, git commits, worktrees, and arbitrary custom deploy scripts defined in SHIPLOOP.yml. That scope is expected for this purpose but inherently grants the skill the ability to execute arbitrary repo-defined commands with the user's privileges; the skill does include an explicit security notice and claims temp-file prompt passing and explicit staging to reduce injection and accidental mass staging.
Install Mechanism
No install spec is provided (instruction-only for the runtime), and the repository includes Python code and docs; installing dependencies is standard (pyyaml, pydantic). Nothing in the manifest attempts to download or execute remote archives during install.
Credentials
The skill declares no required environment variables (reasonable). The docs do reference agent-specific API keys (e.g., ANTHROPIC_API_KEY, OPENAI_API_KEY) and agent CLI flags; those are external to the skill and must be provided by the user for their chosen agent. This is proportionate, but users should be aware the agent command they configure may require secrets and permissions (and the example uses a '--permission-mode bypassPermissions' flag which increases risk if used).
Persistence & Privilege
The tool writes state and metrics to disk (SQLite tars.db, .shiploop/metrics.json, learnings.yml) and commits/merges branches into the repo as part of normal operation. That persistence and repo modification is expected for a pipeline tool, but it means the skill will change your working tree and commit history — review SHIPLOOP.yml and learnings.yml behavior before running.
Scan Findings in Context
[unicode-control-chars] unexpected: The SKILL.md contains unicode control characters pattern(s) detected by the pre-scan. This can be used for prompt-injection or to obfuscate text. It is not required by the skill's purpose; inspect the SKILL.md (and any config files) for hidden control characters or suspiciously obfuscated content before trusting automated runs.
Assessment
This skill appears to be what it claims: a self-healing build→ship→verify pipeline that runs your coding agent and manipulates your git repo. Before installing or running:

  1. Inspect SHIPLOOP.yml in any repository you point it at — it will execute agent_command and preflight/deploy scripts as your user.
  2. Don't run on untrusted repos. Treat SHIPLOOP.yml like a Makefile/CI config.
  3. Be mindful of the agent you configure — if you supply an agent CLI that needs API keys, provide them separately and understand their privileges; avoid using example flags like '--permission-mode bypassPermissions' unless you understand the implications.
  4. Note that Ship Loop writes state (SQLite DB, .shiploop/metrics.json, learnings.yml) and may commit learnings.yml back into the repo — consider whether you want that file tracked.
  5. The pre-scan found unicode control characters in SKILL.md; open the files in a hex-aware editor or use a tool to strip/inspect control chars to ensure there is no hidden or obfuscated content.

If you want higher assurance, run the code in a disposable environment or review the repository's Python code (shiploop/) and tests; they appear to implement the documented behavior.
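The control-character finding can be checked locally before trusting the file. A minimal standalone sketch (not part of the skill) that reports any Unicode control or format characters in a text, with their positions:

```python
import unicodedata

def find_control_chars(text):
    """Return (line, column, codepoint) for every control/format character
    other than ordinary newlines and tabs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Cf covers invisible format chars (e.g. bidi overrides) often
            # used to hide or reorder text; Cc covers raw control codes.
            if unicodedata.category(ch) in ("Cf", "Cc") and ch != "\t":
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits

sample = "normal line\nhidden\u202e payload\n"
print(find_control_chars(sample))  # reports the U+202E RIGHT-TO-LEFT OVERRIDE
```

Run it over SKILL.md (and any config files) to see exactly where the flagged characters sit before deciding whether they are benign.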

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🚢 Clawdis
Bins: git, python3
Latest: vk97e4vxsb1qanpm9cqve9cjgth83psaf
143 downloads · 0 stars · 4 versions
Updated 1 month ago
v5.0.0 · MIT-0

Ship Loop v5.0 — TARS Convergence

Orchestrate multi-segment feature work as a self-healing pipeline. Three nested loops ensure maximum autonomy: Loop 1 runs the standard code→preflight→ship→verify chain, Loop 2 auto-repairs failures via the coding agent, Loop 3 spawns experiment branches when repairs stall. A SQLite state backend provides crash recovery and cross-run analytics. A verdict router replaces hardcoded branching with a configurable decision table. A reflection loop audits historical effectiveness and auto-generates learnings.

Architecture: Three Loops + Event Queue + Verdict Router

┌───────────────────────────────────────────────────────────┐
│                  SHIP LOOP v5.0                           │
│                                                           │
│  LOOP 1: Ship Loop                                        │
│  code → preflight → ship → verify → emit(segment_shipped)│
│          │                                                │
│       on fail (verdict → action via VerdictRouter)        │
│          ▼                                                │
│  LOOP 2: Repair Loop                                      │
│  capture context → agent fix → re-preflight (max N)      │
│  ↳ emit events: repair_done | repair_failed               │
│  ↳ convergence detected → CONVERGED verdict → META        │
│  ↳ unknown error → record_decision_gap()                  │
│          │                                                │
│       exhausted                                           │
│          ▼                                                │
│  LOOP 3: Meta Loop                                        │
│  meta-analysis → N experiment branches → winner → merge   │
│  ↳ emit: meta_done                                        │
│                                                           │
│  🗄  SQLite (tars.db): runs, segments, events, learnings  │
│  📋  Event Queue: crash recovery via unprocessed events   │
│  🔀  Verdict Router: configurable verdict→action table    │
│  📚  Learnings Engine: scored lessons (score tracks use)  │
│  🪞  Reflect Loop: post-run analysis + recommendations    │
│  💰  Budget Tracker: token/cost tracking per run          │
└───────────────────────────────────────────────────────────┘

Security Notice

SHIPLOOP.yml is equivalent to running a script. The agent_command, all preflight commands (build, lint, test), and custom deploy scripts execute with your full user privileges. Ship Loop does not sandbox these commands. Never use on untrusted repos without reviewing the config. Treat SHIPLOOP.yml with the same caution as a Makefile or CI pipeline.

When to Use

  • Building multiple features for a project in sequence
  • Any work that follows: code → preflight → commit → deploy → verify → next
  • When you need checkpointing so progress survives session restarts
  • When you want self-healing: failures auto-repair before asking humans
  • When you want cost visibility and learning from past runs

Prerequisites

  • Python 3.10+ with pyyaml and pydantic installed
  • A git repository with a remote
  • A deployment pipeline triggered by push (Vercel, Netlify, etc.)
  • A coding agent CLI configured via agent_command in SHIPLOOP.yml

Installation

pip install pyyaml pydantic

CLI Usage

# Core pipeline
shiploop run              # Start or resume the pipeline
shiploop run --dry-run    # Preview what would happen
shiploop status           # Show segment states (reads from DB)
shiploop reset <segment>  # Reset a segment to pending

# Learnings
shiploop learnings list
shiploop learnings search "dark mode theme toggle"

# Budget
shiploop budget           # Show cost summary

# v5.0 NEW
shiploop reflect          # Run meta-reflection on recent run history
shiploop reflect --depth 20  # Analyze last 20 runs
shiploop events           # View event history for latest run
shiploop events <run_id>  # View event history for specific run
shiploop history          # View past run history from DB

# Options
shiploop -c /path/to/SHIPLOOP.yml run
shiploop -v run           # Verbose logging
shiploop --version        # Show version (5.0.0)

Pipeline Definition (SHIPLOOP.yml)

project: "Project Name"
repo: /absolute/path/to/project
site: https://production-url.com
branch: pr               # direct-to-main | per-segment | pr
mode: solo

agent_command: "claude --print --permission-mode bypassPermissions"

preflight:
  build: "npm run build"
  lint: "npm run lint"
  test: "npm run test"

deploy:
  provider: vercel        # vercel | netlify | custom
  routes: [/, /api/health]
  marker: "data-version"
  health_endpoint: /api/health
  deploy_header: x-vercel-deployment-url
  timeout: 300

repair:
  max_attempts: 3

meta:
  enabled: true
  experiments: 3

budget:
  max_usd_per_segment: 10.0
  max_usd_per_run: 50.0
  max_tokens_per_segment: 500000
  halt_on_breach: true

# v5.0 NEW: Reflection config
reflection:
  enabled: true       # run reflect loop after pipeline
  auto_run: true      # automatically run, not just on CLI command
  history_depth: 10   # how many past runs to analyze

# v5.0 NEW: Custom verdict routing
router:
  agent_fail: retry      # override default (fail) with retry
  deploy_fail: fail      # override default (retry) with fail

segments:
  - name: "feature-name"
    status: pending
    prompt: |
      Your coding agent prompt here.
    depends_on: []
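The deploy: settings above imply a verification loop: poll each configured route until it serves a page carrying the expected marker for the new commit, or give up at the timeout. A sketch with a pluggable fetch function (the real provider code in deploy.py and providers/ may differ in detail):

```python
import time

def verify_deploy(fetch, routes, marker, commit, timeout=300, interval=0.5):
    """Poll every route until its page carries marker="<commit>" or the
    timeout expires.  `fetch(route)` must return (status_code, body_text).
    Sketch only -- shiploop's provider plugins may differ."""
    deadline = time.monotonic() + timeout
    pending = list(routes)
    while pending and time.monotonic() < deadline:
        route = pending[0]
        status, body = fetch(route)
        if status == 200 and f'{marker}="{commit}"' in body:
            pending.pop(0)      # this route now serves the new deploy
        else:
            time.sleep(interval)
    return not pending          # True once every route shows the new commit

# Stub fetch simulating a site that already serves commit abc123:
ok = verify_deploy(lambda r: (200, '<html data-version="abc123">'),
                   ["/", "/api/health"], "data-version", "abc123", timeout=5)
print(ok)  # True
```

Injecting `fetch` keeps the polling logic testable without a live deployment; a real provider would issue HTTP requests and also check the provider-specific header (x-vercel-deployment-url, x-nf-request-id).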

SQLite State Backend (v5.0)

State is now stored in .shiploop/tars.db (SQLite, WAL mode). SHIPLOOP.yml is config-only.

Tables

Table           Purpose
runs            Pipeline execution records (id, project, started_at, status, cost)
segments        Segment execution records per run (status, commit, touched_paths)
run_events      Event queue for crash recovery and audit trail
learnings       Failure/success lessons with effectiveness scores
usage           Token and cost records per agent invocation
decision_gaps   Situations the system didn't know how to handle

Event Types

Event                  When emitted
agent_started          Agent invocation begins
preflight_passed       All preflight steps pass
preflight_failed       Any preflight step fails
repair_done            Repair loop succeeded
repair_failed          Repair loop failed or exhausted
meta_done              Meta loop winner merged
segment_shipped        Segment fully complete
segment_failed         Segment permanently failed
deploy_failed          Deploy or verification failed
file_overlap_warning   Segment may touch files changed by a prior segment

Crash recovery: On startup, unprocessed events are replayed to restore pipeline state.
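The crash-recovery mechanism can be sketched with a minimal run_events table (schema simplified; the real tars.db has more columns):

```python
import sqlite3

# In-memory stand-in for .shiploop/tars.db.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE run_events (
    id INTEGER PRIMARY KEY,
    run_id TEXT, kind TEXT, payload TEXT,
    processed INTEGER DEFAULT 0)""")

def emit(run_id, kind, payload=""):
    db.execute("INSERT INTO run_events (run_id, kind, payload) VALUES (?,?,?)",
               (run_id, kind, payload))
    db.commit()

def replay_unprocessed(run_id, handler):
    """On startup, re-apply events that were written but never acted on,
    then mark them processed -- this is what makes a crash recoverable."""
    rows = db.execute(
        "SELECT id, kind, payload FROM run_events "
        "WHERE run_id=? AND processed=0 ORDER BY id", (run_id,)).fetchall()
    for event_id, kind, payload in rows:
        handler(kind, payload)
        db.execute("UPDATE run_events SET processed=1 WHERE id=?", (event_id,))
    db.commit()
    return len(rows)

emit("run1", "preflight_passed")
emit("run1", "segment_shipped", "auth")
seen = []
print(replay_unprocessed("run1", lambda kind, payload: seen.append(kind)))  # 2
```

Because events are written before being acted on, a crash between the write and the action leaves an unprocessed row that the next startup replays.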

Verdict Router (v5.0)

The orchestrator no longer uses if/else chains. Every outcome maps to a Verdict, and a VerdictRouter maps verdicts to Action values.

Default Routing Table

Verdict            Default Action
success            ship
preflight_fail     repair
agent_fail         fail
deploy_fail        retry
repair_success     ship
repair_exhausted   meta
meta_success       ship
meta_exhausted     fail
budget_exceeded    fail
converged          meta (skip remaining repairs, jump to meta)
no_changes         fail
unknown            pause_and_alert

Override via router: section in SHIPLOOP.yml (see above).
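A minimal sketch of the router idea (table abbreviated; shiploop's router.py presumably uses enums rather than bare strings):

```python
# Subset of the default verdict -> action table from the docs.
DEFAULT_ROUTES = {
    "success": "ship",
    "preflight_fail": "repair",
    "agent_fail": "fail",
    "deploy_fail": "retry",
    "repair_success": "ship",
    "repair_exhausted": "meta",
    "converged": "meta",
    "unknown": "pause_and_alert",
}

class VerdictRouter:
    def __init__(self, overrides=None):
        # Overrides from the router: section of SHIPLOOP.yml win.
        self.table = {**DEFAULT_ROUTES, **(overrides or {})}

    def route(self, verdict):
        # Anything not in the table falls through to the 'unknown' action.
        return self.table.get(verdict, self.table["unknown"])

router = VerdictRouter({"agent_fail": "retry", "deploy_fail": "fail"})
print(router.route("agent_fail"))   # retry (overridden)
print(router.route("success"))      # ship
```

Routing through a data table instead of if/else chains is what makes the overrides in SHIPLOOP.yml possible without code changes.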

Meta-Reflection Loop (v5.0)

Runs automatically after pipeline completion (when reflection.auto_run: true) or manually via shiploop reflect.

What It Analyzes

  1. Repeat failures — same error_signature across multiple segments/runs
  2. Repair-heavy segments — segments that needed >1 repair loop (same error type)
  3. Efficiency trends — cost/time per segment trending up or down
  4. Stale learnings — learnings with score < 0.3 that haven't helped
  5. Decision gaps — situations that triggered MISSING_DECISION_BRANCH

Auto-creates learnings from patterns

If an error signature appears 3+ times across runs, the reflect loop auto-generates an AUTO-<sig> learning and flags it for human review.
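That rule can be sketched as follows (field names are guesses, not the real learnings schema):

```python
from collections import Counter

def auto_learnings(failure_events, threshold=3):
    """Generate AUTO-<sig> learnings for error signatures seen at least
    `threshold` times across runs.  Sketch of the reflect-loop rule."""
    counts = Counter(sig for sig, _segment in failure_events)
    return [
        {"id": f"AUTO-{sig[:8]}",
         "lesson": f"error signature {sig} recurred {count}x; needs human review",
         "score": 1.0}
        for sig, count in counts.items() if count >= threshold
    ]

events = [("abc123def456", "auth"), ("abc123def456", "api"),
          ("abc123def456", "db"), ("ffff0000", "auth")]
print(auto_learnings(events))  # one AUTO-abc123de learning; ffff0000 is below threshold
```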

shiploop reflect --depth 20

═════════════════════════════════════════════════════
🪞  Ship Loop Reflection Report
   Generated: 2026-03-27T06:30:00Z
   Runs analyzed: 10
═════════════════════════════════════════════════════

📊 Efficiency
   Total cost:     $12.4200
   Segments run:   8
   Avg/segment:    $1.5525

🔁 Repeat Failures (2)
   abc123def456… × 3
   ...

💡 Recommendations
   ⚠️  Error signature abc123de… repeated 3× across segments: auth, api, db.
   📉 2 stale learning(s) (score < 0.3): L002, L004.
   ✅ No issues detected in recent history. Pipeline looks healthy!

═════════════════════════════════════════════════════

Playbook Evolution (v5.0)

When a repair fails with an error that doesn't match any existing learning, the system records a decision_gap:

learnings.record_decision_gap(
    segment="auth",
    context="Repair exhausted with unmatched error: ...",
    verdict="repair_exhausted_unknown_error",
    run_id="...",
)

Decision gaps surface in shiploop reflect output and the decision_gaps DB table. Operators use them to add new learnings or router overrides.

Convergence Detection (v5.0 Enhanced)

Same-segment: if two consecutive repair attempts produce the same error hash → CONVERGED verdict → router jumps to META (skipping remaining repair attempts).

Cross-segment: before starting a segment, the orchestrator checks if any already-shipped segment touched the same files (via touched_paths in DB). If overlap detected, a file_overlap_warning event is emitted.
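Both rules can be sketched as follows (helper names hypothetical; the real hashing and DB lookup in shiploop may differ):

```python
import hashlib

def error_signature(stderr):
    """Stable short hash of an error message (whitespace-normalised so
    trivially different renderings of the same error collide)."""
    return hashlib.sha256(" ".join(stderr.split()).encode()).hexdigest()[:12]

def detect_convergence(attempt_signatures):
    """Same-segment rule: two consecutive repair attempts producing the
    same error hash mean the repair loop is spinning -> CONVERGED."""
    return any(a == b for a, b in zip(attempt_signatures, attempt_signatures[1:]))

def file_overlap(shipped_touched_paths, next_segment_paths):
    """Cross-segment rule: warn if the next segment plans to touch files a
    shipped segment already changed (touched_paths comes from the DB)."""
    return sorted(set(shipped_touched_paths) & set(next_segment_paths))

sigs = [error_signature("TypeError: x"), error_signature("TypeError:  x")]
print(detect_convergence(sigs))                                      # True
print(file_overlap(["src/auth.py"], ["src/auth.py", "src/db.py"]))   # ['src/auth.py']
```

Hashing a normalised error string means cosmetic differences (extra whitespace) do not defeat convergence detection, while genuinely different errors keep the repair loop going.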

Learnings Scoring (v5.0)

score (default 1.0)
  +0.1 when injected and segment succeeds first-try
  -0.2 when injected and segment fails the same way

Search results are sorted by combined keyword-relevance × score. Learnings with score < 0.3 are flagged as stale in reflection.

shiploop learnings list  # shows all learnings with scores
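The scoring and search ordering might look like this (the word-overlap relevance measure here is a stand-in; the skill's actual matching logic is not specified):

```python
def rank_learnings(learnings, query):
    """Sort by keyword-relevance x score, as the docs describe.  Relevance
    here is the count of query words found in the lesson text -- a
    stand-in for whatever matching shiploop actually implements."""
    words = set(query.lower().split())

    def key(item):
        relevance = len(words & set(item["lesson"].lower().split()))
        return relevance * item["score"]

    return sorted(learnings, key=key, reverse=True)

learnings = [
    {"id": "L001", "lesson": "dark mode toggle needs theme context", "score": 1.2},
    {"id": "L002", "lesson": "dark mode css variable missing", "score": 0.2},
]
top = rank_learnings(learnings, "dark mode theme toggle")
print(top[0]["id"])  # L001: more keyword overlap and a higher score

# Learnings under 0.3 are what reflection flags as stale:
stale = [item["id"] for item in learnings if item["score"] < 0.3]
print(stale)  # ['L002']
```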

State Machine

States per segment:
  pending → coding → preflight → shipping → verifying → shipped
                  ↘ repairing (Loop 2) → preflight
                  ↘ experimenting (Loop 3) → preflight → shipping
                  ↘ failed

SHIPLOOP.yml is checkpointed after every transition (for backward compatibility), but SQLite is the primary state store.

Deploy Providers

Provider   How it works
vercel     Polls routes for HTTP 200, checks the x-vercel-deployment-url header
netlify    Polls routes for HTTP 200, checks the x-nf-request-id header
custom     Runs deploy.script with SHIPLOOP_COMMIT and SHIPLOOP_SITE env vars
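The custom provider's env-var contract can be sketched as follows (function name hypothetical; the real providers/custom.py may differ):

```python
import os
import subprocess

def run_custom_deploy(script, commit, site):
    """Sketch of the custom provider: run deploy.script with the
    SHIPLOOP_COMMIT and SHIPLOOP_SITE environment variables injected."""
    env = {**os.environ,
           "SHIPLOOP_COMMIT": commit,
           "SHIPLOOP_SITE": site}
    # A non-zero exit status would map to a deploy_fail verdict.
    result = subprocess.run(script, shell=True, env=env)
    return result.returncode == 0

# Trivial script that just echoes the injected variables:
ok = run_custom_deploy('echo "deploying $SHIPLOOP_COMMIT to $SHIPLOOP_SITE"',
                       "abc123", "https://example.com")
print(ok)  # True
```

Note this is exactly the "SHIPLOOP.yml is equivalent to running a script" risk from the Security Notice: whatever deploy.script contains runs with your full privileges.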

Budget Tracking

Token usage and estimated costs are tracked per agent invocation in SQLite (falling back to metrics.json).

shiploop budget

💰 Budget Summary: Portfolio
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Total cost:       $3.84
  Budget remaining: $46.16
  Total records:    12

  By segment:
    dark-mode: $0.42
    contact-form: $3.42
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Critical Rules

  1. Never break the chain — after a segment ships, immediately start the next
  2. Preflight is mandatory — no exceptions, no "ship now fix later"
  3. Explicit staging only — never git add -A, only changed files from git diff
  4. Prompts via file — never shell arguments (prevents injection)
  5. SQLite is source of truth — SHIPLOOP.yml config-only; runtime state in tars.db
  6. Agent command from config — always read from agent_command, never hardcode
  7. Budget-aware — track costs, enforce limits, fail gracefully

Project Structure

skills/ship-loop/
├── SKILL.md                  # This file
├── pyproject.toml
├── shiploop/
│   ├── __init__.py           # __version__ = "5.0.0"
│   ├── cli.py                # CLI (run, status, reset, reflect, events, history, ...)
│   ├── config.py             # SHIPLOOP.yml parsing + validation (Pydantic v2)
│   ├── orchestrator.py       # Main state machine + event queue + verdict routing
│   ├── db.py                 # NEW: SQLite state backend (tars.db)
│   ├── router.py             # NEW: Verdict→Action router
│   ├── learnings.py          # Learnings engine (SQLite + scoring + decision gaps)
│   ├── budget.py             # Cost/token tracking (SQLite backend)
│   ├── git_ops.py            # git operations + get_touched_paths()
│   ├── agent.py              # Agent runner
│   ├── deploy.py             # Deploy verification
│   ├── preflight.py          # Build + lint + test runner
│   ├── reporting.py          # Status messages + reports
│   ├── ship_utils.py         # Ship and verify helper
│   └── loops/
│       ├── ship.py           # Loop 1: code → preflight → ship
│       ├── repair.py         # Loop 2: repair + decision gap detection
│       ├── meta.py           # Loop 3: meta-analysis + experiments
│       ├── reflect.py        # NEW: post-run reflection + recommendations
│       └── optimize.py       # Optimization loop
├── providers/
│   ├── vercel.py
│   ├── netlify.py
│   └── custom.py
└── tests/
    ├── test_config.py
    ├── test_orchestrator.py
    ├── test_git_ops.py
    ├── test_budget.py
    ├── test_learnings.py
    └── ...

Changelog

v5.0.0 (2026-03-27) — TARS Convergence

  • SQLite state backend: tars.db replaces metrics.json + learnings.yml for runtime state
  • Event queue: all phase transitions emit events; unprocessed events enable crash recovery
  • Verdict router: configurable Verdict → Action table replaces if/else chains in orchestrator
  • Meta-reflection loop: shiploop reflect analyzes run history, finds patterns, auto-generates learnings
  • Playbook evolution: MISSING_DECISION_BRANCH detection → decision_gaps table
  • Cross-segment convergence: touched_paths tracked per segment for overlap warnings
  • Learnings scoring: score field (+0.1 on success, -0.2 on failure), sorted by score
  • New CLI commands: reflect, events, history
  • New config sections: reflection, router

v4.0.0

  • Python CLI replaces bash scripts
  • Pydantic v2 config validation
  • Budget tracking with per-segment and per-run limits
  • Error convergence detection (hash-based)
  • Deploy provider plugins (Vercel, Netlify, Custom)
