Production Harness

v1.0.0

Production-grade Agent Harness combining execution discipline (Superpower), knowledge compounding (CE), and product thinking (Gstack) into a single adaptive...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for christianye/production-harness.

Prompt preview (Install & Setup):
Install the skill "Production Harness" (christianye/production-harness) from ClawHub.
Skill page: https://clawhub.ai/christianye/production-harness
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI (bare skill slug):

openclaw skills install production-harness

ClawHub CLI (via npx):

npx clawhub@latest install production-harness
Security Scan
Capability signals
  • Crypto: can make purchases
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
  • VirusTotal: Benign (view report →)
  • OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description describe a multi-step engineering harness; the instructions only require typical engineering actions (spec, plan, build, test, checkpoint). There are no environment variables, binaries, or external services requested that would be inconsistent with that purpose.
Instruction Scope
SKILL.md directs the agent to read code, write checkpoints to files, run verifications (tests, ls, wc, grep examples), and follow acceptance criteria. Those actions are within scope for a development harness. The instructions do not direct the agent to read unrelated system configs, secrets, or to exfiltrate data externally.
Install Mechanism
No install spec or code files are included; this is instruction-only. That minimizes on-disk risk and matches the skill's stated nature.
Credentials
The skill declares no required env vars, credentials, or config paths. The runtime instructions reference only standard developer operations (files, tests) and do not request tokens or other secrets.
Persistence & Privilege
The always flag is false, and autonomous invocation is allowed (the platform default). The skill does advocate writing checkpoints to the filesystem and committing changes, which is expected for this harness and does not modify other skills or system-wide settings.
Assessment
This is a coherent, instruction-only engineering harness. Before installing: (1) understand that the harness expects the agent to read and write repository files and run tests — grant only the minimal filesystem/repo access needed (use a sandbox or a branch), (2) do not provide CI, cloud, or repo credentials to the agent unless you trust and audit its outputs, and (3) review the harness prompts and any automatic commit steps so changes are visible and reversible (use feature branches and a rollback plan). If you need the agent to operate on sensitive repos, require human confirmation for commits or restrict its write permissions.


Tags: agent, engineering, harness, latest, tdd, verification, workflow
83 downloads · 0 stars · 1 version · Updated 2 weeks ago · v1.0.0 · MIT-0

Agent Harness

A unified engineering harness that combines execution discipline, knowledge compounding, and product thinking. Born from 450,000 characters of real-world AI textbook writing plus 9 production incidents.

Core Philosophy

Agent = Model + Harness. The model provides capability; the harness provides discipline.

Three layers, one workflow:

  1. Challenge — Is this the right thing to build? (from Gstack)
  2. Execute — Build it with engineering rigor (from Superpower)
  3. Compound — Learn from what happened (from CE)

Task Complexity Auto-Grading

Before starting any task, assess complexity. This determines which workflow steps to run.

🟢 Simple (bug fix, config change, small tweak)

  • Skip spec/plan → Direct edit → Verify → Done
  • Example: "fix the typo in line 42", "update the API endpoint"

🟡 Medium (new feature, module, integration)

  • Plan → Build incrementally → Test → Review → Done
  • Example: "add user authentication", "integrate payment API"

🔴 Complex (architecture change, multi-module, new system)

  • Full pipeline: Challenge → Spec → Plan → Build → Test → Review → Ship
  • Example: "redesign the database schema", "build a multi-agent orchestrator"

When unsure, start at 🟡. Upgrade to 🔴 if you discover hidden complexity. Never downgrade mid-task.
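
A minimal sketch of how the tiers and the upgrade-only rule might be encoded if you drive the harness programmatically; the enum and helper are illustrative, not part of the skill:

from enum import Enum

class Complexity(Enum):
    SIMPLE = 1   # 🟢 direct edit, verify, done
    MEDIUM = 2   # 🟡 plan, build, test, review
    COMPLEX = 3  # 🔴 full pipeline, challenge through ship

def reassess(current: Complexity, found_hidden_complexity: bool) -> Complexity:
    # Upgrades are allowed mid-task; downgrades never are.
    if found_hidden_complexity and current is not Complexity.COMPLEX:
        return Complexity(current.value + 1)
    return current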

Layer 1: Challenge (🔴 Complex tasks only)

Before writing any code, answer these questions. If any answer is "no" or uncertain, pause and discuss with the user.

  1. Problem validity — Is the user solving a real problem or building a solution looking for a problem?
  2. Simplest approach — Is there a simpler way that doesn't require building this?
  3. Scope clarity — Can you explain what "done" looks like in one sentence?
  4. Risk assessment — What's the worst thing that happens if this goes wrong?

Output: A one-paragraph problem statement that the user confirms before proceeding.

Layer 2: Execute

Spec (🟡🔴 only)

Define what you're building before you build it:

  • Goal: One sentence describing the outcome
  • Interface: Inputs, outputs, API contracts
  • Constraints: What you will NOT do (just as important as what you will do)
  • Acceptance criteria: How to verify it works (must be testable)
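
A minimal sketch of a spec captured as data, assuming you want to track specs in code; the fields mirror the four bullets above, and the example values are hypothetical:

from dataclasses import dataclass, field

@dataclass
class Spec:
    goal: str                   # one sentence describing the outcome
    interface: str              # inputs, outputs, API contracts
    constraints: list[str] = field(default_factory=list)  # what you will NOT do
    acceptance: list[str] = field(default_factory=list)   # each item must be testable

spec = Spec(
    goal="Users can reset a forgotten password via an emailed link",
    interface="POST /password-reset {email} -> 202; GET /reset/{token} -> reset form",
    constraints=["No changes to the existing login flow", "No new dependencies"],
    acceptance=["Token expires after 15 minutes", "A used token returns 410"],
)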

Plan (🟡🔴 only)

Break the spec into atomic tasks:

  • Each task modifies ≤3 files
  • Each task has a clear verification step
  • Tasks are ordered by dependency (independent tasks can parallelize)
  • Estimate: simple tasks ~5min, medium ~15min, complex ~30min
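
One way to make the ordering rule mechanical is a topological sort over task dependencies. A sketch using only the standard library, with hypothetical task names:

from dataclasses import dataclass, field
from graphlib import TopologicalSorter

@dataclass
class Task:
    name: str
    files: list[str]                       # keep this at ≤3 files per task
    verify: str                            # command that proves the task works
    depends_on: list[str] = field(default_factory=list)

tasks = [
    Task("add-model", ["models/reset.py"], "pytest tests/test_model.py -q"),
    Task("add-route", ["api/routes.py"], "pytest tests/test_route.py -q", ["add-model"]),
    Task("add-email", ["email/reset.py"], "pytest tests/test_email.py -q"),  # independent
]

order = TopologicalSorter({t.name: set(t.depends_on) for t in tasks})
print(list(order.static_order()))  # dependency-respecting order; independent tasks can parallelize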

Build

Execute tasks incrementally. After each task:

  1. Verify the task works (run it, test it, check the output)
  2. Commit or checkpoint the progress
  3. Only then move to the next task

Critical rules:

  • Never modify code you haven't read first
  • Don't add features beyond what was asked
  • Don't refactor "while you're at it"
  • If tests fail, report honestly — don't claim success
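
A sketch of this loop, reusing the Task shape from the Plan sketch above; apply_edits is a hypothetical stand-in for making the task's file changes:

import json
import subprocess
from pathlib import Path

CHECKPOINT = Path("checkpoints.jsonl")

def build(tasks):
    done = set()
    if CHECKPOINT.exists():  # resume: skip tasks already verified and checkpointed
        done = {json.loads(line)["task"] for line in CHECKPOINT.read_text().splitlines()}
    for task in tasks:
        if task.name in done:
            continue
        apply_edits(task)  # hypothetical: perform the task's <=3 file edits
        result = subprocess.run(task.verify, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            # Report the failure honestly; never claim success.
            raise RuntimeError(f"{task.name} failed:\n{result.stdout}{result.stderr}")
        with CHECKPOINT.open("a") as f:  # checkpoint before moving to the next task
            f.write(json.dumps({"task": task.name}) + "\n")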

Verify

Every deliverable must have evidence, not just "looks good":

  • Code change: tests pass (show the output)
  • Config change: restart and verify (show status)
  • File generation: wc -l plus grep for key content
  • API integration: show the actual response
  • Documentation: spot-check 3 claims for accuracy
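
A small helper that captures real command output as evidence, so every row of the list above produces something you can paste into a report; the example commands and paths are hypothetical:

import subprocess

def evidence(cmd: str) -> str:
    # Run a check and return its actual output; "looks good" is not evidence.
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return f"$ {cmd}\n{out.stdout}{out.stderr}(exit {out.returncode})"

print(evidence("pytest -q"))                       # code change: tests pass
print(evidence("wc -l report.md"))                 # file generation: line count
print(evidence("grep -n 'Acceptance' report.md"))  # file generation: key content present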

Review (🟡🔴 only)

Self-review across 5 dimensions:

  1. Correctness — Does it do what was asked?
  2. Edge cases — What happens with empty input, huge input, concurrent access?
  3. Security — Any injection points, leaked secrets, missing auth?
  4. Performance — Will it work at 10x scale?
  5. Maintainability — Will someone understand this code in 6 months?

Ship (🔴 only)

Pre-ship checklist:

  • All tests pass
  • Rollback plan exists (can you undo this in <5 min?)
  • Feature flag or gradual rollout if risky
  • Monitoring/alerting covers the new code path

Layer 3: Compound

After completing any task (regardless of complexity), spend 30 seconds on:

  1. What broke? — Any errors, retries, unexpected behavior? → Record the specific lesson
  2. What was slow? — Any step that took longer than expected? → Note the bottleneck
  3. What would you do differently? — With hindsight, was there a better approach?

Only record specific, actionable lessons. Not generic advice like "be more careful".

Good: "Bedrock throttles at >2 concurrent requests to the same model. Use model rotation or serial execution." Bad: "Remember to handle API limits properly."

Anti-Rationalization Table

When you catch yourself thinking any of these, stop and follow the rebuttal:

  • "Too simple to need tests": 40% of P0 incidents come from "too simple" code → write the test; it takes 2 minutes.
  • "I already checked, looks fine": reading ≠ verifying → run it: ls, wc -l, grep, actual execution.
  • "I'll write tests after the feature is complete": you won't; test debt only grows → write the test NOW, before moving on.
  • "This old code looks unused, I'll delete it": Chesterton's Fence says understand before removing → git blame first; ask why it exists.
  • "It should work": "should" is not evidence → provide logs, output, or data.
  • "Let me refactor this while I'm here": scope creep; you weren't asked to refactor → do only what was requested; file a separate TODO for the refactor.
  • "I'll handle errors later": error handling IS the feature in production → handle errors now; a happy path without error handling is a prototype.
  • "The context is too long, I'll summarize and skip details": skipping details = skipping correctness → checkpoint to file, compact context, continue with full fidelity.

Concurrent Subagent Scheduling

When delegating to subagents:

Concurrency limits:

  • ≤2 subagents in parallel against the same API endpoint
  • More than 2? Serialize, or distribute across regions/models (see the sketch after this list)
  • 4+ parallel = 75% failure rate (tested). Don't do it.
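
A minimal sketch of the ≤2-per-endpoint rule using an asyncio semaphore; dispatch is a hypothetical stand-in for however your coordinator actually invokes a subagent:

import asyncio

MAX_PER_ENDPOINT = 2  # the tested safe limit per API endpoint
_limits: dict[str, asyncio.Semaphore] = {}

async def call_subagent(endpoint: str, instruction: str) -> str:
    sem = _limits.setdefault(endpoint, asyncio.Semaphore(MAX_PER_ENDPOINT))
    async with sem:  # a 3rd concurrent call to the same endpoint waits instead of failing
        return await dispatch(endpoint, instruction)  # hypothetical transport call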

Task delegation rules:

  • Task instructions must be self-contained (don't say "go read file X")
  • Include content directly in the instruction, not file references
  • Each subagent writes to its own independent file
  • Subagents never communicate directly — everything goes through coordinator

Failure handling:

  • Don't blindly retry. First classify: Design failure? Alignment failure? Verification failure?
  • Check sessions_history for the actual error; don't guess
  • See references/mast-failure-taxonomy.md for the full classification framework

Verification Protocol

For important deliverables, use an independent verifier:

  1. Verifier does NOT read the original requirements
  2. Verifier only reads the output/deliverable
  3. Verifier independently assesses: Is this correct? Complete? Well-formed?
  4. Core principle: "The implementer is an LLM. Verify independently. Reading is not verification. Run it."
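
A minimal sketch of that separation, where the verifier callable is any LLM invocation you already have; the key point is that only the deliverable, never the requirements, reaches it:

from pathlib import Path
from typing import Callable

def independent_verify(deliverable: Path, verifier: Callable[[str], str]) -> str:
    # The verifier sees ONLY the output, never the original requirements.
    prompt = (
        "You are an independent verifier and have not seen the requirements.\n"
        "Assess this deliverable on its own terms: is it correct, complete, well-formed?\n\n"
        + deliverable.read_text()
    )
    return verifier(prompt)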

Checkpoint Protocol

Protect progress against crashes:

  1. Write to file after each step — Don't accumulate results in memory
  2. Design tasks as idempotent — Re-running a step produces the same result
  3. Only retry the failed step — Don't restart from scratch
  4. Progress must be observable — ls shows what's done, not model memory

See references/checkpoint-patterns.md for detailed patterns.
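
A minimal sketch of crash-safe, ls-observable checkpoints, assuming a POSIX filesystem; the step-marker layout is illustrative and not taken from the referenced patterns file:

import os
from pathlib import Path

def checkpoint_write(path: Path, content: str) -> None:
    # Idempotent and crash-safe: re-running produces the same file, and a crash
    # mid-write never leaves a half-written checkpoint behind.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(content)
    os.replace(tmp, path)  # atomic rename: either the old state or the new, never partial

def step_done(step: str) -> bool:
    # Progress lives on disk, where `ls steps/` can see it, not in model memory.
    return Path("steps").joinpath(f"{step}.done").exists()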

Quick Reference

🟢 Simple:  Edit → Verify → Done
🟡 Medium:  Plan → Build → Test → Review → Done
🔴 Complex: Challenge → Spec → Plan → Build → Test → Review → Ship → Compound
