# Agent Harness
A unified engineering harness that combines execution discipline, knowledge compounding, and product thinking. Born from 450,000 characters of real-world AI textbook writing + 9 production incidents.
## Core Philosophy
Agent = Model + Harness. The model provides capability; the harness provides discipline.
Three layers, one workflow:
- Challenge — Is this the right thing to build? (from Gstack)
- Execute — Build it with engineering rigor (from Superpower)
- Compound — Learn from what happened (from CE)
## Task Complexity Auto-Grading
Before starting any task, assess complexity. This determines which workflow steps to run.
### 🟢 Simple (bug fix, config change, small tweak)
- Skip spec/plan → Direct edit → Verify → Done
- Example: "fix the typo in line 42", "update the API endpoint"
### 🟡 Medium (new feature, module, integration)
- Plan → Build incrementally → Test → Review → Done
- Example: "add user authentication", "integrate payment API"
### 🔴 Complex (architecture change, multi-module, new system)
- Full pipeline: Challenge → Spec → Plan → Build → Test → Review → Ship
- Example: "redesign the database schema", "build a multi-agent orchestrator"
When unsure, start at 🟡. Upgrade to 🔴 if you discover hidden complexity. Never downgrade mid-task.
## Layer 1: Challenge (🔴 Complex tasks only)
Before writing any code, answer these questions. If any answer is "no" or uncertain, pause and discuss with the user.
- Problem validity — Is the user solving a real problem or building a solution looking for a problem?
- Simplest approach — Is there a simpler way that doesn't require building this?
- Scope clarity — Can you explain what "done" looks like in one sentence?
- Risk assessment — What's the worst thing that happens if this goes wrong?
Output: A one-paragraph problem statement that the user confirms before proceeding.
## Layer 2: Execute
### Spec (🟡🔴 only)
Define what you're building before you build it:
- Goal: One sentence describing the outcome
- Interface: Inputs, outputs, API contracts
- Constraints: What you will NOT do (equally important as what you will do)
- Acceptance criteria: How to verify it works (must be testable)
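A minimal sketch of how a spec can be held as a structured object so the acceptance criteria stay explicit and checkable; the field names and the example are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class Spec:
    """Illustrative spec container; field names are assumptions, not a standard."""
    goal: str                     # one sentence describing the outcome
    interface: dict[str, str]     # inputs, outputs, API contracts
    constraints: list[str]        # what you will NOT do
    acceptance: list[str] = field(default_factory=list)  # testable criteria

auth_spec = Spec(
    goal="Users log in with email + password and receive a session token",
    interface={"input": "POST /login {email, password}",
               "output": "200 {token} | 401 {error}"},
    constraints=["No OAuth providers", "No password-reset flow"],
    acceptance=["valid credentials return a token",
                "invalid credentials return 401 without revealing which field failed"],
)
```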
### Plan (🟡🔴 only)
Break the spec into atomic tasks:
- Each task modifies ≤3 files
- Each task has a clear verification step
- Tasks are ordered by dependency (independent tasks can parallelize)
- Estimate: simple tasks ~5min, medium ~15min, complex ~30min
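Dependency ordering can be made mechanical. A minimal sketch using Python's standard-library `graphlib`; the task names are hypothetical. Each ready batch contains only independent tasks, so everything within a batch can run in parallel:

```python
from graphlib import TopologicalSorter

# Hypothetical plan: task -> set of tasks it depends on.
plan = {
    "write_schema": set(),
    "write_migrations": {"write_schema"},
    "write_model_layer": {"write_schema"},
    "wire_api_routes": {"write_model_layer"},
}

ts = TopologicalSorter(plan)
ts.prepare()
while ts.is_active():
    batch = list(ts.get_ready())  # tasks in one batch are mutually independent
    print("parallelizable batch:", batch)
    ts.done(*batch)
```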
### Build
Execute tasks incrementally. After each task:
- Verify the task works (run it, test it, check the output)
- Commit or checkpoint the progress
- Only then move to the next task
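A minimal sketch of that loop; the `Task` shape is an assumption, not a fixed interface. The point is the ordering: a task that fails verification stops the run instead of letting errors compound:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    """Hypothetical atomic task: one action, one verification."""
    name: str
    run: Callable[[], str]
    verify: Callable[[str], bool]

def build(tasks: list[Task]) -> None:
    for task in tasks:
        result = task.run()                    # execute one atomic task
        if not task.verify(result):           # run it, check the output
            raise RuntimeError(f"{task.name}: verification failed, stopping")
        print(f"checkpoint: {task.name} ok")  # commit / record progress here
        # only now does the loop reach the next task

build([Task("add-endpoint", lambda: "200 OK", lambda r: "200" in r)])
```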
Critical rules:
- Never modify code you haven't read first
- Don't add features beyond what was asked
- Don't refactor "while you're at it"
- If tests fail, report honestly — don't claim success
### Verify
Every deliverable must have evidence, not just "looks good":
| Deliverable type | Required evidence |
|---|---|
| Code change | Tests pass (show output) |
| Config change | Restart + verify (show status) |
| File generation | `wc -l` + `grep` key content |
| API integration | Show actual response |
| Documentation | Spot-check 3 claims for accuracy |
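For the evidence column, a small sketch that runs the check command and captures its output so the evidence can be attached rather than summarized; the file name is hypothetical:

```python
import subprocess

def evidence(cmd: list[str]) -> str:
    """Run a verification command and return its output as attachable evidence."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.strip()

# File generation: show the line count and that key content actually exists.
print(evidence(["wc", "-l", "report.md"]))
print(evidence(["grep", "-n", "Acceptance criteria", "report.md"]))
```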
### Review (🟡🔴 only)
Self-review from 5 dimensions:
- Correctness — Does it do what was asked?
- Edge cases — What happens with empty input, huge input, concurrent access?
- Security — Any injection points, leaked secrets, missing auth?
- Performance — Will it work at 10x scale?
- Maintainability — Will someone understand this code in 6 months?
### Ship (🔴 only)
Pre-ship checklist:
- Every acceptance criterion from the spec is verified with evidence
- Review passed on all 5 dimensions
- No unrequested features or refactors slipped in
- Lessons from the task are ready to record in the Compound step
## Layer 3: Compound
After completing any task (regardless of complexity), spend 30 seconds on:
- What broke? — Any errors, retries, unexpected behavior? → Record the specific lesson
- What was slow? — Any step that took longer than expected? → Note the bottleneck
- What would you do differently? — With hindsight, was there a better approach?
Only record specific, actionable lessons. Not generic advice like "be more careful".
Good: "Bedrock throttles at >2 concurrent requests to the same model. Use model rotation or serial execution."
Bad: "Remember to handle API limits properly."
## Anti-Rationalization Table
When you catch yourself thinking any of these, stop and follow the rebuttal:
| Your excuse | Why it's wrong | Do this instead |
|---|---|---|
| "Too simple to need tests" | 40% of P0 incidents come from "too simple" code | Write the test. It takes 2 minutes. |
| "I already checked, looks fine" | Reading ≠ verifying | Run it. ls, wc -l, grep, actual execution. |
| "I'll write tests after the feature is complete" | You won't. Test debt only grows. | Write the test NOW, before moving on. |
| "This old code looks unused, I'll delete it" | Chesterton's Fence: understand before removing | git blame first. Ask why it exists. |
| "It should work" | "Should" is not evidence | Provide logs, output, or data. |
| "Let me refactor this while I'm here" | Scope creep. You weren't asked to refactor. | Do only what was requested. File a separate TODO for the refactor. |
| "I'll handle errors later" | Error handling IS the feature in production | Handle errors now. Happy path without error handling is a prototype. |
| "The context is too long, I'll summarize and skip details" | Skipping details = skipping correctness | Checkpoint to file, compact context, continue with full fidelity. |
## Concurrent Subagent Scheduling
When delegating to subagents:
Concurrency limits:
- ≤2 subagents in parallel to the same API endpoint
- >2? Serialize or distribute across regions/models (see the sketch below)
- 4+ parallel = 75% failure rate (tested). Don't do it.
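A minimal sketch of the ≤2 rule with an `asyncio` semaphore; `call_subagent` is a stand-in for whatever dispatch the harness actually uses:

```python
import asyncio

async def call_subagent(task: str, limit: asyncio.Semaphore) -> str:
    async with limit:              # excess tasks queue instead of bursting
        await asyncio.sleep(1)     # placeholder for the real API call
        return f"done: {task}"

async def main() -> None:
    limit = asyncio.Semaphore(2)   # ≤2 in flight to the same endpoint
    tasks = [f"task-{i}" for i in range(5)]
    print(await asyncio.gather(*(call_subagent(t, limit) for t in tasks)))

asyncio.run(main())
```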
Task delegation rules:
- Task instructions must be self-contained (don't say "go read file X")
- Include content directly in the instruction, not file references
- Each subagent writes to its own independent file
- Subagents never communicate directly — everything goes through coordinator
Failure handling:
- Don't blindly retry. First classify: Design failure? Alignment failure? Verification failure?
- Check `sessions_history` for the actual error, don't guess
- See `references/mast-failure-taxonomy.md` for the full classification framework
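As an illustration only, triage before retry might look like the sketch below; the string heuristics are stand-ins, and the real classification framework is the referenced taxonomy:

```python
# Illustrative triage: the match rules are placeholders, not the taxonomy itself.
def classify(error_log: str) -> str:
    if "wrong output format" in error_log:
        return "design"        # the task itself was mis-specified
    if "ignored instruction" in error_log:
        return "alignment"     # the subagent drifted from its instruction
    return "verification"      # output exists but was never properly checked

def handle(error_log: str) -> str:
    kind = classify(error_log)
    if kind == "design":
        return "rewrite the task instruction; retrying the same prompt won't help"
    if kind == "alignment":
        return "tighten the instruction and retry once"
    return "re-run the verifier on the existing output"
```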
## Verification Protocol
For important deliverables, use an independent verifier:
- Verifier does NOT read the original requirements
- Verifier only reads the output/deliverable
- Verifier independently assesses: Is this correct? Complete? Well-formed?
- Core principle: "The implementer is an LLM. Verify independently. Reading is not verification. Run it."
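A sketch of the protocol with `llm` as a hypothetical callable; note that the verifier receives only the artifact, never the requirements:

```python
def independent_verify(deliverable_path: str, llm) -> str:
    """The verifier sees only the deliverable, never the original requirements."""
    with open(deliverable_path) as f:
        artifact = f.read()
    prompt = (
        "You are reviewing a deliverable cold. Do not assume what was asked.\n"
        "Assess: is it internally correct? Complete? Well-formed?\n\n" + artifact
    )
    return llm(prompt)  # hypothetical LLM call returning the assessment text
```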
## Checkpoint Protocol
Protect progress against crashes:
- Write to file after each step — Don't accumulate results in memory
- Design tasks as idempotent — Re-running a step produces the same result
- Only retry the failed step — Don't restart from scratch
- Progress must be observable — `ls` shows what's done, not model memory
See `references/checkpoint-patterns.md` for detailed patterns.
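A minimal sketch of these rules working together; the directory name is arbitrary. Each step writes its result to disk, re-runs skip completed steps, and `ls checkpoints/` shows exactly how far the run got:

```python
from pathlib import Path

CHECKPOINT_DIR = Path("checkpoints")  # hypothetical location
CHECKPOINT_DIR.mkdir(exist_ok=True)

def run_step(name: str, fn) -> str:
    """Idempotent step: skip it if its checkpoint file already exists."""
    out = CHECKPOINT_DIR / f"{name}.done"
    if out.exists():        # already completed on a previous run
        return out.read_text()
    result = fn()
    out.write_text(result)  # progress is observable via ls, not model memory
    return result

run_step("fetch", lambda: "fetched 120 records")
run_step("transform", lambda: "normalized 120 records")
```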
## Quick Reference
- 🟢 Simple: Edit → Verify → Done
- 🟡 Medium: Plan → Build → Test → Review → Done
- 🔴 Complex: Challenge → Spec → Plan → Build → Test → Review → Ship → Compound