Install

openclaw skills install orche

A multi-agent orchestration engine that executes complex tasks in four phases (Query → Plan (Debate) → Execute → Verify). A panel of sub-agents debates, critiques, executes, and verifies, while a watchdog monitors the entire process.
When complex tasks are delegated to AI agents, the following problems arise:
| Problem | Symptom | Result |
|---|---|---|
| Hallucination | Uses non-existent APIs/libraries | Non-executable code |
| Execution without planning | Starts writing code immediately | Direction change costs explode |
| No verification | Deliverables "look plausible" | Don't actually work |
| Linear-only progression | Keeps going forward even on failure | Root cause unresolved |
| Cost explosion | Indiscriminate agent deployment | Budget exhaustion |
┌─────────────────────────────────────────────────────────────┐
│ │
│ Phase 0 Phase 1 Phase 2 Phase 3 │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Query │─G0─▶│ Plan │──G1─▶│Execute│─G2─▶│Verify│ │
│ │ │ │(Debate)│ │(Parall)│ │ │ │
│ └──────┘ └──────┘ └──────┘ └──┬───┘ │
│ │ ▲ │ │
│ │ │ fail? │
│ │ └──── YES ◀──┘ │
│ │ │
│ └── Watchdog monitors the entire process ───────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Key Differentiators:
| Feature | dispatching-parallel-agents | parallel-agent-management | executing-plans | orche |
|---|---|---|---|---|
| Parallel execution | ✅ | ✅ | ❌ (sequential) | ✅ |
| Pre-query phase | ❌ | ❌ | ❌ | ✅ Phase 0 |
| Debate/discussion | ❌ | ❌ | ❌ | ✅ Includes Critic |
| Phase gates | ❌ | ❌ | ✅ (checkpoints) | ✅ 4 gates |
| Hallucination check | ❌ | ❌ | ❌ | ✅ 14 items |
| Auto regression on verification failure | ❌ | ❌ | ❌ | ✅ Up to 3 retries |
| Watchdog integration | ❌ | ❌ | ❌ | ✅ |
| Cost management | ❌ | ❌ | ❌ | ✅ 4-tier budget |
| Session disconnection recovery | ❌ | ❌ | ❌ | ✅ State file based |
| Best for | Simple parallel tasks | Codebase splitting | Sequential plan execution | Complex multi-step projects |
✅ Orche is appropriate for complex multi-step projects that need up-front planning, debate, and verification.

❌ Other skills are better for narrower needs: dispatching-parallel-agents (simple parallel tasks), parallel-agent-management (codebase splitting), executing-plans (sequential plan execution).

Place the skill file in the OpenClaw skills directory:
# Copy to skills directory
mkdir -p ~/.agents/skills/orche
cp SKILL.md ~/.agents/skills/orche/SKILL.md
Enter the /orche command in chat and describe your task:
/orche "Write a market analysis report on Korean AI startups"
Or in natural language:
Orchestrate this project systematically
Orche will ask 3~10 questions. Answer clearly:
🎼 /orche started — Phase 0: Query
📋 Questions to clarify requirements:
1. 📐 Scope: All AI fields? Specific segment (LLM, robotics, etc.)?
2. 🎯 Priority: Market size vs competitive analysis vs investment trends?
3. 📏 Success criteria: Report length? Number of data sources?
4. 🔧 Technical constraints: Specific data sources needed?
5. ⏰ Deadline: Is there a due date?
Once you answer, Orche automatically:
🎼 /orche Completion Report
📋 Task: orche-1711432800
⏱️ Total duration: 23 minutes
📊 By phase: Query 3m → Plan 5m → Execute 10m → Verify 5m
🤖 Agents deployed: 11
✅ Tasks completed: 5/5
📁 Deliverables:
- workspace/orche-1711432800/requirements.md
- workspace/orche-1711432800/final-plan.md
- workspace/orche-1711432800/tasks/
- workspace/orche-1711432800/verification/final-verdict.md
Phase 0: Query — Clarify requirements (3~10 questions)
Phase 1: Plan — 3~7 sub-agents debate → finalize plan
Phase 2: Execute — Split into tasks → 3~8 sub-agents execute in parallel
Phase 3: Verify — 3~5 verifiers deployed → regress to Phase 2 if issues found
Each phase transition requires gate conditions to be met before proceeding. Automatic blocking on non-compliance.
| Gate | Transition | Key Conditions |
|---|---|---|
| G0 | Phase 0 → 1 | Requirements structured + ambiguous expressions removed + user approval |
| G1 | Phase 1 → 2 | final-plan.md created + hallucination check passed + no circular dependencies |
| G2 | Phase 2 → 3 | All tasks completed + deliverables exist + no zombie agents |
| Verification | Phase 3 → Done | 14-item hallucination check passed + all required deliverables present |
On gate failure, the transition is blocked and the current phase is repeated until its conditions are met.
Remove ambiguity from user instructions and specify them to an actionable level.
Deliverable: requirements.md, covering these categories:

| Category | Example |
|---|---|
| Scope | What's in scope? What should be excluded? |
| Technical constraints | Language/framework/environment limitations? |
| Priority | What matters most? (speed/quality/cost) |
| Success criteria | How to judge completion? Deliverables? |
| Dependencies | External systems/APIs/data? |
| Risk | Impact of failure? Rollback needed? |
If the G0 feasibility check determines the task is impossible:
- Set orche-state.json status → "rejected"

Phase 1 convenes a debate panel:

| Role | Responsibility |
|---|---|
| Domain Expert (1~2) | Draft plan based on domain expertise |
| Advocate | Strengthen plan merits, argue feasibility |
| Critic | Point out weaknesses/risks/gaps + hallucination check |
| Moderator | Synthesize debate, build consensus, write final plan |
sessions_spawn:
task: "<role-specific prompt>"
label: "orche-<role>-<taskId>"
model: "anthropic/claude-sonnet-4-6" # Default (use opus for high complexity)
mode: "run"
Model Selection Guide:
- Planning/debate: sonnet (cost-efficient)
- Complex domain experts: opus (when deep reasoning needed)
- Simple execution tasks: sonnet or haiku (fast processing)
Round 1: experts draft proposals, the Advocate and Critic review them, and the Moderator synthesizes the debate into final-plan.md. If unresolved issues remain → Round 2 (up to 3 rounds max).

Phase 1 ends when final-plan.md is created and gate G1 passes.

Phase 2 splits final-plan.md into independently executable task units:
{
"id": "task-1",
"title": "...",
"description": "...",
"dependencies": [],
"assignedAgent": null,
"status": "pending",
"output": null
}
Task Status Flow: pending → running → completed / failed
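The task schema and status flow above imply a simple scheduling rule: a task is ready when every dependency has completed, and only up to the concurrency limit may run at once. A minimal sketch (the `ready_tasks` helper and sample data are illustrative, not part of the skill):

```python
def ready_tasks(tasks, limit=3):
    """Return pending tasks whose dependencies are all completed,
    capped by the default concurrency limit of 3."""
    done = {t["id"] for t in tasks if t["status"] == "completed"}
    running = sum(1 for t in tasks if t["status"] == "running")
    ready = [
        t for t in tasks
        if t["status"] == "pending" and all(d in done for d in t["dependencies"])
    ]
    return ready[: max(0, limit - running)]

tasks = [
    {"id": "task-1", "dependencies": [], "status": "completed"},
    {"id": "task-2", "dependencies": ["task-1"], "status": "pending"},
    {"id": "task-3", "dependencies": ["task-2"], "status": "pending"},
]
print([t["id"] for t in ready_tasks(tasks)])  # → ['task-2']
```

Note that task-3 stays pending until task-2 completes, which is also what makes the G1 "no circular dependencies" condition necessary: a cycle would leave ready_tasks empty forever.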
All execution agent prompts must include the harness rules (see below) and an explicit exit condition: every task ends as completed or skipped_with_reason.

Phase 3 deploys a verification team:

| Role | Responsibility |
|---|---|
| Completeness Verifier | All tasks executed, nothing missing |
| Accuracy Verifier | Deliverables match task spec + success criteria |
| Hallucination Verifier | Detect factual errors, logical contradictions, non-existent references |
| Integration Verifier | Cross-deliverable consistency, interface compatibility |
| Final Reviewer (optional) | Final review against original requirements |
| Result | Action |
|---|---|
| All PASS | Phase 3 complete → normal termination |
| HIGH issue | Regress to Phase 2 (re-execute affected tasks only) |
| MED issue | Report to user, confirm re-execution |
| LOW issue | Record in report, proceed |
Automatic regression to Phase 2 is triggered when:
- hallucination_score >= 2 (2+ failures in the 14-item check)

Regression is capped by retryCount (maximum 3 times).

All orchestration state is recorded in workspace/orche-state.json.
The watchdog and each phase share this file to track progress.
{
"taskId": "orche-<timestamp>",
"phase": 0,
"status": "active",
"startedAt": "<ISO>",
"phaseStartedAt": "<ISO>",
"requirements": null,
"plan": null,
"tasks": [],
"agents": [],
"watchdog": {
"enabled": true,
"lastCheck": null
},
"retryCount": 0,
"maxRetries": 3,
"tokenBudget": {
"total": 500000,
"used": 0,
"currency": "tokens"
},
"errors": []
}
| Field | Type | Description |
|---|---|---|
| taskId | string | Unique identifier (orche-<unix_timestamp>) |
| phase | number | Current phase (0~3) |
| status | string | active / completed / aborted / rejected |
| requirements | object | Requirements finalized in Phase 0 |
| plan | object | Plan finalized in Phase 1 |
| tasks | array | Phase 2 task list + status |
| agents | array | Currently active sub-agent session key list |
| retryCount | number | Phase 3 → 2 regression count |
| tokenBudget | object | Token budget management |
| errors | array | Error log |
Each orchestration creates an independent working directory:
workspace/orche-<taskId>/
├── requirements.md ← Phase 0 deliverable
├── round-1/ ← Phase 1 debate records
│ ├── expert-1-proposal.md ← Expert proposal
│ ├── expert-2-proposal.md ← (if applicable)
│ ├── advocate-review.md ← Advocate review
│ ├── critic-review.md ← Critic review
│ └── moderator-summary.md ← Moderator synthesis
├── round-2/ ← (2nd round debate if needed)
├── final-plan.md ← Phase 1 final plan
├── tasks/ ← Phase 2 deliverables
│ ├── task-1/
│ │ └── output.md
│ ├── task-2/
│ │ └── output.md
│ └── ...
└── verification/ ← Phase 3 verification results
├── completeness-report.md ← Completeness verification
├── accuracy-report.md ← Accuracy verification
├── hallucination-report.md ← Hallucination verification
├── integration-report.md ← Integration verification
└── final-verdict.md ← Final verdict
workspace/ also holds orche-state.json and, when using the OpenClaw watchdog, watchdog.md.

Gate failure handling:
G0 not passed → Phase 1 entry blocked. Re-confirmation requested from user.
G1 not passed → Phase 2 entry blocked. Plan re-debate forced.
G2 not passed → Phase 3 entry blocked. Missing tasks re-executed.
Verification fail → Auto regression to Phase 2 (retryCount < 3).
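As a concrete illustration, the G2 gate ("all tasks completed + deliverables exist + no zombie agents") can be checked mechanically against the state file. This is a sketch under the assumption that each task's output field holds a workspace-relative path; the function name and zombie check are hypothetical, not part of the skill:

```python
import os

def gate_g2(state, workspace):
    """Check the G2 conditions against an orche-state.json-shaped dict."""
    incomplete = [t["id"] for t in state["tasks"] if t["status"] != "completed"]
    missing = [
        t["output"] for t in state["tasks"]
        if t.get("output") and not os.path.exists(os.path.join(workspace, t["output"]))
    ]
    zombies = []  # in practice: entries in state["agents"] with no matching live session
    passed = not (incomplete or missing or zombies)
    return passed, {"incomplete": incomplete, "missing": missing, "zombies": zombies}
```

A failing result here is what blocks Phase 3 entry and re-queues the missing tasks.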
Rules that must be included in every sub-agent spawn prompt:
[Harness Rules]
1. You must save results to the designated file path upon completion
2. Mark uncertain facts with "needs verification:" — no fabrication
3. Do not use non-existent APIs/libraries
4. On task failure, record failure reason in output file and terminate
These rules are injected into all sub-agents, ensuring that results always land on disk, uncertain facts are flagged rather than fabricated, and failures are recorded instead of silently dropped.
On Phase 3 completion, you must:
1. Set orche-state.json status → "completed"
2. Mark task [x] in watchdog.md (if using watchdog)
3. Send final report to user
Orche integrates with OpenClaw's watchdog system to monitor the entire process.
| Detection Item | Auto Response |
|---|---|
| Agent 15min+ no change (hang) | kill + re-spawn |
| Empty deliverable (file size 0) | Retry with reinforced prompt |
| All agents lost (deadlock) | Immediate user notification |
| Token budget 80% reached | User warning |
Register the task in watchdog.md at Orche start:
- [ ] orche <taskId> — Phase 0 (Query)
session_keys: agent:main:subagent:xxxx
state_file: workspace/orche-state.json
evidence_paths:
- workspace/orche-<taskId>/requirements.md
- workspace/orche-<taskId>/final-plan.md
done_when:
- orche-state.json status == "completed"
These principles reflect lessons learned in production.
If recovery is needed, read the session_keys from watchdog.md → query sessions_list to check agent status.

No watchdog in your environment? Orche works without one. The watchdog is an additional safety net; phase gates and harness rules function as the core safety mechanisms without it.
Used by Phase 1's Critic and Phase 3's verification team.
| ID | Item | Verification Method |
|---|---|---|
| H-1 | File path existence | Verify with stat() or ls |
| H-2 | Command existence | Verify with which or command -v |
| H-3 | URL validity | Verify with HTTP request (optional) |
| H-4 | Code syntax validity | Grammar check with linter/parser |
| H-5 | Numerical data cross-verification | Confirm with 2+ sources |
| ID | Item | Verification Method |
|---|---|---|
| H-6 | No self-contradiction | Detect conflicting statements within document |
| H-7 | Plan-result alignment | 1:1 mapping verification |
| H-8 | Terminology consistency | Same term for same concept |
| ID | Item | Verification Method |
|---|---|---|
| H-9 | No remaining TODO/FIXME | grep -rE "TODO\|FIXME" |
| H-10 | No placeholders | Detect [INSERT], TBD, ... |
| H-11 | All deliverables exist | Cross-check deliverable list against requirements |
| ID | Item | Verification Method |
|---|---|---|
| H-12 | No fictional library/API references | Verify existence on npm/pip/crates.io etc. |
| H-13 | No unsourced statistics | Verify source citation or "needs verification:" label |
| H-14 | No overconfident expressions | Detect "always", "never", "100%" type claims |
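Several of the items above are directly automatable. A sketch of H-2 (command existence) and H-9 (no leftover TODO/FIXME markers), where the function names, the command list, and the scanned directory are examples rather than part of the skill:

```python
import pathlib
import re
import shutil

def check_h2(commands):
    """H-2: return referenced commands that do NOT exist on PATH."""
    return [c for c in commands if shutil.which(c) is None]

def check_h9(root):
    """H-9: return (path, line_number) pairs where TODO/FIXME markers remain."""
    hits = []
    for path in pathlib.Path(root).rglob("*.md"):
        for n, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            if re.search(r"TODO|FIXME", line):
                hits.append((str(path), n))
    return hits
```

Checks like H-5 (cross-verifying numbers) or H-14 (overconfident phrasing) still need a verifier agent; the point of automating the mechanical items is to spend model budget only where judgment is required.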
0~1 failures → PASS ✅
2~3 failures → WARNING ⚠️ (MED — user judgment)
4+ failures → FAIL ❌ (HIGH — auto regression)
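The verdict thresholds above, expressed as a small helper (the function name is illustrative):

```python
def hallucination_verdict(failures):
    """Map a 14-item check failure count to the verdict tiers above."""
    if failures <= 1:
        return "PASS"
    if failures <= 3:
        return "WARNING"   # MED — user judgment
    return "FAIL"          # HIGH — auto regression to Phase 2

print(hallucination_verdict(0), hallucination_verdict(2), hallucination_verdict(5))
# → PASS WARNING FAIL
```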
Orche is a multi-agent system, so cost management is essential.
| Tier | Ratio | Action |
|---|---|---|
| 🟢 Normal | < 50% | Normal operation |
| 🟡 Caution | 50~80% | Limit concurrent agents, prefer sonnet |
| 🔴 Warning | 80~95% | Stop new spawns, complete in-progress tasks only |
| ⛔ Exceeded | > 95% | Full stop, request user judgment |
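The four tiers map directly onto the used/total ratio in tokenBudget. A sketch, assuming boundary values fall into the lower tier (the table leaves exact boundaries unspecified):

```python
def budget_tier(used, total):
    """Map token usage to the 4-tier budget status above."""
    ratio = used / total
    if ratio < 0.50:
        return "normal"    # 🟢 normal operation
    if ratio <= 0.80:
        return "caution"   # 🟡 limit concurrent agents, prefer sonnet
    if ratio <= 0.95:
        return "warning"   # 🔴 stop new spawns, finish in-progress only
    return "exceeded"      # ⛔ full stop, request user judgment
```

The orchestrator would evaluate this after each sub-agent result comes back, before deciding whether to spawn the next task.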
Actively leverage model selection:
- haiku (lowest cost)
- sonnet (best value)
- opus (only when needed)

Limit concurrent agents to 3 (default). 8 is the maximum; typically 3~4 is sufficient.
Set token budget according to task scale:
| Task Scale | Recommended Budget (tokens) | Expected Agents | Estimated Cost (Claude Sonnet) |
|---|---|---|---|
| Small (simple research) | 200,000 | 6~8 | ~$0.6 |
| Medium (code refactoring) | 500,000 | 10~15 | ~$1.5 |
| Large (market analysis report) | 1,000,000 | 15~20 | ~$3.0 |
Note: Costs above are estimates based on Claude Sonnet. Actual costs vary by model, prompt length, and retry count.
Adjust the budget by modifying tokenBudget.total in orche-state.json:
{
"tokenBudget": {
"total": 1000000,
"used": 0
}
}
| Exception | Detection Method | Auto Response | User Notification |
|---|---|---|---|
| Agent hang | Watchdog 15min no change | kill + re-spawn | On 2 consecutive occurrences |
| Empty deliverable | File size 0 | Retry with reinforced prompt | On 2 failures |
| API error (429/500) | HTTP status | Retry after 30s | On 3 failures |
| Hallucination detected | Critic/verification team | Regenerate affected portion | On HIGH severity |
| Total deadlock | All agents lost | — | Immediately |
| Context exceeded | Error message | Retry with reduced input | On failure |
| maxRetries exceeded | Counter check | — | Immediate judgment request |
| Token budget exceeded | used/total ratio | Model downgrade | At 80% |
All errors are recorded in orche-state.json's errors array:
{
"errors": [
{
"timestamp": "2026-03-26T10:30:00Z",
"phase": 2,
"taskId": "task-3",
"type": "AGENT_HANG",
"message": "Agent orche-exec-task-3 unresponsive for 15+ minutes",
"action": "kill + re-spawn"
}
]
}
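Appending to that errors array is a plain read-modify-write on the state file. A sketch, assuming the field names shown in the schema above (the helper name is hypothetical):

```python
import datetime
import json

def log_error(state_path, phase, task_id, err_type, message, action):
    """Append one entry to the errors array in orche-state.json."""
    with open(state_path) as f:
        state = json.load(f)
    state["errors"].append({
        "timestamp": datetime.datetime.now(datetime.timezone.utc)
                     .strftime("%Y-%m-%dT%H:%M:%SZ"),
        "phase": phase,
        "taskId": task_id,
        "type": err_type,
        "message": message,
        "action": action,
    })
    with open(state_path, "w") as f:
        json.dump(state, f, indent=2)
```

Because the watchdog and every phase share this file, writing errors here (rather than only to chat) is what makes them survive a session disconnect.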
| Timing | Required? | Content |
|---|---|---|
| Phase 0 complete | ✅ Required | Approve requirements |
| Phase 1 complete | Optional | Review plan (default: auto-proceed) |
| Phase 2 failure | ✅ Required | Judgment after 2 retry failures |
| Phase 3 MED issue | ✅ Required | Decide on re-execution |
| Phase 3 complete | Notification | Final results report |
Auto-proceeding from Phase 1 to Phase 2 is the default behavior. If you want to review the plan directly, specify "I want approval at each step" during Phase 0 queries.
When the user requests task cancellation:
- Set orche-state.json status → "aborted"

🎼 /orche Abort Report
📋 Task: orche-1711432800
🛑 Abort reason: User request
✅ Completed tasks: 3/7
📁 Completed deliverables:
- workspace/orche-1711432800/tasks/task-1/output.md
- workspace/orche-1711432800/tasks/task-2/output.md
- workspace/orche-1711432800/tasks/task-3/output.md
If the main session disconnects mid-process, the orchestration stops.
orche-state.json is the recovery key.
State updated immediately on each task completion:
- completedAt is recorded in orche-state.json upon receiving each sub-agent result.

Watchdog detects hangs (when using a watchdog):
- If the task is registered in watchdog.md and the main session is unresponsive for 20+ minutes in sessions_list, the watchdog reports to the user.

Manual recovery:
- Read orche-state.json → continue from incomplete tasks via /orche resume.

This command reads orche-state.json and resumes execution from the last completed phase/task.
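What /orche resume needs from the state file is small: the current phase and the tasks that never reached completed. A sketch, assuming the schema shown earlier (the function name is illustrative):

```python
import json

def resume_point(state_path):
    """Return the phase and incomplete task ids to resume from, or None."""
    with open(state_path) as f:
        state = json.load(f)
    if state["status"] != "active":
        return None  # completed/aborted/rejected runs have nothing to resume
    pending = [t["id"] for t in state["tasks"] if t["status"] != "completed"]
    return {"phase": state["phase"], "pending_tasks": pending}
```

This is why harness rule 1 (always save results to the designated path) matters: resume only works if completed work is actually on disk, not trapped in a lost session's context.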
User Input:
/orche "Write a 2025 Korean GenAI market analysis report.
Must include market size, key players, investment trends, and regulatory landscape."
Phase 0 — Query (2~3 min):
🎼 Phase 0: Query
1. 📐 Scope: All GenAI? (LLM, image generation, code assistants, etc.)
2. 🎯 Depth: Individual company analysis? Or market overview level?
3. 📊 Data: Public data only? Paid research sources?
4. 📏 Length: How many pages (A4 equivalent)?
5. 🎨 Format: Markdown? PDF?
6. ⏰ Purpose: Internal report? Investment decision? Blog?
Phase 1 — Planning (5 min):
Phase 2 — Execution (10~15 min):
| Task | Agent | Deliverable |
|---|---|---|
| Market overview | agent-1 | tasks/task-1/market-overview.md |
| Key players | agent-2 | tasks/task-2/key-players.md |
| Investment trends | agent-3 | tasks/task-3/investment-trends.md |
| Regulatory landscape | agent-4 | tasks/task-4/regulatory.md |
| Consolidated report | agent-5 | tasks/task-5/final-report.md |
Phase 3 — Verification (5 min):
User Input:
/orche "Refactor our project's API layer.
Currently Express monolithic — want to separate modules by domain and increase test coverage."
Phase 0 — Query (3 min):
🎼 Phase 0: Query
1. 📐 Scope: Entire API? Specific domains only?
2. 🔧 Tech stack: Express + TypeScript? ORM?
3. 📁 Current structure: Source directory path?
4. 🎯 Target structure: Have an example? (NestJS-style? Layered?)
5. ✅ Tests: Current coverage? Test framework?
6. ⚠️ Constraints: Backward compatibility? Zero-downtime deployment?
Phase 1 — Planning (7 min):
Phase 2 — Execution (15~20 min):
| Task | Agent | Deliverable |
|---|---|---|
| Dependency analysis | agent-1 | tasks/task-1/dependency-graph.md |
| User module separation | agent-2 | tasks/task-2/user-module/ |
| Auth module separation | agent-3 | tasks/task-3/auth-module/ |
| Order module separation | agent-4 | tasks/task-4/order-module/ |
| Shared utilities cleanup | agent-5 | tasks/task-5/shared/ |
| Test writing | agent-6 | tasks/task-6/tests/ |
Phase 3 — Verification (7 min):
User Input:
/orche "Do a SaaS competitor analysis.
Target Notion, Coda, Slite — compare features, pricing strategy, and market positioning."
Phase 0 — Query (2 min):
🎼 Phase 0: Query
1. 📐 Scope: Any additional competitors beyond these 3?
2. 🎯 Analysis depth: Feature-level detail? Strategic level?
3. 📊 Data: Public pricing only? Actual usage experience needed?
4. 📏 Deliverables: Comparison table + strategy report?
5. 🔍 Our product: Is there a reference product to compare against?
Phase 1 — Planning (5 min):
Phase 2 — Execution (10 min):
| Task | Agent | Deliverable |
|---|---|---|
| Notion analysis | agent-1 | tasks/task-1/notion-analysis.md |
| Coda analysis | agent-2 | tasks/task-2/coda-analysis.md |
| Slite analysis | agent-3 | tasks/task-3/slite-analysis.md |
| Comparative summary | agent-4 | tasks/task-4/comparison-report.md |
Phase 3 — Verification (5 min):
Below are summaries of actual projects completed with Orche.
| Item | Details |
|---|---|
| Task | Repackage internal skills for ClawHub publishing |
| Duration | 25 minutes |
| Agents deployed | 12 (planning 4 + execution 5 + verification 3) |
| By phase | Query 3m → Plan 6m → Execute 10m → Verify 6m |
| Regressions | 0 (passed first time) |
| Notable | Critic flagged "environment-dependent hardcoded cron IDs" → fixed during planning phase |
| Item | Details |
|---|---|
| Task | Technical doc translation (Korean/English/Japanese) + terminology unification |
| Duration | 35 minutes |
| Agents deployed | 15 (planning 3 + execution 8 + verification 4) |
| By phase | Query 2m → Plan 5m → Execute 18m → Verify 10m |
| Regressions | 1 (H-8 terminology consistency failure → glossary-based retranslation) |
| Notable | Terminology inconsistency auto-detected → Phase 2 regression → fixed and passed |
| Item | Details |
|---|---|
| Task | REST API design → implementation → test code → documentation |
| Duration | 42 minutes |
| Agents deployed | 18 (planning 5 + execution 8 + verification 5) |
| By phase | Query 4m → Plan 8m → Execute 20m → Verify 10m |
| Regressions | 1 (H-12 non-existent middleware reference → affected task re-executed) |
| Notable | Critic debate on "implement auth middleware vs use library" → Moderator decided on verified library |
Total completions: 3
Average duration: 34 minutes
Average agents: 15
First-pass rate: 33% (1/3)
Resolved via auto-regression: 100% (2/2 regressions resolved within 3 retries)
Manual escalations: 0
If the hallucination-guard skill is installed, Orche can leverage it during the verification phase for more precise hallucination detection.
Install hallucination-guard skill:
# Install from ClawHub (needs verification: install command may vary by environment)
mkdir -p ~/.agents/skills/hallucination-guard
# Place SKILL.md in that directory
Auto-utilized in Phase 3 verification:
Integration Architecture:
Phase 3: Verification
├── Completeness Verifier
├── Accuracy Verifier
├── Hallucination Verifier ──── Uses hallucination-guard skill
│ ├── Orche's built-in 14-item check
│ └── + hallucination-guard additional checks (when installed)
├── Integration Verifier
└── Final Reviewer
| Task Type | Orche Alone | + hallucination-guard |
|---|---|---|
| Research report | ✅ Sufficient | ✅✅ Enhanced numerical verification |
| Code refactoring | ✅ Sufficient | ✅✅✅ API existence check essential |
| Market analysis | ✅ Sufficient | ✅✅ Enhanced statistics source verification |
| Document translation | ✅ Sufficient | ✅ Minimal difference |
Default is up to 3 rounds. Complex domains may require more rounds. Specify "I want thorough plan debate" during Phase 0 to adjust the round count.
| Intensity | Verifier Count | Hallucination Check | Recommended For |
|---|---|---|---|
| light | 2 | Essential 5 items only | Simple tasks, time savings |
| standard (default) | 3~4 | All 14 items | General tasks |
| thorough | 5 | 14 items + additional cross-verification | High-risk tasks |
You can specify verification intensity during Phase 0:
/orche "Refactor the API. Use thorough verification."
Specify different models per phase based on task characteristics:
{
"modelConfig": {
"planning": "anthropic/claude-sonnet-4-6",
"execution": "anthropic/claude-sonnet-4-6",
"verification": "anthropic/claude-sonnet-4-6",
"complex": "anthropic/claude-opus-4-6"
}
}
Default is 3 (maximum 8). Adjust in orche-state.json:
{
"concurrency": {
"default": 3,
"max": 8
}
}
Q: Can I skip Phase 0 (Query)?
A: No. Phase 0 is mandatory. Proceeding with ambiguous requirements causes direction-change costs in Phase 2 to grow exponentially. Clear answers in Phase 0 actually reduce total time.

Q: How much does a run cost?
A: It depends on task scale. Small tasks (research report) use approximately 200K tokens (Claude Sonnet, ~$0.6); large tasks (API design + implementation) approximately 1M tokens (~$3.0). See the cost management section for budget settings.

Q: What happens if verification keeps failing?
A: Auto-regression runs up to 3 times. If all 3 fail, the decision is escalated to the user. It is also escalated if the same failure reason occurs 2 consecutive times.

Q: Does Orche require a watchdog?
A: No. The watchdog is an additional safety net; phase gates and harness rules are the core safety mechanisms. Orche functions normally without a watchdog.

Q: Does Orche conflict with other parallel-agent skills?
A: No. Orche operates independently. Handling simple parallel tasks within Phase 2 using dispatching-parallel-agents patterns is naturally compatible.

Q: Is progress lost if the session disconnects?
A: No. All state is recorded in orche-state.json and the file system. Use /orche resume to continue from the last progress point.

Q: What if the debate panel can't reach consensus?
A: The Moderator builds consensus. Debate runs up to 3 rounds; if consensus still isn't reached, the Moderator has final decision authority.

Q: Do I need Opus for everything?
A: No. Sonnet alone can handle most tasks. Opus is used selectively, only for domain-expert roles requiring complex reasoning. Sonnet is the recommended default for cost savings.
Required tools: sessions_spawn (mode: "run")

Made with 🎼 by reikys — battle-tested in production environments.