Install
openclaw skills install caid-multi-agent
This skill implements the Centralized Asynchronous Isolated Delegation (CAID) paradigm for coordinating multiple sub-agents that collaboratively complete long-horizon software engineering tasks on shared artifacts.
⚠️ CRITICAL WARNINGS FROM PAPER:
- Use CAID from the outset — Don't run single-agent first as fallback. Sequential strategy costs nearly 2x with minimal gain.
- Physical worktree isolation is mandatory — Soft isolation (instruction-only) degrades performance on complex tasks.
- Engineer limits are strict — 2 for PaperBench-style, 4 for Commit0-style, never exceed 8.
- Higher cost/runtime trade-off — CAID improves accuracy, not speed. Integration is sequential/test-gated.
Use CAID from the outset for:
Don't use CAID as a fallback: running a single agent first and then CAID is inefficient (cost and runtime are nearly additive, with minimal performance gain).
Use single-agent for:
Before ANY delegation, the manager must:
- Prepare the runtime environment
- Organize entry points
- Add minimal function stubs
- Commit to the main branch
# Pre-setup commit
git add .
git commit -m "setup: initial stubs and entry points"
git push origin main
Manager's role: Before delegating, analyze the task structure:
Commit0-style tasks (clear file structure):
PaperBench-style tasks (inferred structure):
Dependency graph construction:
Ready_t(v_j) ⇔ ∀(v_i, v_j) ∈ E, v_i ∈ Completed_t
Only delegate tasks from {v ∈ V | Ready_t(v)}
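The readiness rule above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `deps` mapping (task id to the set of prerequisite task ids) and the function name `ready_tasks` are assumptions made for the example, and already-completed tasks are excluded so they are not delegated twice.

```python
from typing import Dict, Set


def ready_tasks(deps: Dict[str, Set[str]], completed: Set[str]) -> Set[str]:
    """Return unfinished tasks whose prerequisites are all completed."""
    return {
        task
        for task, prereqs in deps.items()
        # prereqs <= completed is the subset test: every incoming edge is satisfied.
        if task not in completed and prereqs <= completed
    }


# Hypothetical graph: task-3 depends on task-1 and task-2.
deps = {
    "task-1": set(),
    "task-2": set(),
    "task-3": {"task-1", "task-2"},
}
print(sorted(ready_tasks(deps, completed=set())))       # ['task-1', 'task-2']
print(sorted(ready_tasks(deps, completed={"task-1"})))  # ['task-2']
```

Only once both `task-1` and `task-2` are in the completed set does `task-3` become eligible for delegation.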
Create PHYSICALLY isolated worktrees (not soft isolation):
# Main branch is the single source of truth
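# -b creates each engineer branch from main; omit -b and the trailing "main" if the branch already exists
git worktree add -b <branch-name-1> ../workspace-engineer-1 main
git worktree add -b <branch-name-2> ../workspace-engineer-2 main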
# etc.
⚠️ WARNING: Soft isolation (same workspace, instruction-level constraints) degrades performance to below the single-agent baseline on PaperBench. Physical git worktree isolation is mandatory.
Key isolation principles:
- __init__.py, config files, global constants: engineers must NOT commit changes to these
STRICT Engineer Limits:
| Task Type | Max Engineers | Why |
|---|---|---|
| PaperBench-style | 2 | Inferred dependencies; more destabilizes |
| Commit0-style | 4 | Clear file structure; test-guided |
| General SWE | 2-4 | Balance parallelism vs integration overhead |
| Absolute max | 8 | Beyond this, coordination tax exceeds gains |
⚠️ Critical: Increasing engineers beyond optimal degrades performance due to integration overhead and conflict resolution costs.
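As a rough sketch, the engineer limits and the worktree layout described above can be combined into one helper that clamps the requested engineer count and emits the corresponding `git worktree add` commands. The task-type keys, the `engineer-N` branch names, and the choice of 4 for general SWE work (the top of the 2-4 range) are assumptions for illustration. The function only builds the commands and does not run them.

```python
from typing import List

# Per-task-type caps taken from the table above; keys are hypothetical labels.
MAX_ENGINEERS = {"paperbench": 2, "commit0": 4, "general": 4}


def worktree_commands(task_type: str, requested: int, base: str = "main") -> List[List[str]]:
    """Build one `git worktree add` command per engineer, capped by task type."""
    if requested < 1:
        raise ValueError("at least one engineer is required")
    count = min(requested, MAX_ENGINEERS[task_type])
    return [
        # -b creates a new branch for the engineer, starting from `base`.
        ["git", "worktree", "add", "-b", f"engineer-{i}", f"../workspace-engineer-{i}", base]
        for i in range(1, count + 1)
    ]


for argv in worktree_commands("paperbench", requested=5):
    print(" ".join(argv))
# git worktree add -b engineer-1 ../workspace-engineer-1 main
# git worktree add -b engineer-2 ../workspace-engineer-2 main
```

Requesting five engineers for a PaperBench-style task yields only two worktrees, which keeps the limit from being exceeded by accident.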
Task prioritization heuristics: Manager should prioritize tasks that:
Round definition:
One round = complete cycle of delegation → implementation → dependency update
Recommended iteration limits (from paper experiments):
| Role | Max Iterations |
|---|---|
| Manager | 50 |
| Each Engineer | 80 |
| Total Rounds | ~22 (varies by task) |
Delegation algorithm:
At round t:
1. Ready_Set = {v ∈ V | Ready_t(v)} // all dependencies satisfied
2. Select up to N tasks from Ready_Set (N = max parallel engineers above)
3. Apply prioritization heuristics
4. Delegate to available engineers
5. Wait for completion signals
6. Update dependency state after each successful integration
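One way to sketch steps 1 to 4 of a round in Python is shown below. The priority labels match the task assignment format used by this skill, but the tie-breaking rule (alphabetical by task id), the `in_progress` set, and the function name are assumptions, since the concrete prioritization heuristics are not spelled out here.

```python
from typing import Dict, List, Set

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def select_for_round(
    deps: Dict[str, Set[str]],
    priority: Dict[str, str],
    completed: Set[str],
    in_progress: Set[str],
    max_engineers: int,
) -> List[str]:
    """Pick the tasks to delegate this round, highest priority first."""
    ready = [
        task
        for task, prereqs in deps.items()
        if task not in completed and task not in in_progress and prereqs <= completed
    ]
    # Assumed heuristic: sort by priority label, then by task id for determinism.
    ready.sort(key=lambda task: (PRIORITY_RANK[priority.get(task, "medium")], task))
    # Engineers that are still busy do not free up a slot.
    free_slots = max(0, max_engineers - len(in_progress))
    return ready[:free_slots]


deps = {"a": set(), "b": set(), "c": set(), "d": {"a"}}
priority = {"a": "low", "b": "high", "c": "medium", "d": "high"}
print(select_for_round(deps, priority, set(), set(), max_engineers=2))  # ['b', 'c']
print(select_for_round(deps, priority, {"b"}, {"c"}, max_engineers=2))  # ['a']
```

In the second call one engineer is still working on `c`, so only one slot is free, and `d` stays blocked until `a` is integrated.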
Task assignment JSON format (structured communication — NO free-form dialog):
{
  "task_id": "string",
  "task_description": "string",
  "target_files": ["path/to/file.py"],
  "target_functions": ["function_name"],
  "dependencies": ["task_id_1", "task_id_2"],
  "expected_outcome": "description of success criteria",
  "verification_command": "pytest tests/test_file.py -v",
  "restricted_files": ["src/__init__.py", "src/config.py"],
  "priority": "high|medium|low"
}
Key: All communication uses structured JSON, not free-form dialog. This prevents inter-agent misalignment (a primary failure mode in multi-agent systems).
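Because every message is structured, a manager can reject malformed assignments before they reach an engineer. The following is a minimal validation sketch using only the standard library. The field names and types mirror the format above, while the function name and the rule that `target_files` must not overlap `restricted_files` are assumptions added for illustration.

```python
import json
from typing import List

# Field names and types mirror the task assignment format shown above.
REQUIRED_FIELDS = {
    "task_id": str,
    "task_description": str,
    "target_files": list,
    "target_functions": list,
    "dependencies": list,
    "expected_outcome": str,
    "verification_command": str,
    "restricted_files": list,
    "priority": str,
}
ALLOWED_PRIORITIES = {"high", "medium", "low"}


def validate_assignment(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the message is usable."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(message, dict):
        return ["top-level value must be a JSON object"]

    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    if errors:
        return errors

    if message["priority"] not in ALLOWED_PRIORITIES:
        errors.append("priority must be high, medium, or low")
    # Assumption: a task should never target a file it is told not to touch.
    overlap = [f for f in message["target_files"] if f in message["restricted_files"]]
    if overlap:
        errors.append(f"target_files overlap restricted_files: {overlap}")
    return errors


example = json.dumps({
    "task_id": "task-4",
    "task_description": "Implement the parser",
    "target_files": ["src/parser.py"],
    "target_functions": ["parse"],
    "dependencies": ["task-1"],
    "expected_outcome": "parser tests pass",
    "verification_command": "pytest tests/test_parser.py -v",
    "restricted_files": ["src/__init__.py", "src/config.py"],
    "priority": "high",
})
print(validate_assignment(example))                   # []
print(len(validate_assignment('{"task_id": "t1"}')))  # 8
```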
Event loop pattern:
Engineer self-verification (MANDATORY before submission):
Merge workflow:
# Manager attempts merge
git checkout main
git merge <engineer-branch>
# If conflict:
# 1. Engineer who produced conflicting commit is RESPONSIBLE for resolution
# 2. Engineer pulls latest main: git pull origin main
# 3. Resolves conflicts locally
# 4. Re-runs tests to ensure resolution didn't break anything
# 5. Resubmits commit
# 6. Manager retries merge
The main branch is the single source of truth throughout execution.
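The merge workflow can be sketched as a test-gated integration step in Python. This is one possible variant rather than the skill's prescribed procedure: it stages the merge with `git merge --no-ff --no-commit`, runs the verification command, and only then commits, so a failing merge never lands on main. The function names and the injectable `run` callable (which makes the logic testable without a repository) are assumptions. The sketch also assumes the engineer branch has commits that main lacks.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command in the current directory and return its exit code."""
    return subprocess.run(list(argv)).returncode


def integrate(branch: str, test_cmd: Sequence[str], run: Runner = run_command) -> str:
    """Merge `branch` into main only if it merges cleanly and tests pass."""
    if run(["git", "checkout", "main"]) != 0:
        return "checkout-failed"
    # Stage the merge without committing so it can still be aborted.
    if run(["git", "merge", "--no-ff", "--no-commit", branch]) != 0:
        run(["git", "merge", "--abort"])
        return "conflict"  # hand back to the engineer who produced the commit
    if run(test_cmd) != 0:
        run(["git", "merge", "--abort"])
        return "tests-failed"
    if run(["git", "commit", "--no-edit"]) != 0:
        return "commit-failed"
    return "merged"


# Dry run with a stand-in runner that reports success for every command.
print(integrate("engineer-1", ["pytest", "tests/", "-v"], run=lambda argv: 0))  # merged
```

A `conflict` or `tests-failed` result leaves main untouched, and the responsible engineer can then sync, resolve, and resubmit as described in the workflow above.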
To prevent context explosion, the manager uses the LLMSummarizingCondenser pattern:
Periodically:
1. Summarize prior interaction rounds
2. Preserve structured artifacts:
   - Dependency graph (current state)
   - Completed tasks (with commit hashes)
   - Unresolved errors (with traceback summaries)
3. Discard detailed conversation history
4. Maintain execution traceability without bloat
Compressed execution history format:
{
  "round": 5,
  "completed": ["task-1", "task-2", "task-3"],
  "ready": ["task-4", "task-5"],
  "blocked": ["task-6: waiting for task-5"],
  "active_engineers": 2,
  "main_branch_commits": ["abc123", "def456"],
  "unresolved_errors": []
}
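A record in this shape can be derived mechanically from the dependency graph, so the manager does not need to keep the full conversation to rebuild it. The sketch below is illustrative only: the function name, its parameters, and the dependency edges in the example are assumptions, chosen to produce a record shaped like the one above.

```python
import json
from typing import Dict, List, Set


def compress_state(
    round_no: int,
    deps: Dict[str, Set[str]],
    completed: Set[str],
    active_engineers: int,
    commits: List[str],
    errors: List[str],
) -> dict:
    """Summarize execution state as a compact, structured record."""
    ready, blocked = [], []
    for task in sorted(deps):
        if task in completed:
            continue
        missing = sorted(deps[task] - completed)
        if missing:
            blocked.append(f"{task}: waiting for {', '.join(missing)}")
        else:
            ready.append(task)
    return {
        "round": round_no,
        "completed": sorted(completed),
        "ready": ready,
        "blocked": blocked,
        "active_engineers": active_engineers,
        "main_branch_commits": list(commits),
        "unresolved_errors": list(errors),
    }


# Hypothetical dependency edges for six tasks.
deps = {
    "task-1": set(), "task-2": set(), "task-3": set(),
    "task-4": {"task-1"}, "task-5": {"task-2"}, "task-6": {"task-5"},
}
state = compress_state(5, deps, {"task-1", "task-2", "task-3"}, 2, ["abc123", "def456"], [])
print(json.dumps(state, indent=2))
```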
State synchronization when main advances:
# Engineer syncs to latest integrated state
cd ../workspace-engineer-1
git fetch origin
git reset --hard origin/main # Sync worktree to latest main
Worktree cleanup (after completion or limit reached):
# Remove worktree when engineer finishes or hits iteration limit
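git worktree remove ../workspace-engineer-1 # Add --force if the worktree has uncommitted changes
rm -rf ../workspace-engineer-1 # Only needed if the directory is still present
git worktree prune # Drop any stale worktree metadata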
Worktrees are deleted after all assigned tasks are completed or when the engineer reaches the predefined iteration limit.
Iteration limits (from paper):
- Manager: max_iterations=50
- Each engineer: max_iterations=80
After the asynchronous loop completes, the manager performs a final review before submitting the final product.
Final review checklist:
- pytest tests/ -v
# Manager final verification
git checkout main
pytest tests/ -v # Full test suite
python -m mypackage --version # Smoke test
# Review any integration gaps
For OpenClaw, the sessions_spawn tool enables parallel agent execution:
Spawn engineer agents:
// For each task in Ready_Set, spawn an engineer
{
  "runtime": "subagent",
  "task": "<task specification with context>",
  "agentId": "<engineer-agent-id>",
  "mode": "run",
  "runTimeoutSeconds": 300
}
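As a hedged sketch, the spawn request above can be generated for each ready task. The payload keys are copied from the example. Serializing the structured assignment into the `task` string is an assumption about how context is passed, not documented `sessions_spawn` behavior, and the helper name is hypothetical.

```python
import json
from typing import Dict, List


def build_spawn_requests(assignments: List[dict], agent_id: str, timeout_s: int = 300) -> List[Dict]:
    """Build one spawn payload per task assignment."""
    return [
        {
            "runtime": "subagent",
            # Assumption: the structured assignment is passed as a JSON string.
            "task": json.dumps(assignment, sort_keys=True),
            "agentId": agent_id,
            "mode": "run",
            "runTimeoutSeconds": timeout_s,
        }
        for assignment in assignments
    ]


requests = build_spawn_requests(
    [{"task_id": "task-4"}, {"task_id": "task-5"}],
    agent_id="<engineer-agent-id>",
)
print(len(requests))        # 2
print(requests[0]["task"])  # {"task_id": "task-4"}
```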
Check progress:
// Poll for completion
{
  "action": "list"
}
When main advances, update worktrees:
# Engineer pulls latest main before continuing
cd ../workspace-engineer-1
git fetch origin
git reset --hard origin/main # Or rebase
This ensures engineers work from the latest integrated state.
From paper analysis (Section 4.4):
| Strategy | Pass Rate | Runtime | When to Use |
|---|---|---|---|
| Round-Manager Review | 60.2% | 3689s | Maximum correctness required |
| Engineer Self-Verification | 55.1% | 2244s | Default - balanced |
| Efficiency-Prioritized | 54.0% | 1909s | Time-critical, acceptable risk |
Default: Engineer self-verification without repeated manager review.
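To make the trade-off concrete, the short calculation below expresses each alternative relative to the default strategy, using only the pass rates and runtimes from the table above. The helper name is an assumption.

```python
# (pass rate in %, runtime in seconds) copied from the table above.
STRATEGIES = {
    "Round-Manager Review": (60.2, 3689),
    "Engineer Self-Verification": (55.1, 2244),
    "Efficiency-Prioritized": (54.0, 1909),
}


def compare_to(baseline: str) -> dict:
    """Return (pass-rate delta in points, runtime delta in %) for each alternative."""
    base_rate, base_time = STRATEGIES[baseline]
    result = {}
    for name, (rate, seconds) in STRATEGIES.items():
        if name == baseline:
            continue
        result[name] = (
            round(rate - base_rate, 1),
            round(100 * (seconds - base_time) / base_time, 1),
        )
    return result


for name, (d_rate, d_time) in compare_to("Engineer Self-Verification").items():
    print(f"{name}: {d_rate:+.1f} points, {d_time:+.1f}% runtime")
# Round-Manager Review: +5.1 points, +64.4% runtime
# Efficiency-Prioritized: -1.1 points, -14.9% runtime
```

Read this way, round-manager review buys about five points of pass rate for roughly two thirds more runtime, while the efficiency-prioritized setting saves about 15% of runtime for a loss of about one point.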
| Pitfall | Solution |
|---|---|
| Using CAID as fallback after single-agent fails | Use from outset; sequential costs ~2x with minimal gain |
| Soft isolation (instruction-only) | Mandatory git worktree physical isolation |
| Too many engineers (>4-8) | Strict limits: 2 PaperBench, 4 Commit0, 8 absolute max |
| Skipping manager pre-setup | Always prepare runtime/stubs/entry points first |
| Skipping manager final review | Always do final verification before submission |
| Merge conflicts from concurrent edits | Group dependent files; engineer resolves own conflicts |
| Not cleaning up worktrees | Delete worktrees after completion/limit reached |
| Agents develop inconsistent views | Structured JSON only; no free-form dialog |
| Silent interference between agents | Explicit merge with test verification |
| Tasks not clearly defined | Build dependency graph before ANY delegation |
| Integration failures discovered late | Self-verification mandatory before commit |
| Context explosion | Use LLMSummarizingCondenser pattern |
| Missing restricted files | Mark __init__.py, configs as restricted |
CAID trade-offs (vs single-agent):
When worth it: Long-horizon shared-artifact tasks where correctness matters more than speed.
See references/examples.md for concrete implementation examples including: