Install
openclaw skills install agent-guru

Use when building, designing, or reviewing a multi-agent system for production: routing agents, orchestrating subagents, guarding tools with permissions, managing memory and context windows, adding observability and cost tracking, handling errors, or setting up session persistence.

The LLM is the reasoning engine. Your code is the execution engine. The loop is the contract between them.
Every production concern — safety, cost, retries, logging, permissions — lives in the harness, not the prompt. A prompt that says "be careful with deletions" is a suggestion. A GuardedToolNode that intercepts delete_* calls is a guarantee.
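To make the "guarantee, not suggestion" point concrete, here is a minimal sketch of a harness-side guard. It is not the skill's actual GuardedToolNode; the `GuardedTools` class, the `PermissionRule` fields, and the first-match-wins policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Any, Callable


@dataclass(frozen=True)
class PermissionRule:
    pattern: str  # glob matched against the tool name, e.g. "delete_*"
    action: str   # "allow" or "deny" (hypothetical field names)


class PermissionDenied(Exception):
    """Raised when a rule blocks a tool call before it executes."""


class GuardedTools:
    """Hypothetical guard: rules are checked in order, first match wins."""

    def __init__(self, tools: dict[str, Callable[..., Any]], rules: list[PermissionRule]):
        self.tools = tools
        self.rules = rules

    def call(self, name: str, **kwargs: Any) -> Any:
        for rule in self.rules:
            if fnmatch(name, rule.pattern):
                if rule.action == "deny":
                    raise PermissionDenied(f"{name} blocked by rule {rule.pattern!r}")
                break  # explicit allow: stop checking further rules
        # No deny rule matched, so the tool actually runs.
        return self.tools[name](**kwargs)


def delete_file(path: str) -> str:
    return f"deleted {path}"


def read_file(path: str) -> str:
    return f"contents of {path}"


guard = GuardedTools(
    tools={"delete_file": delete_file, "read_file": read_file},
    rules=[PermissionRule("delete_*", "deny")],
)
```

Whatever the prompt says, `guard.call("delete_file", path="a.txt")` raises `PermissionDenied` before the tool body runs, while `guard.call("read_file", path="a.txt")` passes through.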
INGRESS (HTTP / CLI / Webhook / Schedule)
│
ROUTER LAYER — classify intent, dispatch cheaply
│
ORCHESTRATOR — decompose tasks, delegate to specialists
├── Agent A (scoped tools)
└── Agent B (scoped tools)
│
TOOL LAYER — validate schema → check permission → execute → truncate
│
CROSS-CUTTING CONCERNS
├── MEMORY (short-term / working / long-term)
├── OBSERVABILITY (traces, cost, session replay)
└── RESILIENCE (retry, circuit breaker, loop guard)
│
PERSISTENCE — checkpoints (Redis / Postgres) + audit log
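The tool layer's four stages (validate schema → check permission → execute → truncate) can be sketched in plain Python. This is an illustration under assumptions, not the skill's implementation: `run_tool`, the type-map schema format, the `allowed` set, and the 200-character budget are all hypothetical, and errors are returned as strings so the model can observe them.

```python
from typing import Any, Callable

MAX_OUTPUT_CHARS = 200  # assumed truncation budget, chosen for the example


def run_tool(
    name: str,
    args: dict[str, Any],
    registry: dict[str, Callable[..., Any]],
    schemas: dict[str, dict[str, type]],
    allowed: set[str],
) -> str:
    if name not in registry or name not in schemas:
        return f"error: unknown tool {name}"

    # 1. Validate schema: every declared argument must be present with the right type.
    schema = schemas[name]
    for key, expected in schema.items():
        if key not in args or not isinstance(args[key], expected):
            return f"error: {name} expects {key}: {expected.__name__}"
    extra = set(args) - set(schema)
    if extra:
        return f"error: {name} got unexpected arguments {sorted(extra)}"

    # 2. Check permission: the agent only gets the tools it was scoped to.
    if name not in allowed:
        return f"error: {name} is not permitted for this agent"

    # 3. Execute.
    output = str(registry[name](**args))

    # 4. Truncate so one large result cannot flood the context window.
    if len(output) > MAX_OUTPUT_CHARS:
        dropped = len(output) - MAX_OUTPUT_CHARS
        output = output[:MAX_OUTPUT_CHARS] + f"... [truncated {dropped} chars]"
    return output


# Example registry with one toy tool.
registry = {"repeat": lambda text, times: text * times}
schemas = {"repeat": {"text": str, "times": int}}
```

Validation and permission checks run before execution, so a malformed or unauthorized call never reaches the tool body.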
Task scoped to ONE domain?
  YES → Single ReAct agent with appropriate tools
  NO  → Independent subtasks?
          YES → Parallel multi-agent (supervisor + specialists)
          NO  → Sequential / hierarchical orchestrator
                  │
                Any irreversible step requiring human review?
                  YES → Plan-then-execute with HITL interrupt
                  NO  → Orchestrator with auto-delegation
Rule: Start with a single agent. Add multi-agent complexity only when you hit a concrete limit — context window size, tool set sprawl, latency, or accuracy.
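One reading of the decision tree above, written as a function. The name `choose_architecture` and its boolean parameters are hypothetical, and the nesting (the human-review question applies only to the sequential branch) is an interpretation of the diagram.

```python
def choose_architecture(
    single_domain: bool,
    independent_subtasks: bool,
    irreversible_step: bool,
) -> str:
    """Return the pattern the decision tree suggests for a given task."""
    if single_domain:
        return "single ReAct agent"
    if independent_subtasks:
        return "parallel multi-agent (supervisor + specialists)"
    # Sequential / hierarchical orchestrator: pick the variant by reversibility.
    if irreversible_step:
        return "plan-then-execute with HITL interrupt"
    return "orchestrator with auto-delegation"
```

The single-agent check comes first, so the default stays a single agent and multi-agent options are reached only when that check fails.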
| Need | Use |
|---|---|
| Complex branching, HITL, durable persistence, fine-grained control | LangGraph |
| Simple loop, minimal boilerplate, rapid prototype, leaf agents | Strands |
| Orchestration graph + simple leaf agents | LangGraph + Strands hybrid |
Load these reference files on demand, using the triggers listed below. Do not load all of them upfront.
| File | Load when... |
|---|---|
| references/router-layer.md | Designing intent routing, building a classifier node, handling misrouting |
| references/orchestrator-layer.md | Decomposing tasks, spawning subagents, implementing plan-then-execute |
| references/tool-safety-layer.md | Designing tools, adding permission rules, implementing HITL or killswitch |
| references/memory-layer.md | Context window approaching limit, adding long-term memory, injecting project context |
| references/observability-layer.md | Adding tracing, tracking token cost, debugging agent behavior, setting up alerts |
| references/resilience-layer.md | Adding retry logic, circuit breakers, preventing infinite loops |
| references/persistence-layer.md | Choosing a checkpointer, implementing session resume, session branching |
| references/production-checklist.md | Before deploying to production — full ~40-point readiness checklist |
| Pattern | Key implementation | Reference |
|---|---|---|
| Intent routing | conditional_edges + confidence threshold | router-layer.md |
| Scoped subagents | create_react_agent with tool subset | orchestrator-layer.md |
| Plan-then-execute | Two nodes, read-only tools in plan phase | orchestrator-layer.md |
| Tool schema | args_schema=PydanticModel on @tool | tool-safety-layer.md |
| Permission guard | GuardedToolNode with PermissionRule list | tool-safety-layer.md |
| HITL interrupt | interrupt() + Command(resume=...) | tool-safety-layer.md |
| Runtime concurrency | is_concurrency_safe(input) per tool call | tool-safety-layer.md |
| Abort hierarchy | Query-level abort + sibling-level child abort | tool-safety-layer.md |
| Tiered compaction | budget → snip → microcompact → autocompact | memory-layer.md |
| Auto-compaction | Summarization node at 80% context | memory-layer.md |
| Context injection | AGENT.md loaded into system prompt | memory-layer.md |
| Full trace | BaseCallbackHandler + structured events | observability-layer.md |
| Cost tracking | Per-turn token accounting in callback | observability-layer.md |
| Config snapshot | Freeze all feature flags at query entry | observability-layer.md |
| Diminishing returns | Track token deltas; stop after 2 consecutive turns with delta < 500 tokens | resilience-layer.md |
| Output limit escalation | Escalate to 64k tokens before compaction | resilience-layer.md |
| Streaming cleanup | Tombstone partial messages on fallback | resilience-layer.md |
| Error-as-observation | try/except → ToolMessage | resilience-layer.md |
| Circuit breaker | State machine wrapping tool fn | resilience-layer.md |
| Session resume | Checkpointer + stable thread_id | persistence-layer.md |
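As an illustration of the circuit breaker row ("state machine wrapping tool fn"), here is a minimal sketch with closed, open, and half-open states. The class name, the thresholds, and the injectable `clock` are assumptions for the example and may differ from what resilience-layer.md describes.

```python
import time
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        fn: Callable[..., Any],
        failure_threshold: int = 3,   # assumed default
        reset_after: float = 30.0,    # assumed cooldown in seconds
        clock: Callable[[], float] = time.monotonic,
    ):
        self.fn = fn
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            self.state = "half_open"  # cooldown elapsed: allow one trial call
        try:
            result = self.fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.state = "closed"
        self.failures = 0
        return result
```

Wrapping a tool as `safe_tool = CircuitBreaker(tool_fn)` leaves the call signature unchanged. While the circuit is open, calls fail fast with `CircuitOpenError` instead of hitting the failing dependency again.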
InMemorySaver is for development only. Use Redis or Postgres checkpointers in production.

interrupt() pauses the graph. Resume it by calling graph.invoke(Command(resume=...), config=config); forgetting this leaves the agent stuck.

Set max_iterations. Without a loop guard, a miscalibrated agent runs indefinitely and incurs unbounded cost.
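The loop-guard point can be sketched framework-free. This is a minimal illustration, not a LangGraph or Strands API: `run_loop`, `LoopGuardTripped`, and the `step` callback are hypothetical names, with `step` standing in for one reason-act turn of the agent.

```python
from typing import Callable, Optional


class LoopGuardTripped(Exception):
    """Raised when the agent exhausts its iteration budget."""


def run_loop(step: Callable[[int], Optional[str]], max_iterations: int = 10) -> str:
    """Call step(i) until it returns a final answer, or stop at max_iterations."""
    for i in range(max_iterations):
        answer = step(i)
        if answer is not None:
            return answer
    # The budget is enforced by the harness, so cost stays bounded.
    raise LoopGuardTripped(f"no final answer after {max_iterations} iterations")
```

A `step` that never produces an answer is cut off after exactly `max_iterations` turns, and the caller can decide whether to escalate, summarize, or fail.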