Agent Guru

MCP Tools

Use when building, designing, or reviewing a multi-agent system for production — routing agents, orchestrating subagents, guarding tools with permissions, managing memory and context windows, adding observability and cost tracking, handling errors, or setting up session persistence.

Install

openclaw skills install agent-guru

Production Agent Design

Core Principle

The LLM is the reasoning engine. Your code is the execution engine. The loop is the contract between them.

Every production concern — safety, cost, retries, logging, permissions — lives in the harness, not the prompt. A prompt that says "be careful with deletions" is a suggestion. A GuardedToolNode that intercepts delete_* calls is a guarantee.
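The difference between a suggestion and a guarantee can be sketched in a few lines. This is a minimal illustration in plain Python, not the skill's actual GuardedToolNode: the `PermissionRule` fields, the `GuardedTools` class, and the first-match, deny-by-default policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Any, Callable


@dataclass(frozen=True)
class PermissionRule:
    """Hypothetical rule shape: tool names matching `pattern` get `action`."""
    pattern: str  # glob over tool names, e.g. "delete_*"
    action: str   # "allow" or "deny"


class PermissionDenied(Exception):
    """Raised before the tool body runs, so the denied call has no side effects."""


class GuardedTools:
    """Illustrative stand-in for a guarded tool node (assumed design)."""

    def __init__(self, tools: dict[str, Callable[..., Any]], rules: list[PermissionRule]):
        self.tools = tools
        self.rules = rules

    def call(self, name: str, **kwargs: Any) -> Any:
        # First matching rule wins; a tool that matches no rule is denied.
        for rule in self.rules:
            if fnmatch(name, rule.pattern):
                if rule.action == "allow":
                    return self.tools[name](**kwargs)
                raise PermissionDenied(f"{name} blocked by rule {rule.pattern!r}")
        raise PermissionDenied(f"{name} matched no rule")


guard = GuardedTools(
    tools={
        "read_file": lambda path: f"contents of {path}",
        "delete_file": lambda path: f"deleted {path}",
    },
    rules=[PermissionRule("delete_*", "deny"), PermissionRule("read_*", "allow")],
)
```

Whatever the prompt says, `guard.call("delete_file", path="prod.db")` raises `PermissionDenied` here, while `guard.call("read_file", path="notes.txt")` goes through.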

When to Use This Skill

  • Designing a new multi-agent system from scratch
  • Adding safety, cost controls, or observability to an existing agent
  • Debugging runaway cost, infinite loops, or context window exhaustion
  • Choosing between single-agent vs multi-agent topology
  • Implementing human-in-the-loop (HITL) for irreversible actions
  • Setting up session persistence and resumption

Architecture at a Glance

INGRESS (HTTP / CLI / Webhook / Schedule)
    │
ROUTER LAYER          — classify intent, dispatch cheaply
    │
ORCHESTRATOR          — decompose tasks, delegate to specialists
    ├── Agent A (scoped tools)
    └── Agent B (scoped tools)
         │
TOOL LAYER            — validate schema → check permission → execute → truncate
         │
CROSS-CUTTING CONCERNS
    ├── MEMORY         (short-term / working / long-term)
    ├── OBSERVABILITY  (traces, cost, session replay)
    └── RESILIENCE     (retry, circuit breaker, loop guard)
         │
PERSISTENCE           — checkpoints (Redis / Postgres) + audit log
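The TOOL LAYER stages in the diagram (validate schema → check permission → execute → truncate) might be composed roughly as follows. This is a sketch under stated assumptions: `run_tool`, the dict-of-types schema, the boolean `allowed` flag, and the 2000-character budget are hypothetical simplifications, and a real system would more likely validate with a Pydantic model.

```python
from typing import Any, Callable

MAX_RESULT_CHARS = 2000  # assumed budget; tune per model and per tool


def run_tool(
    fn: Callable[..., str],
    args: dict[str, Any],
    schema: dict[str, type],
    allowed: bool,
    max_chars: int = MAX_RESULT_CHARS,
) -> str:
    # 1. Validate: exactly the declared fields, each with the declared type.
    if set(args) != set(schema):
        return f"error: expected arguments {sorted(schema)}, got {sorted(args)}"
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            return f"error: {key} must be {expected.__name__}"

    # 2. Check permission before any side effect can happen.
    if not allowed:
        return "error: permission denied"

    # 3. Execute; failures are returned as text the model can read and react to.
    try:
        result = fn(**args)
    except Exception as exc:
        return f"error: {type(exc).__name__}: {exc}"

    # 4. Truncate so a large output cannot exhaust the context window.
    if len(result) > max_chars:
        omitted = len(result) - max_chars
        result = result[:max_chars] + f"\n[truncated {omitted} chars]"
    return result
```

Returning error strings instead of raising keeps every failure inside the loop as an observation, which is the same idea as the error-as-observation pattern listed under Quick Reference.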

Single Agent vs Multi-Agent

Task scoped to ONE domain?
  YES → Single ReAct agent with appropriate tools
  NO  → Independent subtasks?
          YES → Parallel multi-agent (supervisor + specialists)
          NO  → Sequential / hierarchical orchestrator
                  │
              Any irreversible step requiring human review?
                YES → Plan-then-execute with HITL interrupt
                NO  → Orchestrator with auto-delegation

Rule: Start with a single agent. Add multi-agent complexity only when you hit a concrete limit — context window size, tool set sprawl, latency, or accuracy.
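The decision tree above can be written down as a small function, which is handy for design reviews or for documenting why a topology was chosen. The function name and boolean parameters are illustrative only; the returned labels mirror the tree.

```python
def choose_topology(
    single_domain: bool,
    independent_subtasks: bool,
    irreversible_step: bool,
) -> str:
    """Walk the single-agent vs multi-agent decision tree from top to bottom."""
    if single_domain:
        return "single ReAct agent"
    if independent_subtasks:
        return "parallel multi-agent (supervisor + specialists)"
    # Sequential / hierarchical orchestrator branch.
    if irreversible_step:
        return "plan-then-execute with HITL interrupt"
    return "orchestrator with auto-delegation"
```

For example, a cross-domain task with dependent steps and a production deploy at the end maps to `choose_topology(False, False, True)`.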

Framework Selection

| Need | Use |
|---|---|
| Complex branching, HITL, durable persistence, fine-grained control | LangGraph |
| Simple loop, minimal boilerplate, rapid prototype, leaf agents | Strands |
| Orchestration graph + simple leaf agents | LangGraph + Strands hybrid |

Reference Files

Load these on demand using the triggers listed below. Do not load all of them upfront.

| File | Load when... |
|---|---|
| references/router-layer.md | Designing intent routing, building a classifier node, handling misrouting |
| references/orchestrator-layer.md | Decomposing tasks, spawning subagents, implementing plan-then-execute |
| references/tool-safety-layer.md | Designing tools, adding permission rules, implementing HITL or killswitch |
| references/memory-layer.md | Context window approaching limit, adding long-term memory, injecting project context |
| references/observability-layer.md | Adding tracing, tracking token cost, debugging agent behavior, setting up alerts |
| references/resilience-layer.md | Adding retry logic, circuit breakers, preventing infinite loops |
| references/persistence-layer.md | Choosing a checkpointer, implementing session resume, session branching |
| references/production-checklist.md | Before deploying to production (full ~40-point readiness checklist) |

Quick Reference

| Pattern | Key implementation | Reference |
|---|---|---|
| Intent routing | conditional_edges + confidence threshold | router-layer.md |
| Scoped subagents | create_react_agent with tool subset | orchestrator-layer.md |
| Plan-then-execute | Two nodes, read-only tools in plan phase | orchestrator-layer.md |
| Tool schema | args_schema=PydanticModel on @tool | tool-safety-layer.md |
| Permission guard | GuardedToolNode with PermissionRule list | tool-safety-layer.md |
| HITL interrupt | interrupt() + Command(resume=...) | tool-safety-layer.md |
| Runtime concurrency | is_concurrency_safe(input) per tool call | tool-safety-layer.md |
| Abort hierarchy | Query-level abort + sibling-level child abort | tool-safety-layer.md |
| Tiered compaction | budget → snip → microcompact → autocompact | memory-layer.md |
| Auto-compaction | Summarization node at 80% of the context window | memory-layer.md |
| Context injection | AGENT.md loaded into system prompt | memory-layer.md |
| Full trace | BaseCallbackHandler + structured events | observability-layer.md |
| Cost tracking | Per-turn token accounting in callback | observability-layer.md |
| Config snapshot | Freeze all feature flags at query entry | observability-layer.md |
| Diminishing returns | Track token deltas; stop when the last 2 deltas are both < 500 tokens | resilience-layer.md |
| Output limit escalation | Escalate to 64k tokens before compaction | resilience-layer.md |
| Streaming cleanup | Tombstone partial messages on fallback | resilience-layer.md |
| Error-as-observation | try/except → ToolMessage | resilience-layer.md |
| Circuit breaker | State machine wrapping tool fn | resilience-layer.md |
| Session resume | Checkpointer + stable thread_id | persistence-layer.md |
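As one concrete instance of the patterns in this table, the circuit breaker row ("state machine wrapping tool fn") could look something like the sketch below. The class name, thresholds, and injectable clock are assumptions for illustration and are not taken from resilience-layer.md.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """closed -> open after N consecutive failures -> half_open after a cooldown."""

    def __init__(
        self,
        fn: Callable[..., Any],
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.fn = fn
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock  # injectable so tests need not sleep
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half_open"
        return "open"

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = self.fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call while half-open, or too many failures, (re)opens.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping a flaky tool as `search = CircuitBreaker(search_fn)` lets the harness fail fast during an outage instead of letting the agent spend iterations retrying a dead dependency.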

Gotchas

  • Safety rules must be code, not prompts. A prompt saying "don't delete production data" is not a safety control.
  • Never dump the full parent message history into a subagent. Pass only the specific task and relevant data — context pollution degrades performance and wastes tokens.
  • InMemorySaver is for development only. Use Redis or Postgres checkpointers in production.
  • interrupt() pauses the graph. Resume it by calling graph.invoke(Command(resume=...), config=config) — forgetting this leaves the agent stuck.
  • Tool result truncation is mandatory. Large tool outputs (file reads, search results) will exhaust the context window if not truncated before returning.
  • Always set max_iterations. Without a loop guard, a miscalibrated agent runs indefinitely and incurs unbounded cost.
  • Apply compaction in tiers. Budget tool results → snip → microcompact → autocompact. Jumping straight to full summarization wastes tokens when a cheaper step would suffice.
  • Track diminishing returns, not just token budget. An agent can burn through its iteration budget producing nearly empty continuations. Stop when the last 2 deltas are both below ~500 tokens.
  • Snapshot config at query entry. Never re-read feature flags or env vars mid-turn — a remote config change during a 30-second response causes inconsistent behavior within a single turn.
  • Concurrency safety must be checked at runtime. Schema metadata cannot determine if a bash command is safe — inspect the actual input string at call time. Fail conservatively (serial) if parsing fails.
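Two of the gotchas above, the max_iterations loop guard and the diminishing-returns check, combine naturally into one stopping rule. The sketch below assumes a hypothetical `step` callable that runs one agent turn and returns the number of tokens it produced; the defaults of 500 tokens and a window of 2 follow the figures quoted above. A real loop would also stop when the task completes.

```python
from typing import Callable


def run_loop(
    step: Callable[[int], int],
    max_iterations: int = 10,
    min_delta: int = 500,
    window: int = 2,
) -> tuple[int, str]:
    """Run `step` until a guard trips; return (turns run, stop reason)."""
    recent: list[int] = []
    for i in range(max_iterations):
        delta = step(i)  # tokens produced by this turn
        recent = (recent + [delta])[-window:]  # keep only the last `window` deltas
        # Stop once the last `window` turns were all nearly empty.
        if len(recent) == window and all(d < min_delta for d in recent):
            return i + 1, "diminishing_returns"
    # Hard cap: the loop can never run unbounded.
    return max_iterations, "max_iterations"
```

With per-turn deltas of 1200, 800, 300, and 200 tokens, the loop stops after the fourth turn because the last two deltas are both under 500.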