{"skill":{"slug":"precisionledger-agent-orchestration","displayName":"Agent Orchestration Patterns","summary":"Multi-agent orchestration patterns for production deployments. Covers sub-agent QC workflow, model staggering across 5+ models, cross-validation patterns, fa...","description":"---\nname: agent-orchestration\ndescription: 'Multi-agent orchestration patterns for production deployments. Covers sub-agent QC workflow, model staggering across 5+ models, cross-validation patterns, fallback chains, task routing by model strength, ACPX configuration, and cost optimization. Use when coordinating multiple agents or models for complex workflows. Do NOT use for single-agent prompting, prompt engineering, or fine-tuning — those are separate skills.'\nlicense: MIT\nmetadata:\n  openclaw:\n    emoji: '🎭'\n---\n\n# Agent Orchestration\n\nProduction-tested patterns for coordinating multiple AI agents and models. This skill covers the full spectrum from simple fallback chains to complex multi-model workflows with cross-validation and quality control loops.\n\n## When to Use\n\n- Coordinating 2+ agents or models on a single workflow\n- Building QC loops where one model checks another's work\n- Routing tasks to the right model based on task type\n- Setting up fallback chains for reliability\n- Optimizing cost across subscription and API models\n- Configuring ACPX (Agent Computer Protocol eXtended) for Claude Code and Codex\n- Designing spawn patterns for runtime sub-agents\n\n## When NOT to Use\n\n- Single-agent prompting or prompt engineering (use a prompt-engineering skill)\n- Fine-tuning or training models (different domain entirely)\n- Simple API calls to one model (just call the API)\n- RAG or retrieval pipeline design (use a RAG-specific skill)\n- Agent memory architecture (use the agent-memory-architecture skill)\n\n---\n\n## 1. 
Sub-Agent QC Workflow\n\nThe core pattern: **Produce → Review → Cross-Check → Incorporate → Deliver**.\n\n### The Five-Step Loop\n\n```\n┌─────────────┐\n│  1. PRODUCE  │  Sonnet 4.6 generates first draft\n│  (Grinder)   │  Fast, cost-effective, good enough for 80% of tasks\n└──────┬──────┘\n       ▼\n┌─────────────┐\n│  2. REVIEW   │  Same model self-reviews against criteria\n│  (Self-QC)   │  Catches obvious errors, formatting issues\n└──────┬──────┘\n       ▼\n┌─────────────┐\n│  3. CROSS    │  Different model (GPT-4o / Grok) validates\n│  CHECK       │  Catches blind spots, model-specific biases\n└──────┬──────┘\n       ▼\n┌─────────────┐\n│  4. INCORP.  │  Opus 4.6 synthesizes feedback\n│  (Orchestr.) │  Resolves conflicts, applies judgment\n└──────┬──────┘\n       ▼\n┌─────────────┐\n│  5. DELIVER  │  Final output with confidence score\n│  (Output)    │  Includes provenance trail\n└─────────────┘\n```\n\n### Implementation Example\n\n```python\nasync def qc_workflow(task: str, context: dict) -> dict:\n    \"\"\"Five-step QC workflow with cross-model validation.\"\"\"\n\n    # Step 1: Produce (Sonnet — fast, cheap)\n    draft = await call_model(\n        model=\"claude-sonnet-4-6\",\n        prompt=f\"Complete this task:\\n{task}\",\n        context=context,\n        max_tokens=4096\n    )\n\n    # Step 2: Self-review (same model, different prompt)\n    self_review = await call_model(\n        model=\"claude-sonnet-4-6\",\n        prompt=f\"\"\"Review this output for errors, omissions, and quality:\n\nTASK: {task}\nOUTPUT: {draft}\n\nScore 1-10 on: accuracy, completeness, clarity.\nList specific issues to fix.\"\"\",\n        max_tokens=1024\n    )\n\n    # Step 3: Cross-check (different model family)\n    cross_check = await call_model(\n        model=\"gpt-4o\",\n        prompt=f\"\"\"Independent review. 
Do NOT assume the draft is correct.\n\nTASK: {task}\nDRAFT: {draft}\nSELF-REVIEW: {self_review}\n\nIdentify: factual errors, logical gaps, missing context, biases.\"\"\",\n        max_tokens=1024\n    )\n\n    # Step 4: Incorporate (Opus — best judgment)\n    final = await call_model(\n        model=\"claude-opus-4-6\",\n        prompt=f\"\"\"Synthesize and produce final output.\n\nTASK: {task}\nDRAFT: {draft}\nSELF-REVIEW: {self_review}\nCROSS-CHECK: {cross_check}\n\nResolve any conflicts. Produce the best possible final output.\nInclude a confidence score (0-100) and list any unresolved concerns.\"\"\",\n        max_tokens=4096\n    )\n\n    # Step 5: Deliver with metadata\n    return {\n        \"output\": final,\n        \"provenance\": {\n            \"producer\": \"claude-sonnet-4-6\",\n            \"reviewer\": \"claude-sonnet-4-6\",\n            \"cross_checker\": \"gpt-4o\",\n            \"synthesizer\": \"claude-opus-4-6\",\n            \"steps_completed\": 5\n        }\n    }\n```\n\n### When to Skip Steps\n\n| Scenario | Skip | Rationale |\n|----------|------|-----------|\n| Low-stakes internal task | Steps 3-4 | Self-review is sufficient |\n| Time-critical (<30s budget) | Steps 2-4 | Single model, accept risk |\n| High-stakes client deliverable | None | Full loop, every time |\n| Coding task with tests | Step 3 | Tests serve as cross-check |\n| Creative/subjective work | Step 3 | Cross-check adds noise, not signal |\n\n---\n\n## 2. 
Model Staggering\n\nAssign models to tasks based on their demonstrated strengths.\n\n### The Model Roster\n\n```\nModel              Strength Zone              Cost Tier    Speed\n────────────────────────────────────────────────────────────────\nOpus 4.6           Strategy, synthesis,       $$$$$        Slow\n                   complex reasoning,\n                   judgment calls\n\nSonnet 4.6         Production work, coding,   $$$          Fast\n                   analysis, writing,\n                   general-purpose grinder\n\nGPT-4o             Coding, scoring rubrics,   $$$$         Medium\n                   structured output,\n                   alternative perspective\n\nGrok               X/Twitter analysis,        $$           Fast\n                   social media content,\n                   real-time commentary\n\nGemini 2.5 Pro     Deep research, long        $$$          Medium\n                   context analysis,\n                   multimodal processing\n\nHaiku 4.5          Classification, routing,   $            Very Fast\n                   simple extraction,\n                   high-volume tasks\n```\n\n### Task Routing Rules\n\n```yaml\nrouting_rules:\n  # Strategic / High-judgment tasks → Opus\n  strategy:\n    models: [claude-opus-4-6]\n    triggers:\n      - \"requires judgment between competing priorities\"\n      - \"synthesize conflicting information\"\n      - \"make a recommendation with tradeoffs\"\n      - \"review and improve another agent's work\"\n\n  # Production work → Sonnet\n  production:\n    models: [claude-sonnet-4-6]\n    triggers:\n      - \"write code to specification\"\n      - \"generate content from template\"\n      - \"analyze data and report findings\"\n      - \"standard business communication\"\n\n  # Coding with scoring → GPT\n  coding_and_scoring:\n    models: [gpt-4o]\n    triggers:\n      - \"write and debug complex algorithms\"\n      - \"score outputs against rubric\"\n      - \"generate structured 
JSON/YAML\"\n      - \"cross-validate another model's output\"\n\n  # Social / real-time → Grok\n  social:\n    models: [grok-3]\n    triggers:\n      - \"analyze X/Twitter trends\"\n      - \"generate social media content\"\n      - \"real-time event commentary\"\n      - \"meme-aware communication\"\n\n  # Deep research → Gemini\n  research:\n    models: [gemini-2.5-pro]\n    triggers:\n      - \"analyze documents >100K tokens\"\n      - \"cross-reference multiple long sources\"\n      - \"multimodal analysis (images + text)\"\n      - \"broad research synthesis\"\n\n  # High-volume classification → Haiku\n  classification:\n    models: [claude-haiku-4-5]\n    triggers:\n      - \"classify items into categories\"\n      - \"extract structured fields from text\"\n      - \"route incoming requests\"\n      - \"simple yes/no decisions\"\n```\n\n### Staggering in Practice\n\n```\nExample: \"Write a market analysis report\"\n\n1. Gemini 2.5 Pro  → Research phase (long context, web search)\n2. Sonnet 4.6      → Draft the report (fast production)\n3. GPT-4o          → Score against quality rubric (structured eval)\n4. Opus 4.6        → Final synthesis and executive summary (judgment)\n5. Haiku 4.5       → Extract key metrics into structured JSON (cheap, fast)\n```\n\n---\n\n## 3. 
Fallback Chains\n\nWhen a model is unavailable, rate-limited, or returns low-quality output, fall through to the next option.\n\n### Chain Configuration\n\n```yaml\nfallback_chains:\n  # Primary reasoning chain\n  reasoning:\n    - model: claude-opus-4-6\n      timeout: 60s\n      retry: 1\n    - model: gpt-4o\n      timeout: 45s\n      retry: 1\n    - model: claude-sonnet-4-6\n      timeout: 30s\n      retry: 2\n    - model: gemini-2.5-pro\n      timeout: 45s\n      retry: 1\n\n  # Fast production chain\n  production:\n    - model: claude-sonnet-4-6\n      timeout: 30s\n      retry: 2\n    - model: gpt-4o\n      timeout: 30s\n      retry: 1\n    - model: grok-3\n      timeout: 20s\n      retry: 1\n\n  # Classification chain (optimize for cost)\n  classification:\n    - model: claude-haiku-4-5\n      timeout: 10s\n      retry: 3\n    - model: claude-sonnet-4-6\n      timeout: 15s\n      retry: 1\n```\n\n### Fallback Decision Logic\n\n```python\nasync def call_with_fallback(chain: str, prompt: str) -> dict:\n    \"\"\"Try models in order until one succeeds with acceptable quality.\"\"\"\n\n    for entry in CHAINS[chain]:\n        for attempt in range(entry[\"retry\"] + 1):\n            try:\n                result = await call_model(\n                    model=entry[\"model\"],\n                    prompt=prompt,\n                    timeout=entry[\"timeout\"]\n                )\n\n                # Quality gate: reject low-confidence outputs\n                if result.get(\"confidence\", 100) < 30:\n                    log(f\"{entry['model']} returned low confidence, trying next\")\n                    break  # Move to next model, don't retry\n\n                return {\n                    \"output\": result,\n                    \"model_used\": entry[\"model\"],\n                    \"attempt\": attempt + 1,\n                    \"fallback_depth\": CHAINS[chain].index(entry)\n                }\n\n            except (TimeoutError, RateLimitError) as e:\n           
     log(f\"{entry['model']} attempt {attempt+1} failed: {e}\")\n                continue\n\n    raise AllModelsFailed(f\"No model in chain '{chain}' produced acceptable output\")\n```\n\n---\n\n## 4. ACPX Configuration\n\nACPX (Agent Computer Protocol eXtended) enables tool-using agents to coordinate. Configuration for Claude Code and Codex environments.\n\n### Claude Code Configuration\n\nIn your project's `CLAUDE.md`:\n\n```markdown\n# Agent Orchestration\n\n## Sub-agent Spawning\nWhen a task requires cross-model validation:\n1. Use the Agent tool to spawn a sub-agent for the secondary task\n2. The sub-agent inherits the project context but gets its own conversation\n3. Results flow back to the orchestrator via the Agent tool response\n\n## Model Selection\n- Use claude-opus-4-6 for: architectural decisions, code review, complex debugging\n- Use claude-sonnet-4-6 for: implementation, test writing, documentation\n- Use claude-haiku-4-5 for: linting, formatting, simple refactors\n\n## Tool Permissions\nSub-agents may: read files, search code, run tests\nSub-agents may NOT: push to git, modify CI/CD, delete files without confirmation\n```\n\n### ACP Server Setup\n\n```json\n{\n  \"mcpServers\": {\n    \"orchestrator\": {\n      \"command\": \"node\",\n      \"args\": [\"./orchestrator-server.js\"],\n      \"env\": {\n        \"ANTHROPIC_API_KEY\": \"${ANTHROPIC_API_KEY}\",\n        \"OPENAI_API_KEY\": \"${OPENAI_API_KEY}\",\n        \"MAX_CONCURRENT_AGENTS\": \"5\",\n        \"DEFAULT_CHAIN\": \"production\"\n      }\n    }\n  }\n}\n```\n\n### Codex Integration\n\n```yaml\n# codex.yaml\nagents:\n  orchestrator:\n    model: claude-opus-4-6\n    role: \"Route tasks and synthesize results\"\n    tools: [spawn_agent, review_output, merge_results]\n\n  grinder:\n    model: claude-sonnet-4-6\n    role: \"Execute implementation tasks\"\n    tools: [read_file, write_file, run_tests, search_code]\n\n  validator:\n    model: gpt-4o\n    role: \"Cross-validate outputs\"\n    
tools: [read_file, run_tests, score_output]\n```\n\n---\n\n## 5. Cost Optimization\n\n### Subscription vs API Economics\n\n```\nSubscription Models ($20-200/month flat):\n  Claude Pro/Max    → Best for: daily interactive use, long sessions\n  ChatGPT Plus      → Best for: GPT-4o access, plugins\n  Grok Premium      → Best for: X integration, real-time\n  Gemini Advanced   → Best for: Google ecosystem, long context\n\nAPI Models (per-token):\n  claude-opus-4-6   → $15/M input, $75/M output\n  claude-sonnet-4-6 → $3/M input, $15/M output\n  claude-haiku-4-5  → $0.80/M input, $4/M output\n  gpt-4o            → $2.50/M input, $10/M output\n```\n\n### $0 Marginal Cost Routing\n\nWhen you have active subscriptions, route interactive and exploratory work through subscriptions (zero marginal cost) and reserve API for automated/batch workflows.\n\n```\nDecision Tree:\n  Is this interactive/exploratory?\n    YES → Route through subscription (Claude Code, ChatGPT, etc.)\n    NO  → Is this batch/automated?\n      YES → Use API with cheapest adequate model\n      NO  → Is this high-volume (>1000 calls/day)?\n        YES → Use Haiku via API ($0.80/M input)\n        NO  → Use Sonnet via API ($3/M input)\n```\n\n### Cost Tracking Template\n\n```\nMonthly AI Spend:\n  Subscriptions (fixed):\n    Claude Max            $200.00\n    ChatGPT Plus           $20.00\n    Grok Premium           $30.00\n    Gemini Advanced        $20.00\n  Subtotal Fixed          $270.00\n\n  API Usage (variable; token counts apply to input and output each):\n    Opus 4.6         42K tokens    $3.78\n    Sonnet 4.6      380K tokens    $6.84\n    Haiku 4.5     1.2M tokens      $5.76\n    GPT-4o          95K tokens     $1.19\n  Subtotal Variable                $17.57\n\n  Total                           $287.57\n  Cost per task (avg)               $0.28\n  Tasks completed                  1,013\n```\n\n---\n\n## 6. 
Spawn Patterns\n\n### Pattern 1: Runtime Sub-Agent (Within Claude Code)\n\nUse the `Agent` tool to spawn sub-agents that inherit project context.\n\n```\nOrchestrator (Opus)\n  ├── Agent: \"Research the API surface\" (Explore subagent)\n  ├── Agent: \"Implement the endpoint\" (general-purpose subagent)\n  └── Agent: \"Write tests\" (general-purpose subagent)\n```\n\nBest for: tasks where sub-agents need file system access and project context.\n\n### Pattern 2: API-Spawned Agent (External)\n\nCall model APIs directly for tasks that don't need project context.\n\n```python\n# Spawn multiple validators in parallel\nimport asyncio\n\nasync def parallel_validate(content: str) -> list:\n    tasks = [\n        call_model(\"claude-sonnet-4-6\", f\"Review for accuracy:\\n{content}\"),\n        call_model(\"gpt-4o\", f\"Review for accuracy:\\n{content}\"),\n        call_model(\"gemini-2.5-pro\", f\"Review for accuracy:\\n{content}\"),\n    ]\n    return await asyncio.gather(*tasks)\n```\n\nBest for: cross-validation, scoring, classification — tasks that are self-contained.\n\n### Pattern 3: Orchestrator-Grinder Split\n\nThe orchestrator plans and delegates. Grinders execute. 
Never let a grinder make strategic decisions.\n\n```\nORCHESTRATOR (Opus 4.6):\n  - Reads the task requirements\n  - Breaks into subtasks\n  - Assigns each subtask to appropriate grinder\n  - Reviews grinder outputs\n  - Synthesizes final deliverable\n  - Makes judgment calls on conflicts\n\nGRINDER (Sonnet 4.6 / GPT-4o):\n  - Receives specific, scoped subtask\n  - Executes without strategic decisions\n  - Returns output with confidence score\n  - Flags uncertainty rather than guessing\n```\n\n### Anti-Patterns to Avoid\n\n| Anti-Pattern | Problem | Fix |\n|-------------|---------|-----|\n| Grinder makes strategic calls | Inconsistent decisions, wasted work | Escalate to orchestrator |\n| Orchestrator does grinder work | Slow, expensive, bottleneck | Delegate production tasks |\n| No quality gate between steps | Errors compound through pipeline | Add review step after each stage |\n| Same model reviews its own work | Blind spots persist | Cross-model validation |\n| Spawning agents for trivial tasks | Overhead exceeds task cost | Direct call for simple tasks |\n| Infinite retry loops | Cost explosion | Max 3 retries, then escalate |\n\n---\n\n## 7. Orchestrator vs Grinder Principle\n\nThis is the foundational principle of multi-agent systems.\n\n### The Rule\n\n> **The orchestrator thinks. The grinder does. 
Never confuse the two.**\n\n### Role Definitions\n\n```\nORCHESTRATOR                          GRINDER\n─────────────────────────────────     ─────────────────────────────────\nDecides WHAT to do                    Decides HOW to do it\nChooses which model/tool              Uses the tools it's given\nReviews and judges quality            Produces and reports confidence\nResolves conflicts between agents     Flags conflicts for resolution\nOwns the final output                 Owns its subtask output\nExpensive, slow, high-judgment        Cheap, fast, high-throughput\n1 per workflow                        N per workflow\n```\n\n### Decision Framework\n\n```\n\"Should this be an orchestrator or grinder decision?\"\n\nAsk: \"If two reasonable people disagreed on this, would it matter?\"\n  YES → Orchestrator decision (judgment required)\n  NO  → Grinder decision (execution, not judgment)\n\nAsk: \"Does this affect the overall workflow direction?\"\n  YES → Orchestrator decision\n  NO  → Grinder decision\n\nAsk: \"Could a junior employee do this with clear instructions?\"\n  YES → Grinder task\n  NO  → Orchestrator task\n```\n\n### Example Workflow: Client Deliverable\n\n```\nORCHESTRATOR (Opus):\n  1. Read client brief → decide deliverable structure\n  2. Break into sections → assign to grinders\n  3. Review all sections → identify gaps\n  4. Resolve quality issues → request rewrites\n  5. Synthesize → produce final deliverable\n  6. Generate executive summary → deliver\n\nGRINDER 1 (Sonnet): Write Section A per outline\nGRINDER 2 (Sonnet): Write Section B per outline\nGRINDER 3 (GPT-4o): Generate data tables and charts\nGRINDER 4 (Gemini): Research background for Section C\nGRINDER 5 (Haiku): Format citations and references\n```\n\nTotal cost: 1 Opus call (synthesis) + 5 cheaper calls (production)\nvs. 
doing everything in Opus: 6 Opus calls, each billed at about 5x Sonnet's per-token price ($15 vs. $3 per million input tokens).\n","topics":["Orchestration","Multi Agent","Prompt Engineering"],"tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":510,"installsAllTime":19,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1773562542511,"updatedAt":1778491921387},"latestVersion":{"version":"1.0.0","createdAt":1773562542511,"changelog":"Initial release — model staggering, sub-agent QC, ACPX config","license":"MIT-0"},"metadata":{"setup":[],"os":null,"systems":null},"owner":{"handle":"samledger67-dotcom","userId":"s17e6161jqcyaq82bkvewk1rz583ndyf","displayName":"samledger67-dotcom","image":"https://avatars.githubusercontent.com/u/261800271?v=4"},"moderation":null}