Install

openclaw skills install subagent-architecture

Advanced patterns for specialized subagent orchestration with production-ready reference implementations. Patterns and templates for building robust multi-agent systems with OpenClaw.
This is one of the most feature-dense skills in the ClawHub registry. Security scanners will flag it — not because it's malicious, but because it does a lot. Here's exactly what's in it and why:
Scope:

- setup.sh that creates local directory scaffolding (no network calls, no package installs)
- spawn-security-proxy.js (test fixtures, not live payloads)

Why the libs exist: These aren't glue code — they implement real patterns: output sanitization with canary tokens, multi-source research validation, cost projection with approval gates, and subagent output scoring. The complexity is the point; simpler skills don't solve these problems.
Why scanners flag it:

- It ships multiple real, executable JS libraries that you copy into your workspace and require()

Complexity is not a red flag here. It's the product.
What this skill contains:
- lib/ — Reference JS libraries (spawn helpers, cost estimator, quality scorer). These are not auto-executed. Copy them to your workspace lib/ directory to use them.
- templates/ — Markdown spawn templates for common patterns.
- setup.sh — Creates local directory scaffolding only. Makes no network calls, installs no packages.
- External integrations: All optional, none auto-configured.
Credential requirements: None. No API keys, tokens, or env vars are required or auto-read by this skill.
This skill provides battle-tested patterns for:
Advanced Patterns:
Templates:
- templates/security-proxy.md - Isolate untrusted service access
- templates/researcher-specialist.md - Domain-specific research agents
- templates/phased-implementation.md - Multi-phase feature development
- templates/peer-review-specialist.md - External peer validation

Integration:
cd $OPENCLAW_WORKSPACE/skills/subagent-architecture
bash setup.sh # Creates directories and scaffolding
For high-risk operations:
# Read security-proxy template
cat templates/security-proxy.md
# Spawn isolated proxy for untrusted API
# (see template for full example)
For research tasks:
# Read researcher-specialist template
cat templates/researcher-specialist.md
# Spawn domain expert for deep analysis
# (see template for multi-perspective pattern)
For complex features:
# Read phased-implementation template
cat templates/phased-implementation.md
# Launch architect → coder → reviewer pipeline
# (see template for orchestration example)
For external validation:
# Read peer-review-specialist template
cat templates/peer-review-specialist.md
# Request peer agent review via Discord/API
# (see template for federated trust protocol)
Critical clarification for new users: AGENTS.md is a human-readable reference document, not a configuration file. The spawning system does not read it.
sessions_spawn is a tool call — it takes parameters you provide at the moment of the call. It does not read from any file, config, or registry. You can spawn subagents on a completely fresh OpenClaw install with zero files in your workspace.
// This is ALL the system needs — no AGENTS.md required
sessions_spawn({
label: "my-researcher",
task: "Research the topic X",
model: "sonnet"
})
AGENTS.md (and domain files like AGENTS_WRITING.md, AGENTS_INFRA.md) are memory aids for you (the agent). They store:
You can split AGENTS.md into 50 domain files or delete it entirely — spawning still works. The split is purely for your readability; it has zero functional effect.
When documentation says "read AGENTS_WRITING.md before spawning AuthorAgent", it means:
Read it so you know what personality, model, and task description to use when building the spawn call.
The system does not read it. You read it, extract the configuration, then make the spawn call with those parameters.
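That workflow can be sketched directly: parse the persona details out of a reference document yourself, then pass them as plain parameters to the spawn call. The AGENTS.md line format below is an invented illustration; only the "you read, you extract, you pass parameters" flow comes from the text above.

```javascript
// Hypothetical one-line-per-agent format inside AGENTS.md:
const agentsMdLines = [
  'AuthorAgent: model=sonnet; personality=concise technical writer',
];

function extractConfig(agentName) {
  const line = agentsMdLines.find(l => l.startsWith(agentName + ':'));
  // Turn "model=sonnet; personality=..." into an object.
  const fields = Object.fromEntries(
    line.split(':')[1].split(';').map(part => part.trim().split('='))
  );
  return { label: agentName.toLowerCase(), model: fields.model, personality: fields.personality };
}

const cfg = extractConfig('AuthorAgent');
console.log(cfg.model); // "sonnet"
// sessions_spawn({ ...cfg, task: 'Draft the release notes' })
// The tool call receives only these explicit parameters; it never reads AGENTS.md.
```

Delete AGENTS.md and this still works, as long as you supply the same parameters by hand.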
When you install this skill on a fresh OpenClaw setup:
See templates/agents-registry-template.md for a minimal starter template when you're ready.
NEW in v2.1: Production-ready code libraries for all patterns.
lib/spawn-security-proxy.js - Security isolation framework
- spawnSecurityProxy(config) - Spawn isolated proxy with sanitization
- deepSanitize(data) - Remove sensitive data (API keys, paths, emails)
- validateSchema(data, schema) - JSON schema validation
- createDefaultSchema(type) - Common output schemas (list, single, status)

lib/spawn-researcher.js - Multi-perspective research framework

- spawnResearcher(config) - Spawn domain expert researcher
- spawnMultiPerspective(config) - Multi-perspective research (optimist/pessimist/pragmatist)
- assessSourceCredibility(source) - Score source trustworthiness (0-100)

lib/cost-estimator.js - Cost estimation and tracking

- estimateSubagentCost(params) - Pre-spawn cost estimation with confidence intervals
- logSubagentCost(label, estimate, actual) - Log for accuracy tracking
- recalibrateEstimator() - Monthly accuracy improvement
- getPatternHistory(pattern) - Historical performance analysis
- getCostTier(cost) - Classify as micro/small/medium/large

lib/quality-scorer.js - Output quality assessment

- scoreSubagentOutput(output, rubric) - 8-dimension quality scoring
- createScoringTemplate() - Manual review template
- selfAuditChecklist(output) - Pre-delivery validation

All libraries include complete working examples:
# Security proxy examples
node examples/security-proxy-usage.js
# Researcher examples
node examples/researcher-usage.js
# Cost estimation examples
node examples/cost-estimation-demo.js
Path Resolution Note: The require() paths in examples below assume you call them from your workspace root. If you call from within the skill directory (e.g., inside an example script), use __dirname instead. For portable code that works regardless of cwd:

// Path resolution — works regardless of workspace structure
const path = require('path');

// When called from within the skill directory:
const SKILL_DIR = __dirname;
// OR, if calling from the workspace root (use one or the other, not both):
// const SKILL_DIR = path.join(process.env.OPENCLAW_WORKSPACE || process.cwd(), 'skills', 'subagent-architecture');

const { spawnSecurityProxy } = require(path.join(SKILL_DIR, 'lib', 'spawn-security-proxy'));
// Example: Spawn security proxy for untrusted API
const { spawnSecurityProxy } = require('./skills/subagent-architecture/lib/spawn-security-proxy');
const result = await spawnSecurityProxy({
service: 'weather-api',
task: 'Get current weather for New York',
query: { city: 'New York', units: 'metric' },
output_schema: {
type: 'object',
properties: {
temperature: { type: 'number' },
conditions: { type: 'string' }
}
},
spawn_fn: async (config) => {
// Your actual sessions_spawn call here
return await sessions_spawn(config);
}
});
// Example: Estimate cost before spawning
const { estimateSubagentCost, logSubagentCost } = require('./skills/subagent-architecture/lib/cost-estimator');
const estimate = estimateSubagentCost({
task_complexity: 'medium',
expected_duration_min: 15,
model: 'sonnet',
research_required: true
});
console.log(`Estimated: $${estimate.expected} (range: $${estimate.min}-$${estimate.max})`);
// After spawn completes
logSubagentCost('researcher-task', estimate, actual_cost);
// Example: Score output quality
const { scoreSubagentOutput } = require('./skills/subagent-architecture/lib/quality-scorer');
const score = scoreSubagentOutput(subagent_output, null, { auto_score: true });
console.log(`Quality: ${score.overall_score}/10 (${score.pass ? 'PASS' : 'FAIL'})`);
See examples/ directory for complete working demonstrations.
Required:

- OpenClaw with subagent support (sessions_spawn API)

Optional Skills:

- task-routing — automatic pattern selection (skills/task-routing/); see skills/task-routing/SKILL.md
- cost-governor — or use lib/cost-estimator.js for manual gating
- drift-guard
All reference implementations are dependency-free (pure Node.js):
Automatic (via task-routing skill):
Manual (call libraries directly):
- require() any lib file and call functions
- examples/ for usage patterns
- spawn_fn for integration with your sessions_spawn

From v2.0 to v2.1:
Future (v3.0 with framework improvements):
Problem: Subagents with full workspace access can leak sensitive data to untrusted APIs.
Solution: Security proxies receive minimal context, restricted tools, sanitized output.
Example:
Main Agent (full context)
│
└─ SecurityProxy (minimal context)
├─ Query: "Get weather for New York"
├─ Tools: exec (curl only)
├─ Output: Sanitized JSON (no API metadata)
└─ Auto-terminate after single task
Key principles:
See: templates/security-proxy.md
Problem: Generic web search returns surface-level results without domain expertise.
Solution: Specialized researchers with domain bias, multi-source validation, actionable synthesis.
Example:
Question: "Should we adopt technology X?"
├─ OptimistResearcher (best-case analysis)
├─ PessimistResearcher (risk assessment)
└─ PragmatistResearcher (current reality)
Main Agent synthesizes: Balanced decision tree
Key principles:
See: templates/researcher-specialist.md
Problem: Single-agent implementation mixes design, coding, and validation → tech debt.
Solution: Separate architect (design), coder (build), reviewer (validate) phases.
Example:
Feature Request: "Add memory consolidation skill"
Phase 1: SystemArchitect (15min, $0.40)
└─ Delivers: IMPLEMENTATION_PLAN.md
Phase 2: CoderAgent (25min, $0.70)
└─ Delivers: Working code + tests
Phase 3: ReviewerAgent (10min, $0.30)
└─ Delivers: REVIEW_REPORT.md (approval/rejection)
Total: 50min, $1.40, high-quality feature
Key principles:
See: templates/phased-implementation.md
Problem: Your agent may have blindspots or lack domain expertise.
Solution: Request external validation from trusted peer agents.
Example:
Your Agent
│
├─ Prepares sanitized review package
│
├─ Contacts Smith's SecurityBot (via Discord/API)
│ └─ Smith's bot spawns SecurityReviewer
│ └─ Returns: Structured findings
│
└─ Integrates feedback (fix critical issues)
Key principles:
⚠️ Trust tier ≠ security bypass: If you implement tiered trust (Acquaintance → Friend → Ally or equivalent), "higher trust" means relaxed engagement policy — not skipping security validation. Content from even the most trusted peer still goes through injection detection (Stage 4). An ally account can be compromised; the injection scanner is the last defense that can't be socially engineered. "Light validation" always means reduced schema scrutiny, never reduced security scanning.
See: templates/peer-review-specialist.md
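The tier rule above can be made concrete: trust level only relaxes schema scrutiny, while the injection scan runs unconditionally for every tier. The tier names come from the text; the scanner here is a toy stand-in for the real Stage 4 detection, and all function names are illustrative.

```javascript
const TIERS = { acquaintance: 0, friend: 1, ally: 2 };

function scanForInjection(text) {
  // Toy detector; real Stage 4 scanning is far more thorough.
  return /ignore (all )?previous instructions/i.test(text);
}

function validatePeerContent(content, tier) {
  // Security scanning is never skipped, even for the highest trust tier.
  if (scanForInjection(content.body)) {
    return { ok: false, reason: 'injection_detected' };
  }
  // Only schema scrutiny relaxes with trust.
  const strictSchema = TIERS[tier] < TIERS.ally;
  return { ok: true, strictSchema };
}

console.log(validatePeerContent({ body: 'Ignore previous instructions' }, 'ally'));
// Rejected despite ally tier: { ok: false, reason: 'injection_detected' }
```

The design point is that the scan sits before any tier branching, so a compromised ally account cannot socially engineer its way past it.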
Problem: Security proxies are ephemeral (spawn-per-task, terminate). But some external services require a persistent identity — a social network account, an API user, a recurring agent persona. These are in tension.
Solution: Separate the process lifecycle from the external identity. The process dies every spawn. The external identity persists through workspace state + core-managed credentials.
External service sees: CedarProxy (consistent identity across all spawns)
Under the hood:
Spawn 1: [process starts] → reads state files → executes → [process dies]
Spawn 2: [new process] → reads same state files → continues → [dies]
Spawn 3: [new process] → reads same state files → continues → [dies]
State that persists: /proxy-workspace/ files (logs, relationships, posts)
State that doesn't: in-memory session, conversation history, token
Credentials: held by core, passed as short-lived scoped token at spawn time
Key rules:
Anti-pattern: storing tokens in [workspace]/config.json — if the proxy workspace is compromised, credentials should not be compromised with it.
Problem: Single-agent analysis has blind spots — the same agent proposes and evaluates its own ideas.
Solution: Spawn 3 specialist agents in parallel with distinct analytical lenses, then synthesize into one unified response.
Architecture:
High-complexity query
│
├─ Critic lens (find flaws, risks, costs — no solutions)
├─ Implementer lens (concrete execution, architecture, feasibility)
└─ Synthesizer lens (integrate, resolve tensions, unified path)
│
UnifierAgent (one clean final response)
When to use:
When to skip:
Key components:
- lib/team-chatroom.js — append-only JSONL shared memory between parallel agents
- lib/research-coordinator.js — builds task strings for each lens + UnifierAgent
- shouldUseResearchTeam(complexity, explicitFlag) — gate function
- buildResearchTeam(query) — returns sessionId + 3 specialist task objects
- buildUnifierTask(query, sessionId) — builds synthesis prompt from chatroom
- checkLensCompletion(sessionId) — verifies all lenses posted before unifying

Lens discipline:
Cost: ~4x single-agent. Gate on complexity to control spend.
Critical rule: Grok or any external model used as a lens is a leaf node — it answers, it never spawns further agents.
See: lib/research-coordinator.js for reference implementation.
Problem: Your agent may have architectural blind spots that another model's reasoning style would catch. Human relay (copy/paste to another AI) is slow and lossy.
Solution: Spawn an ExternalConsultAgent subagent that calls an external model's API directly, persists the session, and returns the response.
Architecture:
Core agent decides consultation needed
│
└─ ExternalConsultAgent subagent
├─ Calls external API (Grok, OpenAI, etc.)
├─ Injects system-level intent for alignment
├─ Supports session continuity (sessionId)
└─ Returns response + sessionId
Why subagent, not core agent: External API calls are side effects with cost and data exposure implications. Isolating them in a subagent follows the blast-shield philosophy — if something goes wrong, it's contained.
Key components:
- lib/external-bridge.js — session management, task string builder, consultation logger
- memory/external-agents.json — provider config (model IDs, endpoints)
- memory/external-sessions/ — persistent session files for conversation continuity
- lib/external-consult-helper.js — shouldAutoConsult() gate (RED drift + high complexity + 24h cooldown)

Auto-consult gate (3 conditions, all required):
Security rules (enforce before go-live):
Session continuity:
Pass sessionId from a previous consultation to continue the conversation. The bridge injects prior message history automatically.
See: lib/external-bridge.js for reference implementation.
Problem: Prompt engineering tells agents what to do. Context engineering tells agents what to know. Neither tells agents what to want — so they drift from user values over time without detection.
Solution: A three-layer system that encodes agent intent, extracts intent signals from user behavior, and detects drift before it becomes a problem.
Three layers:
- Intent manifest (memory/intent-manifest.json) — Machine-readable intent spec per agent: core purpose, hard constraints (with regex), operational goals with verification keywords, user signals extracted from history
- Intent extractor (lib/intent-extractor.js) — Weekly pass over episode history extracting preference/value/correction signals; decays old signals (rate: 0.975/week), prunes below 0.30 strength
- Drift detector (lib/intent-drift-detector.js) — 5-component score per episode: hard constraint violations (0.35), goal keyword coverage (0.25), structural drift via Jaccard (0.20), correction pressure (0.15), forbidden phrase hits (0.05)

Drift levels:
Intent manifest schema:
{
"system": {
"core_purpose": "...",
"hard_constraints": [{"id": "...", "rule": "...", "regex": "..."}],
"operational_goals": [{"id": "...", "description": "...", "verification_keywords": [], "weight": 0}]
},
"agents": {
"AgentName": {
"inherits": true,
"overrides": {
"core_purpose": "...",
"operational_goals": [],
"user_signals": []
},
"version": "YYYY-MM-DD.001"
}
},
"diff_log": []
}
Key properties:
- Drift penalizes the quality score: score *= (1 - drift.score * 0.3)

Cold start: Works on day 1 with hand-authored goals. Signals accumulate automatically.
See: lib/intent-manager.js, lib/intent-extractor.js, lib/intent-drift-detector.js for reference implementation.
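The 5-component weighted score described above can be sketched as a plain weighted sum. The component values are stubbed; only the weights follow the text, and the real lib/intent-drift-detector.js computes each component from episode data.

```javascript
// Weights from the drift detector description (sum to 1.0).
const WEIGHTS = {
  hard_constraint_violations: 0.35,
  goal_keyword_coverage: 0.25,
  structural_drift_jaccard: 0.20,
  correction_pressure: 0.15,
  forbidden_phrase_hits: 0.05,
};

function driftScore(components) {
  // Each component is assumed normalized to [0, 1]; missing ones count as 0.
  return Object.entries(WEIGHTS).reduce(
    (sum, [name, weight]) => sum + weight * (components[name] ?? 0), 0
  );
}

const score = driftScore({
  hard_constraint_violations: 0,
  goal_keyword_coverage: 0.4,   // 40% of goal keywords missing this episode
  structural_drift_jaccard: 0.2,
  correction_pressure: 0,
  forbidden_phrase_hits: 0,
});
console.log(score.toFixed(2)); // "0.14"
```

That score then feeds the quality penalty from the key properties list, so moderate drift shaves a proportional slice off the episode's quality score.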
Problem: Uncontrolled subagent spawning leads to budget overruns.
Solution: Estimate cost before spawning, require approval for expensive operations.
Framework:
// Pre-spawn cost estimation
const estimate = estimateSubagentCost({
task_complexity: "high", // simple/medium/high
expected_duration_min: 20,
model: "sonnet", // haiku/sonnet/opus
research_required: true
})
// estimate = { min: $0.60, max: $1.20, confidence: 0.8 }
if (estimate.max > 0.50) {
// Log to cost tracking
await logCostEstimate("task-label", estimate)
if (estimate.max > 2.00) {
// Require human approval
await requestApproval(estimate)
}
}
// Spawn subagent
const result = await spawnSubagent(...)
// Track actual cost
await logActualCost("task-label", result.cost)
Cost tiers:
Optimization strategies:
subagents/
├── [specialist-name]/
│ ├── SPECIALIST.md # Agent definition and personality
│ ├── knowledge-base/ # Reference materials
│ └── research/ # Task outputs and findings
└── _archived/ # Retired subagents
skills/subagent-architecture/
├── SKILL.md # This file
├── templates/
│ ├── security-proxy.md
│ ├── researcher-specialist.md
│ ├── phased-implementation.md
│ └── peer-review-specialist.md
└── setup.sh # Directory scaffolding script
Use when:
Skip when:
Use when:
Skip when:
Use when:
Skip when:
Use when:
Skip when:
Automatic pattern selection via task-routing skill:
# config/routing-rules.yaml
pattern_routing:
security_proxy:
triggers:
- blast_radius > 70
- untrusted_api: true
max_cost: 0.10
researcher_specialist:
triggers:
- task_type: research
- complexity > 50
min_sources: 3
phased_implementation:
triggers:
- task_type: code_gen
- files_affected > 3
- cost_estimate > 1.00
phases: [architect, coder, reviewer]
peer_review:
triggers:
- irreversibility > 80
- domain_gap: true
require_approval: true
Manual override:
// Force specific pattern
spawnSubagent({
pattern: "security-proxy", // override routing decision
task: "...",
justification: "Experimental API, prefer isolation"
})
All subagent spawns >$0.50 must be logged to notes/cost-tracking.md:
## Subagent Cost Tracking
| Date | Label | Pattern | Estimate | Actual | Delta | Notes |
|------|-------|---------|----------|--------|-------|-------|
| 2026-02-22 | architect-feature-x | phased-impl | $0.40 | $0.38 | -5% | Faster than expected |
| 2026-02-22 | coder-feature-x | phased-impl | $0.70 | $0.85 | +21% | Complex refactor, used opus |
| 2026-02-22 | researcher-market | researcher | $0.60 | $0.55 | -8% | Cached sources helped |
Track estimate vs actual to improve future predictions:
// Calculate rolling accuracy
const accuracy = calculateAccuracy({
window_days: 30,
min_samples: 10
})
// accuracy = { mean_error: 12%, confidence: 0.85 }
// Adjust future estimates
const adjusted_estimate = base_estimate * (1 + accuracy.mean_error)
All specialist subagents should score 7+ on this rubric:
| Dimension | Poor (1-3) | Good (4-6) | Excellent (7-10) |
|---|---|---|---|
| Specificity | Vague generalizations | Some concrete details | Precise, actionable specifics |
| Actionability | No clear next steps | Suggestions provided | Step-by-step implementation plan |
| Evidence | Unsourced claims | Some citations | Every claim sourced, validated |
| Structure | Stream-of-consciousness | Basic organization | Scannable hierarchy, summaries |
| Completeness | Missing key aspects | Most areas covered | Comprehensive, gaps documented |
| Honesty | Hides limitations | Mentions some trade-offs | Explicit about unknowns, risks |
| Cost-awareness | No budget consideration | Rough estimates | Detailed cost/benefit analysis |
| Integration | Ignores existing systems | Basic compatibility | Seamless integration plan |
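A minimal way to turn the rubric into a pass/fail number is to average the 8 dimension ratings against the 7+ threshold. This mirrors the idea behind scoreSubagentOutput but is a sketch; the real lib/quality-scorer.js may weight dimensions differently.

```javascript
const DIMENSIONS = [
  'specificity', 'actionability', 'evidence', 'structure',
  'completeness', 'honesty', 'cost_awareness', 'integration',
];

function scoreOutput(ratings) {
  // Unrated dimensions default to 1 ("Poor") so gaps can't inflate the score.
  const values = DIMENSIONS.map(d => ratings[d] ?? 1);
  const overall = values.reduce((a, b) => a + b, 0) / DIMENSIONS.length;
  return { overall_score: Number(overall.toFixed(1)), pass: overall >= 7 };
}

console.log(scoreOutput({
  specificity: 8, actionability: 7, evidence: 9, structure: 8,
  completeness: 7, honesty: 8, cost_awareness: 6, integration: 7,
}));
// → { overall_score: 7.5, pass: true }
```

Defaulting missing dimensions to "Poor" is a deliberate choice here: an output nobody bothered to rate on evidence should not pass on the strength of the other seven.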
Before finalizing subagent output:
Permanent subagents (skill-based):
Ephemeral subagents (one-off spawns):
All subagents should:
Task routing:
Cost governor:
Drift guard:
Known gaps identified in peer review (Agent Smith, EasyClaw project)
Current OpenClaw subagent framework has architectural limitations that affect all patterns in this skill. These are framework-level constraints, not skill-specific issues. Documenting for transparency and future improvement.
Smith's Question: "How does core define sub-bot constraints before launch?"
Current State:
- task, personality, model, timeout, label

Limitations:
Current Workaround:
v2 Design Considerations:
// Proposed: Granular per-spawn constraints
spawnSubagent({
label: "researcher-x",
task: "Research topic",
constraints: {
max_memory_mb: 512, // Kill if exceeds RAM limit
max_api_calls: 50, // Deny further requests after quota
max_disk_mb: 100, // Block file writes beyond limit
max_cost_usd: 0.50, // Auto-terminate if cost projection exceeds
tools_allowed: ["web_search", "web_fetch"], // Per-spawn tool whitelist
tools_denied: ["exec", "write"], // Explicit blacklist
rate_limits: {
web_search: { calls: 10, window_seconds: 60 } // Max 10 searches/min
}
}
})
Impact on Patterns:
Smith's Question: "How are skills validated before execution?"
Current State:
- skill-vetter checks metadata, malicious patterns

Limitations:
Current Workaround:
v2 Design Considerations:
// Proposed: Skill capability manifest
// skills/my-skill/SKILL.md
---
capabilities_required:
- web_search # Skill needs web access
- read:config/ # Can read config directory only
- write:output/ # Can write to output directory only
isolation_level: sandbox # Run in isolated context
max_execution_time: 30 # Kill after 30 seconds
---
// Runtime: Skill runs in sandbox with only declared capabilities
// Attempts to use undeclared tools → blocked, logged, skill terminated
Impact on Patterns:
Smith's Question: "What can sub-bots ask for from core?"
Current State:
Limitations:
Current Workaround:
v2 Design Considerations:
// Proposed: Request-response protocol during execution
// Subagent code:
const clarification = await requestFromCore({
type: "clarification",
question: "User said 'recent data' - how many days back?",
options: ["7 days", "30 days", "90 days"],
timeout_seconds: 60 // If no response, default to first option
})
// Core receives request:
// - Auto-approve safe requests (data lookup from memory)
// - Escalate to human for decisions
// - Return answer to subagent
// - Subagent continues with clarification
// Examples:
// - "Need API key for service X" → Core provides from secure store
// - "Found conflicting data, which source to trust?" → Human decides
// - "Task seems too expensive ($5 estimated), proceed?" → Approval workflow
Impact on Patterns:
Smith's Question: "When does core kill a sub-bot?"
Current State:
Limitations:
Current Workaround:
v2 Design Considerations:
// Proposed: Multi-condition termination
spawnSubagent({
label: "researcher-x",
task: "Research topic",
termination: {
// Existing
timeout_minutes: 20,
// New: Resource limits
max_memory_mb: 512, // Kill if RSS > 512MB
max_cost_usd: 1.00, // Kill if projected cost > $1
max_output_tokens: 10000, // Kill if response > 10k tokens
// New: Behavioral detection
stuck_detection: {
enabled: true,
same_action_threshold: 5, // Kill if repeats same tool call 5x
no_progress_minutes: 5 // Kill if no new output for 5min
},
// New: External triggers
kill_signal: "session:parent_terminated" // Kill if parent agent dies
}
})
// Termination reasons logged for analysis:
// - "timeout" (existing)
// - "memory_exceeded" (new)
// - "cost_exceeded" (new)
// - "stuck_loop_detected" (new)
// - "output_overflow" (new)
// - "parent_died" (new)
Impact on Patterns:
Smith's Question: "How do you learn from sub-bot failures?"
Current State:
- AGENTS.md updated after successful spawns

Limitations:
Current Workaround:
v2 Design Considerations:
// Proposed: Systematic post-mortem database
// Storage: memory/subagent-postmortems.jsonl
{
"spawn_id": "researcher-market-20260222-1430",
"label": "researcher-market",
"pattern": "researcher-specialist",
"outcome": "failure", // success | partial | failure
"termination_reason": "cost_exceeded",
"stats": {
"duration_minutes": 18,
"cost_actual": 1.25,
"cost_estimate": 0.60,
"cost_accuracy": -108%, // Overran by 108%
"tokens_used": 45000,
"api_calls": 127
},
"deliverable_quality": null, // Not rated (failed before completion)
"failure_mode": {
"category": "runaway_research", // Taxonomy of failures
"root_cause": "No stuck detection, research loop on tangent",
"human_notes": "Researched sub-topic for 15min, didn't return to main question"
},
"lessons": [
"Add stuck detection for researcher pattern",
"Improve cost estimation for multi-source research (was 2x off)"
]
}
// Analytics queries:
// - Success rate by pattern: researcher-specialist = 73% (8/11 succeeded)
// - Most common failure mode: cost_exceeded (40% of failures)
// - Cost estimation accuracy: researcher pattern = -25% average (underestimates)
// - ROI analysis: phased-impl pattern = $2.50 avg cost, 90% success, high value
Proposed Learning Loop:
// Before spawning, check historical performance
const history = getPatternHistory("researcher-specialist")
if (history.success_rate < 0.7) {
console.warn(`⚠️ Pattern has 65% success rate (13/20). Common failure: ${history.top_failure_mode}`)
// Adjust: Add stuck detection, reduce timeout, use cheaper model
}
if (history.cost_accuracy < -0.3) {
console.warn(`⚠️ Pattern underestimates cost by 35% on average`)
// Adjust: Inflate estimate by 35%
const adjusted_estimate = base_estimate * 1.35
}
// After completion, log outcome
logPostMortem({
spawn_id,
outcome: "success",
quality_score: 8.5, // From rubric
cost_actual: 0.58,
cost_estimate: 0.60,
lessons: ["Worked well, multi-perspective pattern delivered balanced view"]
})
Impact on Patterns:
| Feature | v1 (Current) | v2 (Proposed) | Benefit |
|---|---|---|---|
| Spawn constraints | Timeout only | Memory, cost, quota, tool whitelist | Resource safety |
| Skill isolation | Trust-based | Sandboxed with capabilities | Security |
| Communication | One-way | Bidirectional request/response | Adaptive execution |
| Termination | Time-based | Multi-condition (resource, stuck, cost) | Cost control |
| Post-mortem | Success-only | Full lifecycle tracking + analytics | Continuous learning |
Current mitigation strategies:
v2 would enable:
Note for users: These limitations affect all subagent patterns in this skill. The patterns documented here (security-proxy, researcher, phased-impl, peer-review) work within current framework constraints. v2 improvements would enhance safety and reliability, but are not required for effective use of these patterns today.
Tracking: OpenClaw framework issue (conceptual - not filed yet) Credit: Agent Smith (EasyClaw peer review, 2026-02-22)
Scenario: Integrate with untrusted social network API
// Main agent receives request
const task = "Fetch user's posts from SocialNetworkX API"
// Security proxy pattern
const proxy = await spawnSubagent({
label: "proxy-socialnetworkx",
pattern: "security-proxy",
task: "Query SocialNetworkX API for user posts, return sanitized JSON",
context: {
query: "user:$USERNAME, limit:10"
// NO API keys, NO workspace paths
},
tools_allowed: ["exec:curl"],
timeout_minutes: 5,
auto_terminate: true
})
// proxy returns: [{username, timestamp, content}] - sanitized
// Main agent: Safe to process, no raw API exposure
Scenario: Evaluate whether to adopt new framework
// Spawn 3 researchers with different biases
const researchers = await Promise.all([
spawnSubagent({
label: "researcher-optimist",
pattern: "researcher-specialist",
task: "Research FrameworkX benefits, best-case adoption scenario",
personality: "Optimistic, highlights opportunities"
}),
spawnSubagent({
label: "researcher-pessimist",
pattern: "researcher-specialist",
task: "Research FrameworkX risks, failure modes, known issues",
personality: "Skeptical, risk-focused"
}),
spawnSubagent({
label: "researcher-pragmatist",
pattern: "researcher-specialist",
task: "Research FrameworkX current state, real-world adoption data",
personality: "Pragmatic, data-driven"
})
])
// Synthesize: Balanced view with decision criteria
const decision = synthesizeResearch(researchers.map(r => r.findings))
Scenario: Build complex new skill
// Phase 1: Architecture
const plan = await spawnSubagent({
label: "architect-skill-x",
pattern: "phased-implementation",
phase: "architect",
task: "Design implementation plan for skill-x with [requirements]",
model: "sonnet"
})
// Review plan, get approval if expensive
if (plan.cost_estimate > 2.00) {
await humanApproval(plan)
}
// Phase 2: Implementation
const implementation = await spawnSubagent({
label: "coder-skill-x",
pattern: "phased-implementation",
phase: "coder",
task: `Implement skill-x per plan: ${plan.path}`,
model: plan.recommended_model
})
// Phase 3: Review (conditional on risk)
if (implementation.risk_level === "high") {
const review = await spawnSubagent({
label: "reviewer-skill-x",
pattern: "phased-implementation",
phase: "reviewer",
task: `Review implementation: ${implementation.path}`,
model: "sonnet"
})
if (!review.approved) {
// Iterate or escalate
}
}
Scenario: Security audit for payment processing
// Prepare sanitized review package
// Note: "package" is a reserved word in strict-mode JS, so use another name
const reviewPackage = sanitizeForReview({
  artifact: readFile("src/payment-processor.js"),
  remove: ["API_KEYS", "USER_DATA", "INTERNAL_URLS"]
})
// Request peer review via Discord
await message({
  action: "send",
  target: "smith-security-bot",
  message: `Security review request:\n${JSON.stringify(reviewPackage)}\n\nFocus: Payment processing, SQL injection, input validation`
})
})
// Wait for peer response (auto-announces when received)
// Process feedback, fix critical issues, optionally re-submit
Diagnosis:
Solutions:
Diagnosis:
Solutions:
Diagnosis:
Solutions:
Diagnosis:
Solutions:
Known gaps identified in peer review (Agent Smith, EasyClaw - 2026-02-22)
These limitations represent areas where the current OpenClaw subagent framework needs evolution. Documented here for transparency and future development.
Current state:
Missing:
v2 Design Consideration:
sessions_spawn({
task: "...",
constraints: {
maxMemoryMB: 512,
maxAPICalls: 100,
maxDiskMB: 50,
tools: {
allowed: ['read', 'write'],
forbidden: ['exec', 'message'],
restrictions: {
read: { paths: ['/workspace/restricted/'] },
write: { paths: ['/workspace/output/'], maxFileSize: '10MB' }
}
}
}
})
Impact: Without per-spawn constraints, one rogue subagent can exhaust resources for all others.
Current state:
- skill-vetter (red flag detection)

Missing:
v2 Design Consideration:
// Sandbox wrapper for skill execution
runSkill('ai-writing-humanizer', {
sandbox: {
networkAccess: false, // Block network except allowlisted domains
filesystemAccess: 'readonly', // Read-only except explicit write paths
cpuLimit: '50%', // Prevent CPU monopolization
timeout: 30000 // 30s max execution
},
validate: {
outputSchema: { ... }, // Verify output format
sideEffects: ['writes to /workspace/humanizer/'], // Allowed side effects
noExternalCalls: true // Flag if skill attempts network/exec
}
})
Impact: Without runtime sandboxing, malicious or buggy skills can compromise core agent.
Current state:
Missing:
v2 Design Consideration:
// Subagent perspective
async function executeTask(taskDescription) {
const plan = generatePlan(taskDescription);
if (plan.hasHighRiskOperation()) {
const approval = await askCore({
question: "Plan includes deleting 100 files. Proceed?",
options: ["yes", "no", "show list first"],
timeout: 60000 // 1 min to respond
});
if (approval !== "yes") return { status: "aborted", reason: "user declined" };
}
// Continue with approved plan...
}
Impact: Without bidirectional channels, subagents must either be overly cautious (ask for everything upfront) or overly bold (proceed with assumptions).
Current state:
Missing:
v2 Design Consideration:
sessions_spawn({
task: "...",
limits: {
timeout: 600000, // 10 min max
maxMemory: 512, // MB
maxCost: 2.00, // USD
maxOutputSize: 10, // MB
maxToolCalls: 100, // Total tool invocations
stuckDetection: {
enabled: true,
sameToolRepeat: 5, // Flag if same tool called 5+ times consecutively
noProgressTimeout: 120000 // Flag if no output/tool calls for 2 min
}
},
onLimitExceeded: (limit, value) => {
logFailure({ subagent, limit, value });
return { action: 'terminate', notify: 'core' };
}
})
Impact: Without resource-based kills, subagents can waste budget, exhaust memory, or spin indefinitely.
Update (v2.1 → v2.2): A lightweight v0 of post-mortem logging is now implementable without framework changes. The crash report schema defined in templates/proxy-recovery.md gives you structured failure data per-spawn. Core reads crash reports and routes them. This covers the "what failed and why" dimension immediately.
What still requires framework work (see v2 roadmap below): automated outcome analytics, pattern detection across spawns, success rate tracking. But single-spawn forensics — you can ship that now.
Current state:
- Spawns using the proxy-recovery.md pattern get crash reports per spawn

Missing:
v2 Design Consideration:
// Automatic post-mortem on subagent completion/failure
function logSubagentOutcome(subagent, result) {
const record = {
timestamp: Date.now(),
label: subagent.label,
task: subagent.task,
model: subagent.model,
status: result.status, // 'done', 'timeout', 'error', 'killed'
runtime: result.runtime,
cost: result.totalCost,
tokensUsed: result.totalTokens,
outcome: result.status === 'done' ? 'success' : 'failure',
failureReason: result.error || result.killReason || null,
valueProvided: result.status === 'done' ? assessValue(result) : null
};
appendToLog('memory/subagent-outcomes.jsonl', record);
updateSuccessRates(subagent.label, record.outcome);
if (record.outcome === 'failure') {
analyzeFailurePattern(record); // Flag common failure modes
}
}
// Quarterly review: Which subagent types are most cost-effective?
function analyzeSubagentROI() {
const outcomes = readLog('memory/subagent-outcomes.jsonl');
const byType = groupBy(outcomes, o => extractType(o.label)); // CoderAgent, ResearchAgent, etc.
for (const [type, records] of Object.entries(byType)) {
const successRate = records.filter(r => r.outcome === 'success').length / records.length;
const avgCost = mean(records.map(r => r.cost));
const avgValue = mean(records.filter(r => r.valueProvided).map(r => r.valueProvided));
console.log(`${type}: ${(successRate*100).toFixed(1)}% success, $${avgCost.toFixed(2)} avg cost, ${avgValue.toFixed(1)} avg value`);
}
}
Impact: Without post-mortem analysis, you repeat failures, can't optimize spawn strategy, and waste budget on low-value subagents.
Conclusion:
These 5 limitations don't block current usage but represent friction points as subagent usage scales. Prioritize based on pain:
Feedback welcome: If you've hit these limitations in production, share workarounds and pain points via OpenClaw community channels.
This skill improves through real-world usage. Please contribute:
Pattern refinements:
New patterns:
Track record:
MIT - Use freely, modify as needed, share improvements
- templates/proxy-recovery.md — crash/recovery pattern (pre-death checklist, crash report schema, core response decision tree, quarantine mode, human notification, re-spawn decision guide)
- templates/security-proxy.md — full rewrite integrating: 6-stage inbound validation pipeline, Stage 6b outbound semantic leak filter, self-imposed rate limiter + circuit breaker, crash/recovery checklist, identity continuity section, MoltbookProxy as real-world reference implementation
- Crash reports from proxy-recovery.md are a working v0 of per-spawn forensics (no framework changes required)
- templates/ directory