Install
openclaw skills install responsive-agent
Enables a two-layer agent pattern in which the main process never blocks: it spawns subagents for long, remote, or uncertain tasks, so it stays responsive at all times.
Origin: Based on session-coordinator v3, merged with async-command patterns.
Is it fast (<1s) AND local AND same topic AND no network I/O AND read-only?
→ YES: exec directly in main process, respond
→ NO: spawn subagent immediately
| Command duration | yieldMs (ms) | exec timeout (s) |
|---|---|---|
| < 5s, need sync result | none (exec directly) | 10 |
| 5–30s | 5000 | 60 |
| > 30s, async ok | 10000 | 300 |
| Unknown duration | 30000 | 600 |
On exec failure: log → write workspace/tasks/{task-id}/error.md → reply to parent. No retry. Fail-fast.
┌─────────────────────────────────────────────────────────┐
│ Layer 1: MAIN PROCESS (never blocks) │
│ dialog + dispatch only — spawn everything else │
└─────────────────────────────────────────────────────────┘
│
spawn ↓
┌─────────────────────────────────────────────────────────┐
│ Layer 2: SUBAGENT (executes tasks) │
│ exec + yieldMs for long local commands │
└─────────────────────────────────────────────────────────┘
Rule: Main process never blocks. Subagent never hangs.
Main process = dialog + dispatch. Never block.
The main goal is responsiveness — the main process must never be blocked, stalled, or made unresponsive. This is not "never execute." It is "never block."
A blocked main process shows as: typing indicator disappears, user cannot get a response, session appears frozen.
Max concurrent subagents per main process: 5
Detailed priority queue implementation:
Queue entry: {label, task, priority: "high"|"normal", timestamp}
Concurrent high-priority conflict: when two high-priority tasks arrive simultaneously:
High-priority flag: if the user says "紧急" ("urgent") or "priority":
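A minimal sketch of the dispatch queue in Python, assuming entries shaped like {label, task, priority, timestamp} and the limit of 5 concurrent subagents stated above. The tie-break between two high-priority tasks (earlier timestamp first) is an assumption, because the rule for that case is not spelled out here; `QueueEntry`, `DispatchQueue`, `next_to_spawn`, and `finished` are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field

MAX_CONCURRENT = 5  # max concurrent subagents per main process


@dataclass(order=True)
class QueueEntry:
    sort_key: tuple = field(init=False, repr=False)
    label: str = field(compare=False)
    task: str = field(compare=False)
    priority: str = field(compare=False)  # "high" | "normal"
    timestamp: float = field(compare=False)

    def __post_init__(self):
        # High priority sorts first; within one priority, earlier timestamp wins (assumed FIFO)
        self.sort_key = (0 if self.priority == "high" else 1, self.timestamp)


class DispatchQueue:
    def __init__(self, max_concurrent=MAX_CONCURRENT):
        self.max_concurrent = max_concurrent
        self.pending = []     # heap of QueueEntry
        self.running = set()  # labels of subagents currently running

    def submit(self, entry):
        heapq.heappush(self.pending, entry)

    def next_to_spawn(self):
        """Pop the entries that can start now without exceeding the concurrency limit."""
        started = []
        while self.pending and len(self.running) < self.max_concurrent:
            entry = heapq.heappop(self.pending)
            self.running.add(entry.label)
            started.append(entry)
        return started

    def finished(self, label):
        """Free a slot when a subagent reports back."""
        self.running.discard(label)
```

The main process would call `next_to_spawn()` after every `submit()` and every `finished()`, spawning one subagent per returned entry.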
Use this to decide whether to exec directly or spawn:
| Duration | Type | Action |
|---|---|---|
| < 3s, known fast | local read, simple computation | exec directly in main process |
| 3–10s | local with uncertain load | exec with yieldMs=3000, timeout=15 |
| > 10s or unknown | any remote, any build, any unknown | spawn subagent |
Rule: "When in doubt, spawn."
Specifically:
Quick decision tree:
Is it < 3s AND local AND known-fast AND read-only?
→ YES: exec directly
→ NO: spawn subagent
When a request arrives at the main process, answer these questions in order:
Q1: Is it a FAST local read (<1s)?
- Local, no network I/O, same topic, read-only?
→ YES: exec directly in main process, respond now
→ NO: proceed to Q2
Q2: Is it ANY of the following?
- Remote / network / SSH / HTTP
- git push / pull / clone
- Unknown duration
- Different topic from current conversation
- State-modifying (write, commit, deploy, publish)
→ YES: spawn subagent immediately
→ NO: still spawn (conservative — when in doubt, spawn)
Binary rule: if you hesitated on any Q, spawn.
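Because a "no" on Q2 still spawns, the two questions collapse into a single predicate: exec directly only when Q1 is a clear yes. A minimal Python sketch; the `Request` fields are hypothetical inputs that the agent would have to estimate for itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    est_seconds: Optional[float]  # None = unknown duration
    local: bool
    same_topic: bool
    network_io: bool
    read_only: bool


def decide(req: Request) -> str:
    """Return "exec" (run directly in the main process) or "spawn"."""
    # Q1: fast local read (<1s), same topic, no network I/O, read-only?
    fast = req.est_seconds is not None and req.est_seconds < 1.0
    if fast and req.local and req.same_topic and not req.network_io and req.read_only:
        return "exec"
    # Q2: remote, unknown duration, new topic, or state-modifying -> spawn.
    # Every other case also spawns ("when in doubt, spawn").
    return "spawn"
```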
When user says "接着上面的" or "continue from above":
Implementation method:
exec(command="cat workspace/tasks/{task-id}/task.md", timeout=5)
memory_search on the task-id
Result: the new subagent can continue from where the previous one left off.
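A minimal sketch of the resume step in Python, assuming the main process passes the contents of task.md into the new subagent's task description. The function name and prompt wording are hypothetical, and the memory_search lookup is left out.

```python
from pathlib import Path


def build_continuation_task(task_id: str, workspace: Path = Path("workspace")) -> str:
    """Compose the task text for a new subagent that resumes an earlier task."""
    task_file = workspace / "tasks" / task_id / "task.md"
    if not task_file.exists():
        raise FileNotFoundError(f"no continuity record at {task_file}")
    previous_state = task_file.read_text(encoding="utf-8")
    # The previous state travels inside the task text, so the new subagent
    # does not depend on any earlier session's context.
    return (
        f"Continue task {task_id} from where it left off.\n"
        f"Previous state (from {task_file}):\n{previous_state}"
    )
```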
workspace/tasks/{task-id}/task.md is the continuity mechanism.
Always spawn (never exec in main process):
Main process never executes long commands. It only spawns.
Inside a subagent, use exec + yieldMs for commands that may take time:
// Short local command (<1s): exec directly, no yield needed
exec(command="ls /tmp")
// Long local command (5-30s): exec + yieldMs
exec(command="some-long-operation.sh", yieldMs=5000, timeout=60)
// Long command (>30s, async ok): exec + yieldMs
exec(command="moderate-build.sh", yieldMs=10000, timeout=300)
// Unknown duration: exec + yieldMs + large timeout
exec(command="unknown-task.sh", yieldMs=30000, timeout=600)
// Very long command: exec + background + poll
exec(command="very-long-build.sh", background=true, yieldMs=60000, timeout=3600)
yieldMs = how long to wait before backgrounding. If the command finishes within yieldMs, result is returned directly. If not, it is backgrounded and result arrives via push.
If you need the result returned directly from the exec call (rather than later via push), yieldMs must be LARGER than the estimated command duration.
| Scenario | Problem | Correct |
|---|---|---|
| yieldMs=5000, command takes 8s | Command bg'd BEFORE completing, result may be lost | Set yieldMs >= 8000 |
| yieldMs=10000, command takes 45s | Command completes normally, result returned | ✅ correct |
| yieldMs=5000, command takes 4s | Command completes normally, result returned | ✅ correct |
Rule:
These are two different mechanisms for handling long commands. They are not the same.
| | yieldMs | background=true |
|---|---|---|
| What it does | Waits N ms before backgrounding, but process is still attached — result will eventually come back via push | Truly detaches the process — runs independently, no result returned unless you poll |
| Result delivery | ✅ Result arrives via push when done | ❌ No automatic result — detached |
| Use when | You need the result eventually | You explicitly need NO result (pure fire-and-forget) |
If you set both background=true AND yieldMs:
- yieldMs takes precedence initially: the command waits up to yieldMs milliseconds before being backgrounded.
- Then background=true kicks in: the process is truly detached.
| Scenario | Use |
|---|---|
| Need the result | yieldMs (set >= estimated duration) |
| Explicitly need NO result (pure fire-and-forget) | background=true |
| Most cases | yieldMs (not background=true) |
// ✅ Need result — use yieldMs
// Inside subagent:
exec(command="analysis-script.sh", yieldMs=30000, timeout=120)
// ✅ Explicitly need NO result — use background=true
// Inside subagent:
exec(command="ping -c 100 remote-host", background=true)
// Result: none, process runs independently
Rule of thumb: Default to yieldMs. Use background=true only when you explicitly need the process to be truly detached with no result expected.
- yieldMs: wait time in milliseconds before backgrounding. If the command takes longer than yieldMs, it goes to the background automatically.
- timeout: hard kill after N seconds. The command is killed if it exceeds this.
- yieldMs < timeout, always: yieldMs is the backgrounding threshold, timeout is the hard limit. The units differ (milliseconds vs. seconds), so compare yieldMs / 1000 with timeout.
| Command duration | yieldMs | timeout | Strategy |
|---|---|---|---|
| < 5s, sync result needed | none (exec directly) | 10 | No yieldMs needed — exec direct |
| 5–30s | 5000 | 60 | yieldMs = wait-before-bg, timeout = hard limit |
| > 30s, async ok | 10000 | 300 | yieldMs small → bg quickly, timeout allows completion |
| Unknown duration | 30000 | 600 | Conservative — bg after 30s, kill at 10min |
| Very long (>10min) | 10000 | 3600 | yieldMs=10s for responsiveness, long timeout for completion |
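The table can be expressed as a lookup function. This is a minimal Python sketch; the row boundaries (for example, sending anything estimated at 300 s or more to the 3600 s row) are assumptions, because the table leaves a gap between 300 s and 10 min. `exec_params` and `validate` are hypothetical helper names. The check converts yieldMs to seconds before comparing it with timeout.

```python
from typing import Optional


def validate(params: dict) -> None:
    """yieldMs is milliseconds, timeout is seconds: the threshold must stay below the hard kill."""
    if "yieldMs" in params and params["yieldMs"] / 1000 >= params["timeout"]:
        raise ValueError("yieldMs must be less than timeout")


def exec_params(est_seconds: Optional[float]) -> dict:
    """Map an estimated duration in seconds (None = unknown) to exec arguments."""
    if est_seconds is None:
        params = {"yieldMs": 30000, "timeout": 600}   # unknown: bg after 30s, kill at 10 min
    elif est_seconds < 5:
        params = {"timeout": 10}                      # exec directly, no yieldMs
    elif est_seconds <= 30:
        params = {"yieldMs": 5000, "timeout": 60}
    elif est_seconds < 300:
        params = {"yieldMs": 10000, "timeout": 300}   # async ok
    else:
        params = {"yieldMs": 10000, "timeout": 3600}  # very long
    validate(params)
    return params
```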
Want the result synchronously (wait in the subagent until done)?
→ Omit yieldMs and set only timeout: exec("short-task.sh", timeout=30) waits up to 30s and returns the result. Do not set yieldMs equal to timeout, since yieldMs must stay below timeout.
Async is fine (background quickly)?
→ Use a small yieldMs (e.g., 5000–10000 ms): exec("long-task.sh", yieldMs=5000, timeout=300) backgrounds after 5s, and the result arrives via push.
When an exec call fails inside a subagent:
- mkdir -p workspace/tasks/{task-id} (subagent's responsibility)
- Write workspace/tasks/{task-id}/error.md
Error file format (workspace/tasks/{task-id}/error.md):
# Subagent Error Report
- Task: [task description]
- Timestamp: [ISO timestamp]
- Exit code: [N or "signal"]
- Command: [what was run]
## stdout
[output]
## stderr
[error output]
## Analysis
[brief root cause if known]
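A Python stand-in for the subagent's file-writing step, as a minimal sketch that produces error.md in the format above. `write_error_report` and its parameters are hypothetical; in practice, the subagent would do the same through its own exec/file tools.

```python
import os
from datetime import datetime, timezone


def write_error_report(task_id, task, command, exit_code, stdout, stderr,
                       analysis="unknown", root="workspace/tasks"):
    """Write {root}/{task-id}/error.md in the documented format and return its path."""
    task_dir = os.path.join(root, task_id)
    os.makedirs(task_dir, exist_ok=True)  # mkdir -p: the subagent's responsibility
    path = os.path.join(task_dir, "error.md")
    report = (
        "# Subagent Error Report\n"
        f"- Task: {task}\n"
        f"- Timestamp: {datetime.now(timezone.utc).isoformat()}\n"
        f"- Exit code: {exit_code}\n"  # N, or "signal"
        f"- Command: {command}\n"
        f"## stdout\n{stdout}\n"
        f"## stderr\n{stderr}\n"
        f"## Analysis\n{analysis}\n"
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(report)
    return path
```

After writing the file, the subagent replies to its parent with the path and stops. There is no retry, which keeps the fail-fast behavior.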
Key principles:
error.md cleanup strategy:
workspace/tasks/*/error.md → workspace/tasks/archive/{year}/{task-id}/error.md
Directory creation: Before writing any files (error.md or temp results), the subagent must create the directory:
mkdir -p workspace/tasks/{task-id} # for error files
mkdir -p workspace/temp/{task-id} # for large results
This is the subagent's responsibility, not the main process's.
When a subagent completes (success or failure), it must reply to its parent session — it does not use sessions_yield. The subagent session key is known at spawn time; use it to send the final result directly.
Result format: brief summary of what was done, file paths if any (not raw large data).
// Subagent sends result back to parent via message tool (channel send)
message(action="send", target="{parent_session_key}", message="Task complete. Files written: workspace/temp/{task-id}/result.txt")
// Subagent then terminates normally — no sessions_yield needed
When to reply:
Key point: the subagent does NOT use sessions_yield. It terminates normally after sending the result to its parent session.
If spawning a subagent fails (system error):
mkdir -p workspace/mailbox/errors/, then write to workspace/mailbox/errors/{timestamp}-spawn-fail.md
Error log format:
# Spawn Failure Report
- Timestamp: [ISO timestamp]
- Task: [task description]
- Reason: [system error message if known]
Default spawn mode is mode="session" with thread=true.
| Mode | thread | Behavior | Use When |
|---|---|---|---|
| mode="session" | thread=true | Persistent conversation: context is preserved across messages, multi-turn dialog possible | Multi-step tasks, tasks where user may ask follow-up questions, uncertain scope |
| mode="session" | thread=false | New session each time: context is NOT preserved, each spawn starts fresh. Subagent sends ONE final result push when done, then the session is closed. Cannot receive additional messages in the same thread. | Fire-and-forget tasks with bounded scope, no follow-up expected |
| mode="run" | N/A | Detached, no session: truly fire-and-forget, no result push | Rare; avoid unless you explicitly need detached execution |
| mode="one-shot" | N/A | Does not exist in OpenClaw. Was considered but removed. | Use mode="session" + thread=false for bounded fire-and-forget. |
Both thread=true and thread=false receive the result via push mechanism. The difference is whether the session stays open afterward:
- thread=false: session is created but closed after the final result push (one-shot, no follow-up)
- thread=true: session stays alive and can receive multiple messages across turns
Decision question: "Is this a multi-step task or does the user expect follow-up?"
- Yes → mode="session" + thread=true
- No, bounded scope → mode="session" + thread=false (fire-and-forget)
- Explicitly detached, no result needed → mode="run"
// ✅ Multi-step task: context preserved, user may ask follow-up
sessions_spawn(task="debug service issue, may need to run multiple commands",
runtime="medium", label="service-debug")
// ✅ Fire-and-forget — bounded scope, no follow-up expected
sessions_spawn(task="restart the backup service on remote node",
runtime="fast", label="backup-restart", mode="session", thread=false)
// ⚠️ Run mode — only when you explicitly need detached, no result push
sessions_spawn(task="background health check", runtime="fast", label="health-bg", mode="run")
Note: thread=false still creates a session (unlike mode="run"), so you get result push — but the subagent starts with a fresh context, no history from previous related spawns.
- mode="session", thread=true for most tasks (interactive, multi-step, or uncertain duration)
- mode="session", thread=false for fire-and-forget tasks with bounded scope
- Avoid mode="run" unless you explicitly need detached/no-result execution (it provides no result push)
// ✅ Default: session thread (most tasks)
sessions_spawn(task="task-description", runtime="medium", label="task-label")
// ✅ Fire-and-forget: bounded scope, fresh context each time
sessions_spawn(task="task-description", runtime="fast", label="task-label", mode="session", thread=false)
// ⚠️ Avoid: run mode provides no result push — use only when detached is explicitly needed
sessions_spawn(task="task-description", runtime="fast", label="task-label", mode="run")
Use these as a guide to set runtime when spawning:
| Tier | Duration | Examples |
|---|---|---|
| fast | ~5 min | file ops, single API call, config check, quick test |
| medium | ~15 min | git operations, moderate build, moderate data processing |
| long | 60 min+ | large clones, heavy builds, multi-step deployments, data migration |
// Fast task — 5 min timeout
sessions_spawn(task="check service status", runtime="fast", label="service-status-check")
// Medium task — 15 min timeout
sessions_spawn(task="clone repository and run tests", runtime="medium", label="repo-test")
// Long task — 60 min timeout
sessions_spawn(task="full deployment pipeline", runtime="long", label="deploy-pipeline")
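The tier table and the mode/thread rules combine into one argument builder. This is a minimal Python sketch that returns the keyword arguments for a sessions_spawn call. sessions_spawn itself is the agent tool, not a Python function, and `runtime_tier`, `spawn_args`, and the tier boundaries (under 5 min fast, under 15 min medium) are assumptions read off the approximate durations in the table.

```python
def runtime_tier(est_minutes: float) -> str:
    """Pick a runtime tier from an estimated duration (boundaries assumed)."""
    if est_minutes < 5:
        return "fast"    # ~5 min
    if est_minutes < 15:
        return "medium"  # ~15 min
    return "long"        # 60 min+


def spawn_args(task: str, label: str, est_minutes: float,
               multi_step: bool, need_result: bool = True) -> dict:
    """Build sessions_spawn keyword arguments following the mode/thread rules."""
    args = {"task": task, "runtime": runtime_tier(est_minutes), "label": label}
    if not need_result:
        args["mode"] = "run"                        # detached, no result push: rare
    elif multi_step:
        args.update(mode="session", thread=True)    # default: context preserved
    else:
        args.update(mode="session", thread=False)   # bounded fire-and-forget
    return args
```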
When spawning a subagent, always include a descriptive label parameter:
- Use dashes, not spaces (project-config-sync, not project config sync)
GOOD labels (functional, generic):
label="config-sync" # what it does, not which project
label="log-review" # functional category
label="deploy-check" # what it checks, not which environment
label="service-health" # generic health check
label="backup-status" # generic backup check
label="build-verify" # what it verifies
BAD labels (specific identifiers, reveal private data):
label="192.168.1.1" # ❌ specific IP — privacy leak
label="ken-task" # ❌ specific user name — privacy leak
label="fix-123" # ❌ specific bug number — reveals internal tracking
label="proj-zeta" # ❌ specific project codename
label="john-doe-deploy" # ❌ specific person + action
label="acme-corp-sync" # ❌ specific company name
Generic placeholder rewrites:
- 192.168.x.x → remote-host or node-alpha
- user@host → admin-user or service-account
- project-name → repo-name or project-alpha
- bug-123 → bugfix-verify or issue-check
// ✅ Good: generic, dashes, descriptive
sessions_spawn(task="sync config to remote node", label="config-sync")
sessions_spawn(task="run test suite for module", label="test-suite")
sessions_spawn(task="deploy service to environment", label="service-deploy")
// ❌ Bad: spaces, specific names, cryptic
sessions_spawn(task="...", label="proj1") // too short, no dashes
sessions_spawn(task="...", label="192-168-1-1") // specific IP — not allowed
sessions_spawn(task="...", label="johns-task") // specific user name — not allowed
sessions_spawn(task="...", label="foo-bar-v2") // specific project codename — not allowed
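The mechanical parts of the label rules can be checked before spawning. This is a minimal Python sketch: the regexes catch spaces, missing dashes, and IP-shaped labels, but names and codenames cannot be detected automatically, so they come from a caller-supplied `blocked_terms` list (a hypothetical parameter, as is the function name).

```python
import re

LABEL_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)+$")  # lowercase words joined by dashes
IPV4_RE = re.compile(r"\d{1,3}(?:[.-]\d{1,3}){3}")     # 192.168.1.1 or 192-168-1-1


def label_problems(label: str, blocked_terms=()) -> list:
    """Return the reasons a label breaks the naming rules (empty list = OK)."""
    problems = []
    if not LABEL_RE.match(label):
        problems.append("use lowercase words joined by dashes, no spaces")
    if IPV4_RE.search(label):
        problems.append("looks like an IP address")
    for term in blocked_terms:  # e.g. user names, project codenames, company names
        if term.lower() in label.lower():
            problems.append(f"contains private identifier '{term}'")
    return problems
```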
When the user says "stop that thing", look at subagents list and match the label to find the right subagent to stop.
When a task requires multiple independent subagents running simultaneously:
Data sharing: each subagent writes its result to its own subdirectory workspace/tasks/{parent-task-id}/{subagent-label}/result.txt
Result gathering: results are collected after ALL subagents complete (not streaming). Main process reads all result files and synthesizes.
This is fan-out pattern, not DAG dependency pattern — all subagents are independent and start at the same time.
Main process needs a way to know when ALL parallel subagents are done:
Mechanism: counter-based tracking
- Start: remaining = N (number of parallel subagents)
- On each result push: remaining -= 1
- remaining == 0: all done, synthesize and report to user
Implementation:
Example:
parallel subtasks: [agent-a, agent-b, agent-c]
remaining = 3
→ agent-a pushes result: remaining = 2
→ agent-c pushes result: remaining = 1
→ agent-b pushes result: remaining = 0 → all done, synthesize
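The counter from the example, as a minimal Python sketch. `FanOutTracker` is a hypothetical name. Ignoring a second push from the same label is an extra safeguard assumed here, so that a duplicate push cannot drive the counter to zero early.

```python
class FanOutTracker:
    """Counter-based completion tracking for N parallel subagents."""

    def __init__(self, labels):
        self.remaining = len(labels)
        self.pending = set(labels)
        self.results = {}

    def on_push(self, label, result):
        """Record one pushed result; return True once every subagent has reported."""
        if label not in self.pending:
            return self.remaining == 0  # duplicate or unknown label: no change
        self.pending.remove(label)
        self.results[label] = result
        self.remaining -= 1
        return self.remaining == 0
```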
⚠️ Warning: multiple subagents writing to the same parent directory is safe only if they write to DIFFERENT subdirectories (each subagent has its own {subagent-label}/).
Rule: each subagent writes ONLY to its own workspace/tasks/{parent-task-id}/{subagent-label}/
Use a common prefix for parallel subagent labels so they can be cancelled together:
// Example: user asks "check weather + email + calendar"
// Main process spawns 3 subagents in parallel, with the common prefix "parallel-check-" so they can be cancelled together:
sessions_spawn(task="check weather for location", runtime="fast", label="parallel-check-1")
sessions_spawn(task="check email inbox for urgent messages", runtime="fast", label="parallel-check-2")
sessions_spawn(task="check calendar for upcoming events", runtime="fast", label="parallel-check-3")
// Each subagent writes its result to:
// workspace/tasks/{parent-task-id}/parallel-check-1/result.txt
// workspace/tasks/{parent-task-id}/parallel-check-2/result.txt
// workspace/tasks/{parent-task-id}/parallel-check-3/result.txt
// Main process waits for all 3 results, then synthesizes and replies
Rule: partial results are better than no results — never waste succeeded work.
If one or more parallel subagents fail:
- The failed subagent writes workspace/tasks/{task-id}/error.md and replies to its parent
- The main process still synthesizes the results that succeeded
Example:
User: check weather + email + calendar
→ 3 subagents spawned (parallel-check-1, parallel-check-2, parallel-check-3)
→ parallel-check-1 (weather): succeeded → parallel-check-1/result.txt
→ parallel-check-2 (email): failed → error.md written, reply sent
→ parallel-check-3 (calendar): succeeded → parallel-check-3/result.txt
Main process synthesizes: 1 failed, 2 succeeded.
Reply: "2/3 succeeded. Weather: sunny, 22°C. Calendar: meeting at 3pm. Email check failed — see error log."
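Building the partial-result reply is a small string operation. This is a minimal Python sketch that reproduces the example reply. The `{label: (ok, summary)}` input shape and the name `synthesize` are assumptions made for illustration.

```python
def synthesize(outcomes):
    """outcomes: {label: (ok, summary)}. Keep every success and name every failure."""
    ok = [summary for good, summary in outcomes.values() if good]
    failed = [summary for good, summary in outcomes.values() if not good]
    parts = [f"{len(ok)}/{len(outcomes)} succeeded."]
    parts.extend(ok)
    parts.extend(f"{summary} failed, see error log." for summary in failed)
    return " ".join(parts)


reply = synthesize({
    "parallel-check-1": (True, "Weather: sunny, 22°C."),
    "parallel-check-2": (False, "Email check"),
    "parallel-check-3": (True, "Calendar: meeting at 3pm."),
})
# reply == "2/3 succeeded. Weather: sunny, 22°C. Calendar: meeting at 3pm. Email check failed, see error log."
```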
If subagent produces > 1MB of data, write to file instead of returning raw:
workspace/temp/{task-id}/result.txt
Example:
// Large output scenario
exec(command="git diff --name-only", yieldMs=10000, timeout=60)
// Result: 50MB of file names → write to file instead
// Correct approach:
// 1. subagent writes to workspace/temp/{task-id}/result.txt
// 2. subagent replies with file path
// 3. main process reads path, reports to user
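The three steps above, as a minimal Python sketch of the subagent side. `package_result` is a hypothetical name. The 1 MB threshold comes from the rule above and is measured here in UTF-8 bytes, which is an assumption.

```python
import os

MAX_INLINE_BYTES = 1024 * 1024  # 1 MB


def package_result(task_id: str, output: str, root: str = "workspace/temp") -> str:
    """Return small output inline; write large output to a file and return a short pointer."""
    data = output.encode("utf-8")
    if len(data) <= MAX_INLINE_BYTES:
        return output
    task_dir = os.path.join(root, task_id)
    os.makedirs(task_dir, exist_ok=True)  # the subagent creates its own temp directory
    path = os.path.join(task_dir, "result.txt")
    with open(path, "wb") as f:
        f.write(data)
    return f"Output too large to return inline ({len(data)} bytes). Written to {path}"
```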
Cleanup: temp files are ephemeral — OS handles cleanup on reboot. No manual cleanup needed.
Normal duration: seconds to minutes depending on task complexity.
| Task Type | Expected Duration | Action |
|---|---|---|
| Fast local reads | < 1s | exec direct in main |
| Simple operations | seconds | subagent |
| Multi-step tasks | minutes | subagent |
| Build/clone/deploy | minutes to hours | subagent with appropriate timeout |
- subagents list
- runtime timeout (fast/medium/long)
A subagent can safely be allowed to time out (or can terminate early) when:
After subagent completes: if the task resulted in a decision, config change, or state change, update MEMORY.md.
| Result Type | Update MEMORY.md? | Example |
|---|---|---|
| Trivial/ephemeral | ❌ No | "checked service status", "file exists" |
| Decision made | ✅ Yes | "decided to use strategy X over Y because..." |
| Config changed | ✅ Yes | "updated timeout from 30s to 60s" |
| State change | ✅ Yes | "moved from manual to automated backup" |
| Lesson learned | ✅ Yes | "found that approach A fails in scenario X" |
| Important output | ✅ Yes | "discovered key info about..." |
## [Date] Task Result
- Task: [brief description]
- Outcome: [what happened]
- Key decision/learning: [why this matters]
Main process handles MEMORY.md writes — subagent reports the result, main process decides whether to write it down.
Routine subagent work should be logged in the daily memory file:
# Create/update daily log
memory/2026-04-06.md
Include: task label, what was done, key result. Keep it concise.
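The main-process side of record keeping, as a minimal Python sketch. It assumes every result gets a one-line entry in the daily file, and significant results also get a MEMORY.md entry in the template above. The `result_type` keys are hypothetical names for the table rows.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional

# Hypothetical keys for the table rows that warrant a MEMORY.md entry
SIGNIFICANT = {"decision", "config_change", "state_change", "lesson", "important_output"}


def record_result(result_type: str, label: str, task: str, outcome: str, why: str = "",
                  day: Optional[date] = None, root: Path = Path(".")) -> List[Path]:
    """Log every result to memory/YYYY-MM-DD.md; append significant ones to MEMORY.md."""
    day = day or date.today()
    written = []
    daily = root / "memory" / f"{day.isoformat()}.md"
    daily.parent.mkdir(parents=True, exist_ok=True)
    with daily.open("a", encoding="utf-8") as f:
        f.write(f"- {label}: {task}. Result: {outcome}\n")
    written.append(daily)
    if result_type in SIGNIFICANT:
        memory = root / "MEMORY.md"
        with memory.open("a", encoding="utf-8") as f:
            f.write(f"## {day.isoformat()} Task Result\n"
                    f"- Task: {task}\n- Outcome: {outcome}\n"
                    f"- Key decision/learning: {why}\n\n")
        written.append(memory)
    return written
```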
Do not announce to user when you are spawning a subagent.
User does not need to know the execution architecture. The subagent handles the work; you handle the conversation.
| Responsibility | Main Process | Subagent |
|---|---|---|
| User dialog | ✅ | ❌ |
| Task dispatch | ✅ | ❌ |
| Result production | ❌ | ✅ |
| Result delivery to user | ✅ | ❌ |
| Decision making | ✅ | ❌ |
| Status coordination | ✅ | ❌ |
| Fast local operations (<1s) | ✅ | ❌ |
| Blocking / remote / long operations | ❌ | ✅ |
| exec + yieldMs for long local commands | ❌ | ✅ |
If the main process blocks, the pattern has failed — regardless of whether the task eventually completes.
// Layer 1: Main process — always spawn for non-trivial work
// ❌ Wrong — blocks main process, typing disappears
exec(command="ssh remote-host ...")
// ✅ Correct — non-blocking, typing stays active
sessions_spawn(task="remote operation description", runtime="medium", label="remote-task")
// Inside the spawned subagent (Layer 2):
// ✅ exec + yieldMs for long local command
exec(command="some-long-local-script.sh", yieldMs=10000, timeout=300)
// ✅ exec + background for very long commands
exec(command="very-long-build.sh", background=true, yieldMs=60000, timeout=3600)
After spawning: continue dialog with user. Do NOT wait. Results arrive via push event.
Each node runs its own subagents. Nodes do not delegate to each other's subagents. Coordination between nodes goes through whatever inter-node protocol is configured for that setup.
When the user says "停" ("stop") / "stop" / "取消" ("cancel"):
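- Run subagents list to find all active subagents
- Match the label; for parallel subagents (e.g. parallel-check-1, parallel-check-2, parallel-check-3), match by prefix to catch ALL of them
- Reply "Stopped {label}" (for single)
Cancellation log format:
# Subagent Cancellation Log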
- Label: {label}
- Cancelled by: user
- Timestamp: {ISO timestamp}
- Result: SIGTERM → SIGKILL (or SIGTERM only if clean exit)
Example flow (single subagent):
User: 停! ("Stop!")
Agent: (subagents list) → finds label="config-sync"
(kill config-sync subagent)
"Stopped config-sync"
Example flow (parallel subagents):
User: 停! ("Stop!")
Agent: (subagents list) → finds parallel-check-1, parallel-check-2, parallel-check-3
(kill all 3 subagents with prefix "parallel-check")
"Stopped all 3 parallel subtasks"
Do NOT return partial results when user cancels — assume clean stop is wanted.
Rule: Always reply confirming which subagent was stopped. Never silently ignore a cancellation request.
| Signal | Meaning | Behavior |
|---|---|---|
| SIGTERM | Graceful termination | Finish current step, then exit (max 5 seconds) |
| SIGKILL | Forced termination | Exit immediately, no cleanup |
When to use SIGKILL: only when the user explicitly says "强制停止" ("force stop") or "kill" (kill, not stop/cancel).
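This minimal Python sketch shows label matching and the SIGTERM-then-SIGKILL sequence. To keep it runnable, it treats each subagent as a local OS process driven through `subprocess.Popen`, but that is an assumption. In OpenClaw, the actual stop mechanism is whatever the subagents tool provides. The 5-second grace period follows the signal table. Escalation to SIGKILL after the grace period mirrors the "SIGTERM → SIGKILL" log entry, while an immediate SIGKILL is reserved for an explicit force request.

```python
import subprocess


def match_targets(active_labels, requested):
    """Exact label match first; otherwise treat the request as a prefix."""
    if requested in active_labels:
        return [requested]
    return [label for label in active_labels if label.startswith(requested)]


def stop_process(proc: subprocess.Popen, force: bool = False, grace_seconds: float = 5.0) -> str:
    """Stop one process and report which signals were used."""
    if force:                      # user said "kill" / "force stop"
        proc.kill()                # SIGKILL on POSIX
        proc.wait()
        return "SIGKILL"
    proc.terminate()               # SIGTERM on POSIX: graceful stop
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()                # still running after the grace period
        proc.wait()
        return "SIGTERM -> SIGKILL"
```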
When ClawHub returns 429 Too Many Requests:
sessions_spawn(
task="Install or publish skill to ClawHub, handling rate limits. If 429 received, wait Retry-After seconds then retry. Max 3 attempts.",
label="clawhub-ops"
)
Inside the subagent:
When GitHub API returns 403 with "rate limit exceeded":
sessions_spawn(
task="GitHub API operation, handle rate limits. If 403 received, wait suggested reset time then retry. Max 3 attempts.",
label="github-ops"
)
Inside the subagent:
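The ClawHub and GitHub tasks describe the same retry loop. This is a minimal Python sketch of it. `request` is a hypothetical callable returning (status, headers, body), and the 60 s fallback wait is an assumption. GitHub documents `x-ratelimit-reset` as the reset time in UTC epoch seconds. `Retry-After` may also be an HTTP date, which this sketch does not handle (it assumes seconds).

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)
FALLBACK_WAIT = 60.0  # hypothetical default when no wait hint is present


def rate_limit_wait(resp: Response, now: float) -> Optional[float]:
    """Seconds to wait if resp is a rate-limit error, else None."""
    status, headers, body = resp
    h = {k.lower(): v for k, v in headers.items()}
    if status == 429:  # ClawHub: honor Retry-After (assumed to be seconds)
        return max(0.0, float(h.get("retry-after", FALLBACK_WAIT)))
    if status == 403 and "rate limit exceeded" in body.lower():
        if "x-ratelimit-reset" in h:  # GitHub: reset time as UTC epoch seconds
            return max(0.0, float(h["x-ratelimit-reset"]) - now)
        return FALLBACK_WAIT
    return None


def call_with_retries(request: Callable[[], Response], max_attempts: int = 3,
                      sleep=time.sleep, clock=time.time) -> Response:
    """Call request(), waiting and retrying on rate limits; at most max_attempts calls."""
    for attempt in range(1, max_attempts + 1):
        resp = request()
        wait = rate_limit_wait(resp, clock())
        if wait is None:
            return resp  # not rate limited: success or another error, caller decides
        if attempt < max_attempts:
            sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

If the loop gives up, the subagent follows the normal failure path: write error.md and reply to the parent.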
| Anti-Pattern | Why Wrong | Correct |
|---|---|---|
| exec long command in main process | Blocks typing | Main process spawns; subagent uses exec+yieldMs |
| "Let me check..." + exec in main | Blocks typing | Spawn subagent |
| Waiting for subagent result before responding | Blocks dialog | Continue, report later |
| Direct remote operations in main process | Blocks | Spawn subagent |
| Spawning to avoid doing work (not to avoid blocking) | Misuse of pattern | Fast ops = direct in main process |
| All work spawn, nothing direct | Over-interpretation | Fast local ops = direct exec in main |
| Announcing the spawn to user | Noise, breaks flow | Spawn silently, continue |
| Subagent uses blocking exec without yieldMs | Subagent hangs | Subagent uses exec+yieldMs |
| Using specific IPs/hostnames in labels | Privacy leak | Use generic names |
| yieldMs >= timeout | Background threshold >= hard kill — illogical | yieldMs must always be < timeout |
| Subagent silently failing without reporting | Parent never knows | Always reply to parent, write error.md |
| Using thread=false when multi-step needed | Loses context mid-task, user follow-up fails | Use thread=true for multi-step or uncertain scope |
| Using thread=true for fire-and-forget | Wastes context window on single-use task | Use thread=false for bounded fire-and-forget tasks |
| mode="one-shot" | Not a valid mode (does not exist in OpenClaw); anti-pattern residue | Use mode="run" for detached or mode="session" with thread=false for fire-and-forget |
| User says "stop" but nothing happens | Subagent keeps running | Implement cancellation protocol: list → match label → SIGTERM → SIGKILL → reply |
| Subagent cancelled but no reply to user | User doesn't know it stopped | Always reply confirming which subagent was stopped |
User request arrives
│
▼
┌───────────────────────────────────┐
│ LAYER 1: MAIN PROCESS │
│ Q1: fast + local + same topic? │
│ YES → exec direct, respond │
│ Q2: any spawn condition met? │
│ YES → spawn subagent, continue │
└───────────────────────────────────┘
│
spawn ↓
┌───────────────────────────────────┐
│ LAYER 2: SUBAGENT │
│ exec + yieldMs for long commands │
│ background for very long │
│ On error: log + error.md + reply │
│ Return result via push │
└───────────────────────────────────┘
│
▼
Result arrives at main process
Report to user
If significant → write to MEMORY.md
"Never block" is the rule. "Never execute" is a misreading of the rule — fast operations should be direct in the main process because they do not block.
The two-layer architecture makes this clean:
The typing indicator is the visible signal: if it disappears, something is blocking.
All examples in this skill use generic placeholders only. No specific project names, IPs, hostnames, user names, service names, or AI names appear in any example. This protects both operational security and privacy.