{"skill":{"slug":"resilient-coding-agent","displayName":"Resilient Coding Agent","summary":"Run long-running coding agents (Codex, Claude Code, etc.) in tmux sessions that survive orchestrator restarts, with automatic resume on interruption.","description":"---\nname: resilient-coding-agent\ndescription: \"Run long-running coding agents (Codex, Claude Code, etc.) in tmux sessions that survive orchestrator restarts, with automatic resume on interruption.\"\nmetadata:\n  openclaw:\n    emoji: \"🛡️\"\n    requires:\n      bins: [tmux]\n      anyBins: [codex, claude, opencode, pi]\n---\n\n# Resilient Coding Agent\n\nLong-running coding agent tasks (Codex CLI, Claude Code, OpenCode, Pi) are vulnerable to interruption: orchestrator restarts, process crashes, network drops. This skill decouples the coding agent process from the orchestrator using tmux, and leverages agent-native session resume for recovery.\n\n**Placeholders:** `<task-name>` and `<project-dir>` are filled in by the orchestrator. `<task-name>` may contain only lowercase letters, digits, and hyphens (`[a-z0-9-]`). `<project-dir>` must be an existing directory.\n\n**Temp directory:** Each task uses a secure temp directory created with `mktemp -d`. Store this path as `<tmpdir>` and use it for all task files (prompt, events, session ID, done marker). This avoids predictable filenames and symlink/race conditions. Example: `TMPDIR=$(mktemp -d)` produces something like `/var/folders/xx/.../T/tmp.aBcDeFgH`. If `TMPDIR` is already exported in the orchestrator's shell (the default on macOS), this assignment also changes the base directory for later `mktemp` calls in that shell, so run each task's setup in its own shell.\n\n**Prompt safety:** Task prompts are never interpolated into shell commands. Instead, write the prompt to a temp file using the orchestrator's `write` tool (no shell involved), then reference it inside the tmux command as `\"$(cat \"$TASK_TMPDIR/prompt\")\"`, where `TASK_TMPDIR` holds the same path and is passed into the session with `tmux new-session -e`. The shell treats command substitution output inside double quotes as a single literal argument, preventing injection. 
This depends on the orchestrator's `write` tool not invoking a shell; OpenClaw's built-in `write` tool meets this requirement.\n\n**Sensitive output:** tmux scrollback and event log files may contain secrets or API keys from agent output. On shared machines, restrict file permissions (`chmod 600`) and clean up temp directories after task completion.\n\n## Prerequisites\n\nThis skill assumes the orchestrator is already configured to use coding agent CLIs (Codex, Claude Code, etc.) for coding tasks instead of native sessions. If the orchestrator is still using `sessions_spawn` for coding work, configure it to prefer coding agents first (e.g., via AGENTS.md or equivalent). See the `coding-agent` skill for setup.\n\n## When to Use This\n\nUse this pattern when:\n- The task is expected to take **more than 5 minutes**\n- The orchestrator might restart during execution\n- You want fire-and-forget execution with completion notification\n\nFor quick tasks under 5 minutes, running the agent directly is fine.\n\n## Start a Task\n\nCreate a tmux session with a descriptive name. Use the agent prefix (`codex-`, `claude-`, etc.) 
for easy identification.\n\n### Codex CLI\n\n```bash\n# Step 1: Create secure temp directory\nTMPDIR=$(mktemp -d)\nchmod 700 \"$TMPDIR\"\n\n# Step 2: Write prompt to file (use orchestrator's write tool, not echo/shell)\n# File: $TMPDIR/prompt\n\n# Step 3: Launch in tmux (the session sees the temp dir as TASK_TMPDIR)\ntmux new-session -d -s codex-<task-name> -e \"TASK_TMPDIR=$TMPDIR\"\ntmux send-keys -t codex-<task-name> 'cd <project-dir> && set -o pipefail && codex exec --full-auto --json \"$(cat \"$TASK_TMPDIR/prompt\")\" | tee \"$TASK_TMPDIR/events.jsonl\" && echo \"__TASK_DONE__\"' Enter\n\n# Step 4: Capture this task's Codex session ID; resume --last is unsafe with concurrent tasks.\n# Uses jq for reliable JSON parsing (falls back to grep if jq unavailable).\n# Polls for up to 60 s so a run that exits before emitting a thread_id cannot hang this loop.\nfor i in $(seq 60); do\n  if command -v jq >/dev/null 2>&1; then\n    jq -r 'select(.thread_id) | .thread_id' \"$TMPDIR/events.jsonl\" 2>/dev/null | head -n 1 > \"$TMPDIR/codex-session-id\"\n  else\n    grep -oE '\"thread_id\":\"[^\"]+\"' \"$TMPDIR/events.jsonl\" 2>/dev/null | head -n 1 | cut -d'\"' -f4 > \"$TMPDIR/codex-session-id\"\n  fi\n  [ -s \"$TMPDIR/codex-session-id\" ] && break\n  sleep 1\ndone\n# If the file is still empty here, inspect the pane with tmux capture-pane.\n```\n\n### Claude Code\n\n```bash\n# Create secure temp directory and write prompt to $TMPDIR/prompt first\nTMPDIR=$(mktemp -d) && chmod 700 \"$TMPDIR\"\ntmux new-session -d -s claude-<task-name> -e \"TASK_TMPDIR=$TMPDIR\"\ntmux send-keys -t claude-<task-name> 'cd <project-dir> && claude -p \"$(cat \"$TASK_TMPDIR/prompt\")\" && echo \"__TASK_DONE__\"' Enter\n```\n\n### OpenCode / Pi\n\n```bash\n# Create secure temp directory and write prompt to $TMPDIR/prompt first\nTMPDIR=$(mktemp -d) && chmod 700 \"$TMPDIR\"\n\n# OpenCode\ntmux new-session -d -s opencode-<task-name> -e \"TASK_TMPDIR=$TMPDIR\"\ntmux send-keys -t opencode-<task-name> 'cd <project-dir> && opencode run \"$(cat \"$TASK_TMPDIR/prompt\")\" && echo \"__TASK_DONE__\"' Enter\n\n# Pi (separate temp dir)\nTMPDIR=$(mktemp -d) && chmod 700 \"$TMPDIR\"\ntmux new-session -d -s pi-<task-name> -e 
\"TASK_TMPDIR=$TMPDIR\"\ntmux send-keys -t pi-<task-name> 'cd <project-dir> && pi -p \"$(cat \"$TASK_TMPDIR/prompt\")\" && echo \"__TASK_DONE__\"' Enter\n```\n\n### Completion Notification (Optional)\n\nChain a notification command after the agent so you know when it finishes. Group the notification and `echo \"__TASK_DONE__\"` in `{ ...; }` after `&&`: the marker then prints only when the agent succeeds, even if the notification command itself fails. (With a bare `;` before the echo, the marker would also print after an agent failure and hide the crash from the monitor.)\n\n```bash\n# Generic: touch a marker file\ntmux send-keys -t codex-<task-name> 'cd <project-dir> && codex exec --full-auto \"$(cat \"$TASK_TMPDIR/prompt\")\" && { touch \"$TASK_TMPDIR/done\"; echo \"__TASK_DONE__\"; }' Enter\n\n# macOS: system notification\ntmux send-keys -t codex-<task-name> 'cd <project-dir> && codex exec --full-auto \"$(cat \"$TASK_TMPDIR/prompt\")\" && { osascript -e \"display notification \\\"Task done\\\" with title \\\"Codex\\\"\"; echo \"__TASK_DONE__\"; }' Enter\n\n# OpenClaw: system event (immediate wake)\ntmux send-keys -t codex-<task-name> 'cd <project-dir> && codex exec --full-auto \"$(cat \"$TASK_TMPDIR/prompt\")\" && { openclaw system event --text \"Codex done: <task-name>\" --mode now; echo \"__TASK_DONE__\"; }' Enter\n```\n\n## Monitor Progress\n\n```bash\n# Check whether the tmux session still exists (it stays open after the agent exits, because the shell keeps running)\ntmux has-session -t codex-<task-name> 2>/dev/null && echo \"running\" || echo \"finished/gone\"\n\n# Read recent output (visible pane plus the last 200 lines of scrollback)\ntmux capture-pane -t codex-<task-name> -p -S -200\n\n# Read the full scrollback\ntmux capture-pane -t codex-<task-name> -p -S -\n```\n\nCheck progress when:\n- The user asks for a status update\n- You want to proactively report milestones\n\n## Health Monitoring\n\nFor long-running tasks, use an active monitor loop instead of only checking on demand.\n\nPeriodic check flow:\n1. Run `tmux has-session -t <agent>-<task-name>` to confirm the tmux session still exists.\n2. Run `tmux capture-pane -t <agent>-<task-name> -p -S -<N>` to capture recent output.\n3. 
Detect likely agent exit by checking the last `N` lines for:\n   - Shell prompt returned (for example, a line ending in `$ `, `% `, or `> `)\n   - Exit indicators (`exit code`, `status <non-zero>`, `exited`)\n   - No completion marker: no line consisting solely of `__TASK_DONE__` (the typed command line also contains that string, so match whole lines, e.g. with `grep -qx '__TASK_DONE__'`)\n4. If a crash is detected, run the agent-native resume command in the same tmux session.\n\nUse a done marker in your start command so the monitor can distinguish normal completion from crashes:\n\n```bash\ntmux send-keys -t codex-<task-name> 'cd <project-dir> && codex exec --full-auto \"$(cat \"$TASK_TMPDIR/prompt\")\" && echo \"__TASK_DONE__\"' Enter\n```\n\nFor Codex tasks, save the session ID to `$TMPDIR/codex-session-id` when the task starts (see **Codex CLI** above). The monitor reads that file to resume the exact task session.\n\nThe orchestrator should run this check loop periodically (every 3-5 minutes, via cron or a background timer). On consecutive failures, double the interval (3m, 6m, 12m, ...) and reset when the agent is running normally. 
Stop after 5 hours wall-clock.\n\n## Recovery After Interruption\n\nFor automated crash detection and retries, use **Health Monitoring** above. Use this section as a manual fallback when you need to intervene directly:\n\n```bash\n# Codex (prefer explicit session ID from $TMPDIR/codex-session-id)\ntmux send-keys -t codex-<task-name> 'codex exec resume <session-id> \"Continue the previous task\"' Enter\n\n# Claude Code (without an ID, --resume opens an interactive picker; --continue resumes the most recent conversation in this directory)\ntmux send-keys -t claude-<task-name> 'claude --resume' Enter\n\n# OpenCode (--continue reuses the last session; a plain `opencode run` starts a new one)\ntmux send-keys -t opencode-<task-name> 'opencode run --continue \"Continue the previous task\"' Enter\n\n# Pi: no native resume; re-run the task prompt manually\n```\n\n## Cleanup\n\nAfter a task completes, kill the tmux session:\n\n```bash\ntmux kill-session -t codex-<task-name>\n```\n\nList all coding agent tmux sessions:\n\n```bash\ntmux list-sessions 2>/dev/null | grep -E '^(codex|claude|opencode|pi)-'\n```\n\n## Naming Convention\n\nTmux sessions use the pattern `<agent>-<task-name>`:\n\n- `codex-refactor-auth`\n- `claude-review-pr-42`\n- `codex-bus-sim-physics`\n\nKeep names short, lowercase, hyphen-separated.\n\n## Checklist\n\nBefore starting a long task:\n\n1. Pick tmux over direct execution (if task > 5 min)\n2. Name the tmux session with the agent prefix\n3. Optionally chain a completion notification\n4. Tell the user: task content, tmux session name, estimated duration\n5. Monitor via `tmux capture-pane` on request\n\n## Limitations\n\n- tmux sessions do not survive a **machine reboot** (tmux itself is killed). For reboot-resilient tasks, the coding agent's native resume (`codex exec resume <session-id>`, `claude --resume`) is the recovery path. Keep a copy of the Codex session ID outside the `mktemp` directory, since some systems clear temp directories at boot.\n- Interactive approval prompts inside tmux require manual `tmux attach` or `tmux send-keys`. 
Use `--full-auto` / `--yolo` / `-p` flags when possible.\n","topics":["Coding"],"tags":{"latest":"0.5.0"},"stats":{"comments":0,"downloads":497,"installsAllTime":18,"installsCurrent":1,"stars":0,"versions":11},"createdAt":1771200607204,"updatedAt":1778491552996},"latestVersion":{"version":"0.5.0","createdAt":1771379058267,"changelog":"Release v0.5.0","license":null},"metadata":{"setup":[],"os":null,"systems":null},"owner":{"handle":"cosformula","userId":"s1779ws25xpwzg5wt17m8htzt1884ce9","displayName":"cosformula","image":"https://avatars.githubusercontent.com/u/18232501?v=4"},"moderation":null}