Install
openclaw skills install resilient-coding-agent
Run long-running coding agents (Codex, Claude Code, etc.) in tmux sessions that survive orchestrator restarts, with automatic resume on interruption.
Long-running coding agent tasks (Codex CLI, Claude Code, OpenCode, Pi) are vulnerable to interruption: orchestrator restarts, process crashes, network drops. This skill decouples the coding agent process from the orchestrator using tmux, and leverages agent-native session resume for recovery.
Placeholders: <task-name> and <project-dir> are filled in by the orchestrator. <task-name> must match [a-z0-9-] only. <project-dir> must be a valid existing directory.
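The placeholder rules can be enforced before any tmux command is built. A minimal sketch with hypothetical helper names (validate_task_name, validate_project_dir); the allowed characters are spelled out instead of written as a-z so the check does not depend on the shell's locale:

```shell
#!/usr/bin/env bash
# Sketch: reject placeholder values before they reach any tmux command.
# validate_task_name and validate_project_dir are hypothetical helper names.

validate_task_name() {
  # Only [a-z0-9-], at least one character; anything else (spaces, ;, $, uppercase) fails.
  case "$1" in
    ''|*[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
    *) return 0 ;;
  esac
}

validate_project_dir() {
  # Must be an existing directory; print its absolute, symlink-free path.
  [ -d "$1" ] || return 1
  (cd -- "$1" && pwd -P)
}
```

Usage: validate_task_name "$name" && dir=$(validate_project_dir "$dir") before substituting either value into a session name or cd command.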
Temp directory: Each task uses a secure temp directory created with mktemp -d. Store this path as <tmpdir> and use it for all task files (prompt, events, session ID, done marker). This avoids predictable filenames and symlink/race conditions. Example: TMPDIR=$(mktemp -d) produces something like /var/folders/xx/.../T/tmp.aBcDeFgH.
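One caveat with the examples in this section: TMPDIR is also the variable mktemp itself reads to decide where to create directories. When TMPDIR is already exported, as it is by default on macOS, TMPDIR=$(mktemp -d) changes it for the rest of that shell, so a second task's mktemp -d lands inside the first task's directory. A minimal sketch that keeps each task's path in its own variable (new_task_dir, codex_dir and pi_dir are hypothetical names) and leaves TMPDIR untouched:

```shell
#!/usr/bin/env bash
# Sketch: one private directory per task, stored in its own variable so the
# standard TMPDIR variable, which mktemp reads, is never overwritten.

new_task_dir() {
  local dir
  dir=$(mktemp -d) || return 1
  chmod 700 "$dir"          # owner-only access, as in the steps below
  printf '%s\n' "$dir"
}

codex_dir=$(new_task_dir)
pi_dir=$(new_task_dir)      # a sibling of codex_dir, not nested inside it
echo "codex task files: $codex_dir"
echo "pi task files:    $pi_dir"
```

Each path is then handed to its session the same way as in the examples, e.g. tmux new-session -d -s codex-<task-name> -e "TASK_TMPDIR=$codex_dir".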
Prompt safety: Task prompts are never interpolated into shell commands. Instead, write the prompt to a temp file using the orchestrator's write tool (no shell involved), then reference it with "$(cat $TASK_TMPDIR/prompt)" inside the tmux command, where TASK_TMPDIR is the task directory passed to the session with tmux new-session -e. The output of a double-quoted command substitution reaches the agent as a single argument and is not re-parsed, so quotes, semicolons, or $(...) inside the prompt are never executed. This depends on the orchestrator's write tool not invoking a shell; OpenClaw's built-in write tool meets this requirement.
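The single-argument claim can be checked directly. In this sketch count_args is a hypothetical stand-in for the agent CLI that only reports how many arguments it received; the prompt deliberately contains ;, $(...), backticks, and a glob. Two side effects worth knowing: command substitution strips trailing newlines, and a very long prompt is still subject to the operating system's argument-length limit (ARG_MAX).

```shell
#!/usr/bin/env bash
# Demonstrates that "$(cat file)" delivers a prompt as one literal argument.
# count_args is a hypothetical stand-in for the agent CLI.

count_args() {
  printf '%s\n' "$#"
}

dir=$(mktemp -d)
# Demo only: in the real flow the orchestrator's write tool creates this file, not the shell.
printf '%s\n' 'Fix the bug; then explain $(whoami), `ls`, and *.py' > "$dir/prompt"

prompt_text="$(cat "$dir/prompt")"          # trailing newline is stripped here
nargs=$(count_args "$(cat "$dir/prompt")")  # metacharacters are not interpreted
echo "arguments received: $nargs"
rm -rf "$dir"
```

This prints arguments received: 1, and prompt_text holds the prompt exactly as written, with nothing executed.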
Sensitive output: tmux scrollback and event log files may contain secrets or API keys from agent output. On shared machines, restrict file permissions (chmod 600) and clean up temp directories after task completion.
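A minimal sketch of that hygiene, with hypothetical helper names: secure_task_files restricts every file in a task directory to its owner, and cleanup_task kills the session and removes the directory. The deletion guard assumes mktemp's default tmp.* naming and refuses any other path.

```shell
#!/usr/bin/env bash
# Sketch: tighten permissions on task files and remove them after completion.
# secure_task_files and cleanup_task are hypothetical helper names.

secure_task_files() {
  # Owner-only access for every file written so far (prompt, events, session ID, done marker).
  find "$1" -type f -exec chmod 600 {} +
}

cleanup_task() {
  local session=$1 dir=$2
  # Ignore errors if the session already exited or no tmux server is running.
  tmux kill-session -t "$session" 2>/dev/null || true
  # Only delete paths that look like an mktemp directory (assumes tmp.* naming).
  case "$dir" in
    */tmp.*) rm -rf -- "$dir" ;;
    *) echo "refusing to delete: $dir" >&2; return 1 ;;
  esac
}
```

Usage: run secure_task_files "$TMPDIR" while the task is running, and cleanup_task codex-<task-name> "$TMPDIR" once its output has been collected.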
This skill assumes the orchestrator is already configured to use coding agent CLIs (Codex, Claude Code, etc.) for coding tasks instead of native sessions. If the orchestrator is still using sessions_spawn for coding work, configure it to prefer coding agents first (e.g., via AGENTS.md or equivalent). See the coding-agent skill for setup.
Use this pattern when:
For quick tasks under 5 minutes, running the agent directly is fine.
Create a tmux session with a descriptive name. Use the agent prefix (codex-, claude-, etc.) for easy identification.
# Step 1: Create secure temp directory
TMPDIR=$(mktemp -d)
chmod 700 "$TMPDIR"
# Step 2: Write prompt to file (use orchestrator's write tool, not echo/shell)
# File: $TMPDIR/prompt
# Step 3: Launch in tmux (pass TMPDIR via env)
tmux new-session -d -s codex-<task-name> -e "TASK_TMPDIR=$TMPDIR"
tmux send-keys -t codex-<task-name> 'cd <project-dir> && set -o pipefail && codex exec --full-auto --json "$(cat $TASK_TMPDIR/prompt)" | tee $TASK_TMPDIR/events.jsonl && echo "__TASK_DONE__"' Enter
# Step 4: Capture this task's Codex session ID; resume --last is unsafe with concurrent tasks.
# Uses jq for reliable JSON parsing (falls back to grep if jq is unavailable).
# Polls for at most 120 seconds (an arbitrary limit) so a run that fails before
# emitting a thread_id cannot hang the orchestrator forever.
for i in $(seq 1 120); do
  [ -s "$TMPDIR/codex-session-id" ] && break
  if command -v jq &>/dev/null; then
    jq -r 'select(.thread_id) | .thread_id' "$TMPDIR/events.jsonl" 2>/dev/null | head -n 1 > "$TMPDIR/codex-session-id"
  else
    grep -oE '"thread_id":"[^"]+"' "$TMPDIR/events.jsonl" 2>/dev/null | head -n 1 | cut -d'"' -f4 > "$TMPDIR/codex-session-id"
  fi
  sleep 1
done
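The session-ID extraction can be tested offline before relying on it. A sketch that wraps the same jq/grep logic in a hypothetical extract_thread_id function and runs it on a sample event file; the event lines and ID are made-up values shaped to match the jq filter above, so confirm the real shape against your Codex version's --json output.

```shell
#!/usr/bin/env bash
# Sketch: Step 4's extraction as a reusable function, checked on sample events.
# The sample lines below are assumptions, not captured Codex output.

extract_thread_id() {
  local events=$1
  if command -v jq >/dev/null 2>&1; then
    jq -r 'select(.thread_id) | .thread_id' "$events" 2>/dev/null | head -n 1
  else
    grep -oE '"thread_id":"[^"]+"' "$events" 2>/dev/null | head -n 1 | cut -d'"' -f4
  fi
}

sample=$(mktemp)
printf '%s\n' '{"type":"thread.started","thread_id":"0199a213-81c0-7800-8aa1-bbab2a035a53"}' \
  '{"type":"turn.started"}' > "$sample"
id=$(extract_thread_id "$sample")
rm -f "$sample"
echo "$id"
```

Both branches should print the same ID; if they disagree on real output, prefer the jq result.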
# Create secure temp directory and write prompt to $TMPDIR/prompt first
TMPDIR=$(mktemp -d) && chmod 700 "$TMPDIR"
tmux new-session -d -s claude-<task-name> -e "TASK_TMPDIR=$TMPDIR"
tmux send-keys -t claude-<task-name> 'cd <project-dir> && claude -p "$(cat $TASK_TMPDIR/prompt)" && echo "__TASK_DONE__"' Enter
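The resume --last concern applies to Claude Code as well: claude --resume without an ID asks you to pick a session and claude --continue takes the most recent one, and neither identifies a specific task when several run at once. A sketch, assuming claude -p accepts --output-format stream-json together with --verbose and that the stream's init event carries a session_id field; check both with claude --help before relying on it. extract_claude_session_id is a hypothetical name and the sample ID is made up.

```shell
#!/usr/bin/env bash
# Sketch (assumed flags, verify with `claude --help`): stream JSON so the
# session ID can be saved per task, mirroring the Codex flow.
#
#   tmux send-keys -t claude-<task-name> 'cd <project-dir> && set -o pipefail && claude -p --output-format stream-json --verbose "$(cat $TASK_TMPDIR/prompt)" | tee $TASK_TMPDIR/events.jsonl && echo "__TASK_DONE__"' Enter
#
# Later: claude --resume "$(cat "$TMPDIR/claude-session-id")"

extract_claude_session_id() {
  # First session_id field in the stream (assumed to come from the init event).
  grep -oE '"session_id":"[^"]+"' "$1" 2>/dev/null | head -n 1 | cut -d'"' -f4
}

sample=$(mktemp)
printf '%s\n' '{"type":"system","subtype":"init","session_id":"3f2c9a4e-1b7d-4c1e-9f0a-2d6e8b5c7a10","tools":[]}' > "$sample"
sid=$(extract_claude_session_id "$sample")
rm -f "$sample"
echo "$sid"
```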
# Create secure temp directory and write prompt to $TMPDIR/prompt first
TMPDIR=$(mktemp -d) && chmod 700 "$TMPDIR"
# OpenCode
tmux new-session -d -s opencode-<task-name> -e "TASK_TMPDIR=$TMPDIR"
tmux send-keys -t opencode-<task-name> 'cd <project-dir> && opencode run "$(cat $TASK_TMPDIR/prompt)" && echo "__TASK_DONE__"' Enter
# Pi (separate temp dir)
TMPDIR=$(mktemp -d) && chmod 700 "$TMPDIR"
tmux new-session -d -s pi-<task-name> -e "TASK_TMPDIR=$TMPDIR"
tmux send-keys -t pi-<task-name> 'cd <project-dir> && pi -p "$(cat $TASK_TMPDIR/prompt)" && echo "__TASK_DONE__"' Enter
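The four launch commands share one shape: cd into the project, pass the prompt file, print the done marker on success. A sketch that builds them in one place; agent_command and launch_task are hypothetical helpers, the agent flags are copied from the examples above, the project path is quoted with bash's printf %q so spaces cannot split it, and the prompt path is quoted too (unlike the one-liners above).

```shell
#!/usr/bin/env bash
# Sketch: one builder for every agent's tmux command line.
# agent_command and launch_task are hypothetical helper names.

agent_command() {
  local agent=$1 project_dir=$2 run
  case "$agent" in
    codex)    run='codex exec --full-auto "$(cat "$TASK_TMPDIR/prompt")"' ;;
    claude)   run='claude -p "$(cat "$TASK_TMPDIR/prompt")"' ;;
    opencode) run='opencode run "$(cat "$TASK_TMPDIR/prompt")"' ;;
    pi)       run='pi -p "$(cat "$TASK_TMPDIR/prompt")"' ;;
    *) echo "unknown agent: $agent" >&2; return 1 ;;
  esac
  # %q shell-quotes the directory; the prompt part stays literal for the tmux shell.
  printf 'cd %q && %s && echo "__TASK_DONE__"\n' "$project_dir" "$run"
}

launch_task() {
  local agent=$1 task=$2 project_dir=$3 task_dir=$4 cmd
  cmd=$(agent_command "$agent" "$project_dir") || return 1
  tmux new-session -d -s "$agent-$task" -e "TASK_TMPDIR=$task_dir" &&
    tmux send-keys -t "$agent-$task" "$cmd" Enter
}
```

Usage: launch_task codex refactor-auth <project-dir> "$task_dir", after the prompt has been written to "$task_dir/prompt".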
Chain a notification command after the agent so you know when it finishes. Group the notification and the marker as && { notify; echo "__TASK_DONE__"; } so the marker still prints if the notification command fails, but never prints when the agent itself fails (with a plain && notify; echo "__TASK_DONE__", the marker would also print after a crash):
# Generic: touch a marker file
tmux send-keys -t codex-<task-name> 'cd <project-dir> && codex exec --full-auto "$(cat $TASK_TMPDIR/prompt)" && { touch $TASK_TMPDIR/done; echo "__TASK_DONE__"; }' Enter
# macOS: system notification
tmux send-keys -t codex-<task-name> 'cd <project-dir> && codex exec --full-auto "$(cat $TASK_TMPDIR/prompt)" && { osascript -e "display notification \"Task done\" with title \"Codex\""; echo "__TASK_DONE__"; }' Enter
# OpenClaw: system event (immediate wake)
tmux send-keys -t codex-<task-name> 'cd <project-dir> && codex exec --full-auto "$(cat $TASK_TMPDIR/prompt)" && { openclaw system event --text "Codex done: <task-name>" --mode now; echo "__TASK_DONE__"; }' Enter
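With the touch variant, the orchestrator can wait on the file instead of scraping the pane. A sketch with a hypothetical wait_for_done helper and an arbitrary default timeout of one hour; because the file is only created when the agent succeeds, a timeout means the tmux session needs inspecting.

```shell
#!/usr/bin/env bash
# Sketch: block until the done-marker file appears, or give up after a timeout.
# wait_for_done is a hypothetical helper; 3600 s is an arbitrary default.

wait_for_done() {
  local marker=$1 timeout=${2:-3600} waited=0
  while [ ! -e "$marker" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1   # timed out: check the tmux session for a crash or a hang
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Usage: wait_for_done "$TMPDIR/done" 7200 || tmux capture-pane -t codex-<task-name> -p -S -200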
# Check if the session is still running
tmux has-session -t codex-<task-name> 2>/dev/null && echo "running" || echo "finished/gone"
# Read recent output (last 200 lines)
tmux capture-pane -t codex-<task-name> -p -S -200
# Read the full scrollback
tmux capture-pane -t codex-<task-name> -p -S -
Check progress when:
For long-running tasks, use an active monitor loop instead of only checking on demand.
Periodic check flow:
1. tmux has-session -t <agent-task> to confirm the tmux session still exists.
2. tmux capture-pane -t <agent-task> -p -S -<N> to capture recent output.
3. Scan the last N lines for:
- a returned shell prompt ($ , % , or > )
- exit indicators (exit code, status <non-zero>, exited)
- the done marker (__TASK_DONE__)
Use a done marker in your start command so the monitor can distinguish normal completion from crashes:
tmux send-keys -t codex-<task-name> 'cd <project-dir> && codex exec --full-auto "$(cat $TASK_TMPDIR/prompt)" && echo "__TASK_DONE__"' Enter
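One pass of the check flow can be sketched as a function. Two details the sketch assumes: the done marker must match as a whole line, because the command typed into the pane also contains the text __TASK_DONE__ inside echo "..."; and recent tmux versions drop trailing spaces from capture-pane output unless -N is given, so the prompt character is accepted with or without its trailing space. classify_task and check_task are hypothetical names, and the prompt and exit patterns are heuristics (agent output that happens to end in > will read as a returned prompt).

```shell
#!/usr/bin/env bash
# Sketch of one monitor check: session existence plus the last captured lines.
# classify_task / check_task are hypothetical; patterns are heuristics to tune.

classify_task() {
  local session_exists=$1 output=$2 last
  if [ "$session_exists" != yes ]; then
    echo gone; return
  fi
  # Whole-line match only: the echoed command line also contains __TASK_DONE__.
  if printf '%s\n' "$output" | grep -qx '__TASK_DONE__'; then
    echo done; return
  fi
  if printf '%s\n' "$output" | grep -qE 'exit code|status [1-9][0-9]*|exited'; then
    echo crashed; return
  fi
  # Last non-empty line ending in a prompt character means the shell is back
  # without the marker, i.e. the agent stopped early.
  last=$(printf '%s\n' "$output" | grep -v '^[[:space:]]*$' | tail -n 1)
  case "$last" in
    *'$'|*'%'|*'>'|*'$ '|*'% '|*'> ') echo crashed ;;
    *) echo running ;;
  esac
}

check_task() {
  local session=$1 exists=no
  tmux has-session -t "$session" 2>/dev/null && exists=yes
  classify_task "$exists" "$(tmux capture-pane -t "$session" -p -S -50 2>/dev/null)"
}
```

check_task codex-<task-name> prints gone, done, crashed, or running; gone and crashed are the cases that call for a resume.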
For Codex tasks, save the session ID to $TMPDIR/codex-session-id when the task starts (Step 4 of the Codex launch commands above). The monitor reads that file to resume the exact task session.
The orchestrator should run this check loop periodically (every 3-5 minutes, via cron or a background timer). On consecutive failures, double the interval (3m, 6m, 12m, ...) and reset when the agent is running normally. Stop after 5 hours wall-clock.
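The schedule can be sketched as two small functions, assuming the caller decides what counts as a failed check (for example a crashed or gone result). next_interval and past_deadline are hypothetical names; the 3-minute base and 5-hour limit come from the paragraph above.

```shell
#!/usr/bin/env bash
# Sketch of the monitor schedule: double the interval after each failed check,
# reset after a healthy one, stop after 5 hours of wall-clock time.
# next_interval and past_deadline are hypothetical helper names.

BASE_INTERVAL=180        # seconds (3 minutes)
MAX_WALL_CLOCK=18000     # seconds (5 hours)

next_interval() {
  # $1 = current interval in seconds, $2 = "ok" or "fail" for the latest check
  if [ "$2" = ok ]; then
    echo "$BASE_INTERVAL"
  else
    echo $(( $1 * 2 ))
  fi
}

past_deadline() {
  # $1 = start time, $2 = now (both epoch seconds, e.g. from date +%s)
  [ $(( $2 - $1 )) -ge "$MAX_WALL_CLOCK" ]
}

# Example: three consecutive failures, then a healthy check.
interval=$BASE_INTERVAL
for result in fail fail fail ok; do
  interval=$(next_interval "$interval" "$result")
  echo "$result -> next check in $(( interval / 60 ))m"
done
```

The example prints next checks of 6m, 12m, 24m, and then 3m once a check succeeds.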
For automated crash detection and retries, use the periodic health check above. Keep this section as a manual fallback when you need to intervene directly:
# Codex (prefer explicit session ID from $TMPDIR/codex-session-id)
tmux send-keys -t codex-<task-name> 'codex exec resume <session-id> "Continue the previous task"' Enter
# Claude Code
tmux send-keys -t claude-<task-name> 'claude --resume' Enter
# OpenCode
tmux send-keys -t opencode-<task-name> 'opencode run "Continue"' Enter
# Pi: no native resume; re-run the task prompt manually
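Instead of pasting the Codex session ID by hand, the saved file can feed the resume command. A sketch with a hypothetical codex_resume_command helper; it rejects anything other than letters, digits, and hyphens, since the ID ends up inside a command line sent with tmux send-keys.

```shell
#!/usr/bin/env bash
# Sketch: build the Codex resume command from the saved session ID.
# codex_resume_command is a hypothetical helper name.

codex_resume_command() {
  local id_file=$1 id
  [ -s "$id_file" ] || { echo "no saved session ID in $id_file" >&2; return 1; }
  id=$(head -n 1 "$id_file")
  # Refuse IDs containing shell metacharacters before they reach a command line.
  case "$id" in
    ''|*[![:alnum:]-]*) echo "unexpected session ID: $id" >&2; return 1 ;;
  esac
  printf 'codex exec resume %s "Continue the previous task"\n' "$id"
}

# Usage (needs the task's tmux session):
#   cmd=$(codex_resume_command "$TMPDIR/codex-session-id") &&
#     tmux send-keys -t codex-<task-name> "$cmd" Enter
```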
After a task completes, kill the tmux session:
tmux kill-session -t codex-<task-name>
List all coding agent tmux sessions:
tmux list-sessions 2>/dev/null | grep -E '^(codex|claude|opencode|pi)-'
Tmux sessions use the pattern <agent>-<task-name>:
- codex-refactor-auth
- claude-review-pr-42
- codex-bus-sim-physics
Keep names short, lowercase, hyphen-separated.
Before starting a long task:
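- Check progress with tmux capture-pane on request.
- Agent-native session resume (codex exec resume <session-id>, claude --resume) is the recovery path.
- Avoid flows that need manual input through tmux attach or tmux send-keys. Use --full-auto / --yolo / -p flags when possible.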