Shark

Other

Enables non-blocking AI agent execution by spawning parallel remora subagents for slow tasks, keeping the main agent responsive and efficient.

Install

openclaw skills install shark

🦈 The Shark Pattern

A shark that stops swimming dies. An agent that waits for tools wastes compute.

Works with: Claude Code · Codex · Gemini CLI · Cursor · Windsurf · Aider · OpenClaw · any LLM agent

When to Use This Skill

Trigger this skill when the user says:

"use the shark pattern"
"non-blocking agent"
"never wait for tools"
"spawn background workers"
"parallel subagents"
"keep the main agent moving"
or when you notice you're about to block on a slow tool (web fetch, SSH, build, test run, API call)

The Rule

Every LLM turn must complete in under 30 seconds.

If any operation would take longer:

Spawn a remora (sessions_spawn with mode: "run")
Continue reasoning immediately
Incorporate remora results when they arrive

You are never in I/O wait. You are always reasoning about something.

Lifecycle

┌─────────────┐
│  DECOMPOSE  │  Break task into N independent subtasks
└──────┬──────┘
       │ spawn N remoras (+ 1 pilot fish when first completes early)
       ▼
┌─────────────┐
│    SPAWN    │  sessions_spawn × N, all parallel, record session IDs
└──────┬──────┘
       │ main agent keeps reasoning (never waits)
       ▼
┌─────────────┐     timeout/crash
│   MONITOR   │ ──────────────────► MARK ⏱/❌ (partial still useful)
└──────┬──────┘
       │ all done OR deadline hit
       ▼
┌─────────────┐
│  AGGREGATE  │  Collect results, note failures, merge pilot fish draft
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   REPORT    │  Single coherent response with failure count noted
└─────────────┘

No nested remoras. If a remora is running, it executes inline — remoras cannot spawn their own remoras. Only the main shark spawns.

The Pattern

Bad (Ralph-style blocking):

think → call slow tool → WAIT 60s → think → call slow tool → WAIT 45s → ...

Good (Shark-style non-blocking):

think → spawn remora(slow tool) → think about something else
     → spawn remora(another tool) → synthesize partial results
     → receive remora result → incorporate → swim on

Implementation

When applying the Shark Pattern, structure your work like this:

1. Identify blocking operations

Before calling any tool, ask: "Will this take more than 20-30 seconds?"

Slow tools (always spawn):

Web searches / page fetches
SSH commands on remote machines
Build / test / CI runs
File system scans over large directories
API calls with unknown latency
LLM inference calls (coding agents)

Fast tools (run inline, never spawn):

Reading local files
Simple calculations
String manipulation
Memory lookups

2. Spawn remoras

sessions_spawn({
  task: "Do the slow thing and return the result",
  mode: "run",
  runtime: "subagent",
  streamTo: "parent"  // optional: stream output back
})

Spawn multiple remoras in parallel when possible — don't serialize unless there's a data dependency.

3. Keep the main fin moving

After spawning, immediately continue:

Plan the next step
Work on a different part of the task
Summarize what you know so far
Prepare to incorporate results

4. Incorporate results

When remora results arrive, weave them in and continue. Never re-do work a remora already completed.

If your runtime keeps subagents alive after completion, close them once you've incorporated their result. In Codex that means: wait for the remora, use its output, then close_agent(id) unless you intentionally plan to reuse that same agent.

Timing Budget

Operation	Budget	Action
File read	< 2s	Inline
Web search	5-30s	Spawn
SSH command	10-120s	Spawn
Build/test	30-300s	Spawn
Coding agent	60-600s	Spawn
Memory search	< 3s	Inline

Example: Multi-Step Research Task

Without Shark (blocking):

1. Search web for X        [wait 15s]
2. Search web for Y        [wait 12s]  
3. Fetch page Z            [wait 8s]
4. SSH check server        [wait 30s]
Total: ~65 seconds blocked

With Shark (non-blocking):

1. Spawn: search X         [0s - spawned]
2. Spawn: search Y         [0s - spawned]
3. Spawn: fetch Z          [0s - spawned]
4. Spawn: SSH check        [0s - spawned]
5. Plan synthesis while waiting [15s of actual thinking]
6. All results arrive → synthesize
Total: ~15s of thinking + max(tool times) in parallel

Output Format

Announce on start

🦈 Shark mode — spawning [N] remoras for [tasks], continuing...

Progress bar (chat-friendly, Unicode only — no images needed)

Use this format after each remora or pilot fish completes. Works in Telegram, Discord, Signal, iMessage — anywhere.

🦈 3 remoras · 1 pilot fish

◉ [A] task name here    ████████████ ✅ 9s
◉ [B] task name here    ████████████ ✅ 33s
○ [C] task name here    ░░░░░░░░░░░░ pending
◈ [P] Pilot fish        ██████░░░░░░ ~14s left

↳ continuing...

Symbols:

◉ = remora (completed)
○ = remora (pending)
⊙ = remora (running)
◈ = pilot fish (time-bounded)
████████████ = done bar (12 blocks)
██████░░░░░░ = partial (filled = elapsed / total budget)
░░░░░░░░░░░░ = not started

Progress fill: filled = round(elapsed / timeout * 12) blocks of █, remainder ░

Only post an update when something changes (remora completes or pilot fish starts/ends). Don't spam — one update per event.

Final synthesis

After all remoras done:

🦈 All fins in — synthesising [N] results + pilot draft

Then deliver the report.

The Pilot Fish Sub-Pattern

Pilot fish swim alongside sharks doing prep work. When you have idle time, use it.

When one remora returns early and others are still running:

Spawn a pilot fish — a time-bounded analysis sub-agent
Give it only the partial results so far + a hard timeout equal to the estimated remaining wait
Let it pre-validate, pre-analyse, find patterns, draft conclusions
Kill it (or it self-terminates) when the last primary remora completes
Incorporate whatever the pilot fish produced into the final synthesis

remora A ──────► result (early)
remora B ────────────────────────────► result
remora C ──────────────────────────────────► result

main:   spawn A, B, C
        A done → spawn pilot-fish(A's result, timeout=est_remaining)
        pilot-fish: pre-analyse A, draft partial report, validate data...
        B done → pilot-fish still running, feed B's result in (or kill+reuse)
        C done → kill pilot-fish, synthesise A+B+C+pilot-fish draft

Pilot Fish Rules

Always time-bounded — pass runTimeoutSeconds equal to estimated remaining wait
Never blocks — spawned async, main agent continues
Opportunistic — if it finishes early, bonus; if killed mid-run, partial output is still useful
One at a time — don't stack pilot fish on pilot fish
Task: pre-validate data, find gaps, draft structure, flag anomalies, prepare questions

Example

// remoras A (fast) and B (slow) both spawned
// A finishes in 10s, B will take another 30s

// Spawn pilot fish with 25s budget:
sessions_spawn({
  task: "Pre-analyse these results from remora A. 
         Validate the data, note any gaps, draft the structure 
         of the final report. Stop after 25 seconds.",
  runTimeoutSeconds: 25,
  mode: "run"
})

// Main agent continues doing other work
// When B finishes → kill pilot fish → synthesise A + B + pilot draft

Decision Tree — When to Spawn

Before every tool call, ask: "Will this take more than 10 seconds?"

Estimated time < 10s?  → run inline
Estimated time ≥ 10s?  → spawn remora
Unknown latency?        → spawn remora (assume slow)
Data dependency on another remora? → wait, then inline
Already at 8 remoras? → queue, don't stack

Always spawn: web search/fetch, SSH, build/test, coding agents, CI triggers, API calls with unknown latency Always inline: file read, memory lookup, string ops, math, local config reads

Error Handling

remoras will fail, timeout, or return garbage. Plan for it.

remora timeout

◉ [A] task    ████████████ ⏱ 30s [timeout]

Treat as partial result — use whatever was returned
Do not re-spawn the same task (wastes time, likely to timeout again)
Note the gap in synthesis: "A timed out — data may be incomplete"
If A's result is critical, spawn a smaller-scoped follow-up shark

remora crash / error

◉ [A] task    ████████████ ❌ [error: connection refused]

Log the error inline in the progress bar
Continue synthesis without that result
Mention the failure in the final report
Optionally file an issue / alert if it's infrastructure
If the runtime still shows the remora as open after completion or error, clean it up immediately. In Codex, close completed remoras with close_agent(id) once their output is delivered.

Partial results (most common)

Most useful — a remora that timed out at 28s has 28s of work in it
Always check if partial output is usable before discarding
Progress bar: ⏱ = timeout with partial, ❌ = hard error with nothing

>50% remoras failed

Degrade gracefully — fall back to sequential for remaining work
Note in report: "⚠️ degraded mode — N/M remoras failed"

All remoras failed

Fall back to sequential execution for the most critical task only
Do not spawn another full fleet — you're likely hitting a systemic issue

Forgetting to spawn the pilot fish (most common mistake)

You finished a fast inline task, a remora is still running, and you just... wait
Symptom: main agent idle, no pilot fish, time wasted
Fix: always ask after any remora completes early — "what can I pre-draft right now?"
Even if you have nothing obvious, draft the output structure, prepare questions, or outline the synthesis

Pilot fish killed mid-run

Normal and expected — whatever it produced is still useful
Incorporate partial pilot fish output into synthesis
Don't wait for it or re-spawn it

Terminology

remora = a sessions_spawn call with mode: "run", runtime: "subagent", and runTimeoutSeconds set. A remora is specifically a timed sub-agent — untimed subagents are not remoras.
Pilot fish = a remora spawned after another remora completes, with a short timeout sized to the estimated remaining wait. Purpose: pre-analysis only, never primary work.
Fleet = the full set of remoras spawned for one task
Fin moving = the main agent is doing useful work (not waiting)
No nested remoras = remoras always execute inline — only the main shark spawns

`runTimeoutSeconds` — confirmed real

Verified against OpenClaw source: runTimeoutSeconds: z.number().int().min(0).optional() — maps to the subagent wait timeout. Use it. Hard-kills the sub-agent process after N seconds, partial output returned.

Pilot Fish Sizing Formula

pilotFishTimeout = min(estimatedRemaining * 0.8, 25)

estimatedRemaining = how long you think the slowest remaining remora will take
Cap at 25s so pilot fish always finishes before the main synthesis turn
If you don't know: use 20s as default

Example: slowest remaining remora estimated at 30s → pilot fish timeout = min(24, 25) = 24s

Hard Limits

Never use yieldMs > 30000 in exec calls — this holds the main turn hostage
Never process(action=poll, timeout > 20000) in the main session — same reason
Never add sleep or wait loops in the main thread
Always set runTimeoutSeconds on remoras — unbound sub-agents are not sharks
Always clean up completed remoras — if your runtime requires explicit teardown, do it right after incorporating the result
Max 8 concurrent remoras — beyond this, context overhead exceeds the gain
Never stack pilot fish — one at a time, no pilot fish spawning pilot fish
Spawn tasks ≤ 3 sentences — longer task descriptions need decomposition first

Enforcing the 30-Second Timeout

The 30s cap isn't just a guideline — here's how to actually enforce it per runtime.

OpenClaw subagents

sessions_spawn({
  task: "...",
  mode: "run",
  runtime: "subagent",
  runTimeoutSeconds: 30   // hard kill after 30s — agent gets SIGTERM
})

runTimeoutSeconds is enforced by the OpenClaw runtime — the sub-agent process is killed if it exceeds it. Partial output is still returned.

exec calls (shell, SSH, scripts)

exec({
  command: "some-slow-command",
  timeout: 30,        // hard kill in seconds
  background: true,   // don't block the main agent turn
  yieldMs: 500        // poll back quickly to check
})

timeout kills the process. background: true means the main agent doesn't wait — it gets a session handle and can check back with process(poll).

Gemini CLI via exec

timeout 30 gemini -p "task here"
# or on Windows:
Start-Process gemini -ArgumentList '-p "task"' -Wait -Timeout 30

Wrap the CLI invocation with OS-level timeout / Start-Process -Timeout.

Pilot fish — always use `runTimeoutSeconds`

sessions_spawn({
  task: "pre-analyse partial results, draft structure, flag gaps",
  mode: "run",
  runTimeoutSeconds: estimatedRemainingMs / 1000,  // die before the last remora
})

Set it to slightly less than your estimated remaining wait — so the pilot fish always finishes before you need to synthesise.

What happens when timeout fires

Sub-agent/process is killed
Whatever output was produced so far is returned
Main agent treats it as a partial result — still useful for synthesis
Log: [timeout] in the progress bar instead of ✅

⊙ [A] slow task    ████████████ ⏱ 30s [timeout — partial result]

The LLM turn itself

You can't hard-kill an LLM mid-turn, but you can:

Keep prompts tight — don't ask for exhaustive analysis in one turn
Use thinking: "none" for fast sub-tasks that don't need deep reasoning
Break large tasks into smaller shark-able chunks upfront

Rule of thumb: if a task description is >3 sentences, it probably needs to be split into remoras.

Compatibility — Claude, Codex, Gemini CLI

The Shark Pattern is runtime-agnostic. remoras can be any agent type.

OpenClaw (Claude / Sonnet / Opus)

sessions_spawn({
  task: "...",
  mode: "run",
  runtime: "subagent",
  runTimeoutSeconds: 30   // hard cap for pilot fish
})

Codex

sessions_spawn({
  task: "...",
  runtime: "acp",
  agentId: "codex",
  mode: "run",
  runTimeoutSeconds: 30
})

Codex-specific lifecycle:

Spawn with spawn_agent(...) or the runtime-equivalent remora launcher
Check completion with wait_agent(...)
If you want to reuse the same remora, send more work with send_input(...)
Otherwise, once the remora has completed and you've incorporated its result, call close_agent(id) so the agent does not linger in the session

Gemini CLI

Gemini CLI is a local process — spawn via exec with a timeout:

exec({
  command: "gemini -p \"task description here\"",
  timeout: 30,            // hard cap in seconds
  background: true,       // don't block main agent
  yieldMs: 500            // check back quickly
})

For Gemini sub-tasks, use exec with timeout + background: true rather than sessions_spawn. Treat the process handle the same way — continue working, collect output when it lands.

Mixed fleets

You can mix runtimes in the same shark run:

spawn remora A → Codex (coding task)
spawn remora B → Gemini (web search / analysis)
spawn remora C → Claude subagent (reasoning)
spawn pilot fish  → Claude subagent (pre-analysis, time-bounded)

Which to use when

Task type	Best runtime
Code generation / editing	Codex
Web search / summarise	Gemini CLI
Multi-step reasoning	Claude subagent
File ops / SSH / shell	exec (background)
Pre-analysis / drafting	Claude subagent (pilot fish)

shark-exec Sub-Skill

For slow shell commands (>5s), use the shark-exec companion skill:

Located at shark-exec/SKILL.md in this repo
Wraps any exec call in background + cron poller
Guarantees main turn completes in <30s even for 10-minute commands
Use it instead of inline exec whenever the command might block

Loop Enforcement (Ralph-style)

The 30-second rule is best enforced at the shell level, not inside a turn.

Use shark.sh (or shark.ps1 on Windows) to run Claude in a bounded loop:

./shark.sh "find the latest ChatterPC version, check pve3, summarise GitHub issues"

Each iteration:

Builds a fresh prompt: skill context + task + current state
Runs claude --print with a hard timeout 25s shell wrapper
If Claude times out → loop continues (it's expected — shark pattern means short turns)
If Claude writes .shark-done → loop exits

This is identical to the Ralph Loop pattern, but with the Shark Pattern as the prompt — Claude spawns remoras for slow work, keeps each turn under 25s, and the shell loop enforces the hard cut.

When to use the loop vs direct claude

Use case	Approach
Single fast task (<30s total)	`claude --print "..."` directly
Multi-step task, slow tools	`./shark.sh "..."` loop
CI/build watching	shark-exec (background + cron)
Interactive chat	OpenClaw main session

Environment variables

Variable	Default	Description
`SHARK_MAX_LOOPS`	`50`	Maximum iterations before giving up
`SHARK_LOOP_TIMEOUT`	`25`	Per-turn timeout in seconds (hard kill)

Completion protocol

When Claude determines the task is done, it writes to .shark-done:

TASK_COMPLETE
<brief summary of what was accomplished>

The loop detects this file and exits cleanly.

Commands

When the user invokes these commands, follow the instructions for each.

`/shark <task>`

Apply the Shark Pattern to the given task. Decompose, spawn remoras for slow ops, keep the main fin moving. Follow all rules in this SKILL.md.

`/shark-loop <task> [--max-loops N] [--timeout S]`

Run the external shark loop enforcer. Execute:

$env:SHARK_MAX_LOOPS = "<N>"
$env:SHARK_LOOP_TIMEOUT = "<S>"
powershell.exe -ExecutionPolicy Bypass -File "<skill_dir>/shark.ps1" "<task>"

Defaults: --max-loops 50, --timeout 25. On Linux/Mac use shark.sh instead.

`/shark-status`

Check current shark state:

Read <skill_dir>/shark-exec/state/pending.json — report active background jobs (label, command, elapsed time, whether overdue past maxSeconds)
If .shark-done exists, show its contents
If SHARK_LOG.md exists, show the last 10 lines
If nothing exists, report "No active shark jobs."

`/shark-clean`

Remove shark state files: .shark-done, SHARK_LOG.md, shark-exec/state/pending.json. Report what was cleaned.

`/shark-autotune`

Analyse timing history and recommend optimal settings.

Read <skill_dir>/state/timings.jsonl — each line is:

{"ts":1710000000,"loop":1,"elapsed_s":12.3,"timeout_s":25,"result":"ok|timeout|done","task_hash":"abc123"}

If no data, report "No timing data yet. Run tasks with /shark first."
Compute and report:
- Total runs (unique task_hash values) and total loops
- Median turn time (p50) and p95 turn time
- Timeout rate — % of turns with result "timeout"
- Loops to completion — median and max (count loops per task_hash that has a "done" entry)
- Wasted headroom — sum of (timeout_s - elapsed_s) for result "ok" turns
- Optimal timeout — p95 turn time + 3s buffer, rounded up to nearest 5s
- Optimal max_loops — p95 loops-to-completion + 2

Show recommendations:

Current:     SHARK_LOOP_TIMEOUT=25  SHARK_MAX_LOOPS=50
Recommended: SHARK_LOOP_TIMEOUT=N   SHARK_MAX_LOOPS=M

Rationale:
- p95 turn time is Xs, so timeout of Ns covers 95% with buffer
- p95 completion is N loops, so max_loops of M gives safe margin
- Timeout rate is X% — [>15%: consider splitting tasks | healthy]
- Wasted headroom: Xs total

If timeout rate > 30%: "Consider breaking tasks into smaller steps."
If median turn time < 5s: "Most turns complete fast. Consider lowering timeout."

Timing Instrumentation

Both shark.sh and shark.ps1 automatically record per-loop timings to state/timings.jsonl. Each entry includes:

ts — Unix timestamp
loop — loop iteration number
elapsed_s — actual wall-clock seconds for this turn
timeout_s — configured timeout for this run
result — "ok" (completed), "timeout" (hit limit), "done" (task finished)
task_hash — 8-char hash correlating loops within a single run

Use /shark-autotune to analyse this data and tune your settings.

References

Ralph Loop (sequential baseline): ghuntley.com/ralph/
OpenClaw sessions_spawn docs: spawn with mode: "run", runtime: "subagent"
Gemini CLI: npm install -g @google/gemini-cli
The name: sharks use ram ventilation — they literally die if they stop moving