Install
openclaw skills install @windseeker1111/flow-swarmMulti-agent swarm orchestration via RuFlo + Claude Code. Turns single coding sessions into coordinated agent teams (architect/coder/tester/reviewer). Proven on 7 consecutive production runs generating 430+ tests across a 50K+ line Elixir codebase with 83% zero-iteration success rate. Features 150+ MCP tools for inter-agent coordination, persistent cross-run memory (sql.js + HNSW vectors), task tracking, file claim locks, and session persistence. Includes battle-tested prompt templates for test generation, feature builds, refactors, security audits, and quality loops. Setup script with 8-point verification. Works with any language, any codebase. NOT for one-liner edits or read-only tasks. Triggers: "swarm this", "flow swarm", "use the swarm", "flowswarm".
openclaw skills install @windseeker1111/flow-swarmMulti-agent swarm orchestration for Claude Code via RuFlo. One prompt, coordinated agents, production results.
Critical fix: MCP tools were disabled (autoStart: false). This prevented Claude Code from calling mcp__claude-flow__swarm_init, memory_store, agent_spawn, etc. during every swarm run we'd done so far. Fixed.
Also fixed: --print mode does not auto-discover .mcp.json. You must pass --mcp-config .mcp.json explicitly.
| Change | Why |
|---|---|
autoStart: true in .mcp.json | Was false — all 150+ MCP tools were disabled in every prior run |
--mcp-config .mcp.json flag added to exec pattern | --print doesn't auto-load project MCP config |
| MCP tool reference table added | 150+ tools now documented: swarm_init, agent_spawn, memory_store, etc. |
| Prompt templates updated to call MCP tools | Without explicit instructions, Claude may not use them |
| Setup script auto-fixes autoStart | ruflo init defaults to false; setup script corrects it |
v1.0 was theory. v2.0 is battle-tested across 5 production runs (355 tests, 5/5 green, 4/5 zero-iteration).
| Change | Why |
|---|---|
| Tiered task routing replaces one-size-fits-all | Pure-data modules don't need GenServer test isolation advice |
| Target selection protocol added | Picking the RIGHT module matters more than swarm config |
| Pre-flight context injection | Feeding the swarm grep output of public functions = dramatically better coverage |
| Daemon reality check | Workers timeout/fail often (20-50% success); swarm value comes from prompt orchestration, not daemon workers |
| Removed WASM Booster claims | Never observed in practice; hooks + prompt patterns drive all real value |
| Real performance data | Actual timing, test counts, iteration rates from production runs |
Two layers working together:
Layer 1: MCP Tools (150+ tools via @claude-flow/cli)
When autoStart: true in .mcp.json, Claude Code gets access to real coordination tools:
swarm_init — creates swarm with topology, persists to .claude-flow/swarm/swarm-state.jsonagent_spawn — registers agents with model routing (haiku/sonnet/opus/inherit)memory_store / memory_search — sql.js + HNSW vector embeddings for semantic recalltask_create / task_complete — tracks task state with assignmentsession_save / session_restore — persists session state between runsclaims_claim / claims_release — prevents agents from editing same filescoordination_consensus — multi-agent agreement on decisionsLayer 2: Prompt Orchestration (our FlowSwarm patterns) The SWARM MODE prefix causes Claude Code to think in roles (architect/coder/reviewer). Combined with pre-flight context injection (grepping public APIs), this produces 80% zero-iteration success.
Both layers matter. v1.0 had Layer 2 only (autoStart was false, MCP tools never loaded). v2.0 enables both.
OpenClaw → exec (background) → Claude Code
↓
MCP Server starts (autoStart: true)
150+ tools available via @claude-flow/cli
↓
SWARM MODE prompt → swarm_init tool called
agent_spawn × N → task_create → execute
↓
memory_store (findings) → task_complete
↓
Output + persisted state
ruflo --version # 3.5.x+
claude --version # Claude Code CLI
# Full setup: install RuFlo + register MCP + init project
./scripts/setup-flow-swarm.sh /path/to/project
# Verify
./scripts/setup-flow-swarm.sh --verify /path/to/project
Or manually:
npm install -g ruflo@latest
claude mcp add ruflo -- npx -y ruflo@latest mcp start
cd /path/to/project && ruflo init && ruflo memory init && ruflo daemon start
After ruflo init, the .mcp.json file defaults to autoStart: false. This disables ALL 150+ MCP tools during Claude Code sessions. Fix it:
# Check current state
python3 -c "import json; d=json.load(open('.mcp.json')); print('autoStart:', d['mcpServers']['claude-flow'].get('autoStart'))"
# Enable (REQUIRED for full swarm functionality)
python3 -c "
import json
with open('.mcp.json') as f: d = json.load(f)
d['mcpServers']['claude-flow']['autoStart'] = True
with open('.mcp.json', 'w') as f: json.dump(d, f, indent=2)
print('MCP autoStart enabled')
"
Without this, Claude Code runs without swarm tools. The prompt patterns still work (v1.0 proved this), but you lose: persistent swarm state, agent memory, task tracking, session persistence, and inter-agent coordination.
This is the highest-leverage decision. Pick wrong and you waste a swarm run.
Best targets (in order):
Find targets fast:
# List untested modules by size (biggest = best target)
for f in lib/**/*.ex; do
base=$(basename "$f" .ex)
count=$(find test/ -name "${base}_test.exs" 2>/dev/null | wc -l | tr -d ' ')
[ "$count" = "0" ] && echo "$(wc -l < "$f")L $f"
done | sort -rn | head -10
The secret sauce: feed the swarm a pre-flight scan of the module. Don't just say "test this file" — tell it exactly what functions exist, what patterns the project uses, what edge cases matter.
# Pre-flight: scan public API
grep -n "^ def " lib/your_module.ex
# Pre-flight: check existing test patterns
head -30 test/some_existing_test.exs
Then build the prompt with that intel baked in.
# Launch (ALWAYS background, NEVER nohup)
exec(
command='cd /project && claude --permission-mode bypassPermissions --mcp-config .mcp.json --print "SWARM MODE: ... TASK: ..."',
background=True,
timeout=300
)
# Poll for completion
process(action="poll", sessionId="xxx", timeout=120000)
# Verify the output actually compiles/passes
mix test test/path/to/new_test.exs
Critical exec rules:
--print buffers ALL output until exit. Use background: true + poll.nohup — Node.js stdout capture breaks silently (empty files).mix test (or equivalent) on swarm output before committing.Proven: 147/147, 66/66, 41/41 zero-iteration
Best for: static catalogs, type definitions, translation modules, config builders.
SWARM MODE: Initialize hierarchical swarm with MCP tools.
COORDINATION:
1. Call swarm_init with topology "hierarchical", maxAgents 4, strategy "specialized"
2. Call agent_spawn for: architect (analyze module), coder (write tests), reviewer (verify)
3. Call task_create for the test generation task
4. After completion: call memory_store with key findings and task_complete
TASK: Write comprehensive ExUnit tests for [MODULE_PATH] ([LINE_COUNT] lines, [DESCRIPTION]).
Public API:
[PASTE grep -n "^ def " output here]
Key data to validate:
- [List specific assertions: required struct keys, value ranges, URL formats, etc.]
- [List known edge cases: unknown inputs, nil, empty string, integer where string expected]
Requirements:
- File: test/[matching_path]_test.exs
- Use async: true (pure functions, no state)
- Group tests by function (describe blocks)
- Test ALL variants, not just a sample (e.g., all 8 hotels, not just 2)
- Include edge cases: nil input, empty string, unknown keys
- Do NOT modify any source files
When done: call memory_store with test count + key findings, then output results.
Proven: 43/43, required 3 iterations (test isolation)
SWARM MODE: Initialize hierarchical swarm with MCP tools.
COORDINATION:
1. Call swarm_init with topology "hierarchical", maxAgents 4, strategy "specialized"
2. Call agent_spawn for: architect (analyze GenServer behavior), coder (write tests), reviewer (verify)
3. Call task_create for the test generation task
4. After each iteration: call memory_store with what failed and why
5. After completion: call task_complete with final results
TASK: Write comprehensive ExUnit tests for [MODULE_PATH] ([LINE_COUNT] lines, GenServer).
Public API:
[PASTE grep output]
CRITICAL — Test Isolation for GenServers:
- The module registers as a named process (__MODULE__). It's already supervised globally.
- Do NOT use start_supervised! — it conflicts with the app-supervised instance.
- Pattern: stop the global instance, restart with test config, re-stop at end.
- OR: if start_link accepts a name: option, use unique names per test.
- async: false for GenServer tests that touch global state.
Requirements:
- Test GenServer lifecycle (init, handle_call, handle_cast, handle_info)
- Test crash recovery: missing catch-all handlers are REAL BUGS worth flagging
- Test state transitions and side effects
- Do NOT modify any source files
SWARM MODE: Initialize hierarchical swarm (maxAgents 4, strategy specialized).
Spawn: architect (analyze deps + plan mocks), coder (write tests), reviewer (verify coverage).
TASK: Write comprehensive ExUnit tests for [MODULE_PATH].
This module depends on: [LIST DEPENDENCIES]
Mock strategy: [Mox / manual mock / test config override]
Reference existing mocks in test/support/ if any.
Requirements:
- Mock all external calls (HTTP, DB, external services)
- Test happy path AND error paths (timeouts, 4xx, 5xx, malformed responses)
- async: true if using Mox with allowances
- Do NOT modify source files
SWARM MODE: Initialize hierarchical swarm (maxAgents 6, strategy specialized).
Spawn: architect (plan structure), coder (implement), tester (tests), reviewer (quality).
Architect plans FIRST. Coder implements. Tester validates. Reviewer catches issues.
TASK: [Feature description with clear acceptance criteria]
Architecture constraints:
- [List patterns to follow from existing codebase]
- [List modules/files to reference for conventions]
HARD LIMIT: Maximum 5 iterations if quality loop needed.
SWARM MODE: Initialize anti-drift hierarchical swarm (maxAgents 4).
Spawn: architect (plan + checkpoint), coder (execute), reviewer (validate each step).
ANTI-DRIFT RULES:
- Architect creates numbered plan before ANY code is written
- Coder implements ONE step at a time
- Reviewer validates EACH step before proceeding
- If reviewer rejects twice: STOP and report
- Checkpoint state after each successful step
TASK: [Refactor description]
HARD LIMIT: Maximum 8 iterations.
SWARM MODE: Security-focused hierarchical swarm (maxAgents 5, strategy specialized).
Spawn: security-architect (threat model), auditor (scan), coder (fix), tester (verify).
TASK: Security audit of [scope].
Checklist:
- [ ] Dependency vulnerabilities (mix audit / npm audit)
- [ ] Hardcoded secrets in source
- [ ] Injection vectors (SQL, XSS, command)
- [ ] Auth/authz bypass paths
- [ ] GenServer catch-all handlers (handle_info, handle_cast) — these are REAL BUGS
- [ ] Error messages leaking internal state
- [ ] Rate limiting gaps
- [ ] CORS/CSP headers
Output: findings table with severity, file, line, fix.
HARD LIMIT: Maximum 5 iterations.
SWARM MODE: Initialize ring swarm (maxAgents 4, strategy adaptive).
Spawn: coder, tester, reviewer, coordinator.
HARD LIMIT: Maximum 10 iterations.
TASK: Iterate on [target] until [score threshold].
Per iteration:
1. Coder fixes based on reviewer feedback
2. Tester runs full suite, reports pass/fail count
3. Reviewer scores against rubric
4. Score >= threshold → STOP, report final score
5. Iteration == 10 → STOP regardless, report score and remaining gaps
These tools become available to Claude Code during swarm sessions. Include instructions to USE them in your prompts.
| Tool | Purpose |
|---|---|
swarm_init | Create swarm with topology + strategy. Persists to .claude-flow/swarm/ |
swarm_status | Check swarm health mid-run |
swarm_shutdown | Clean shutdown with state persistence |
agent_spawn | Register agents with model routing (haiku/sonnet/opus) |
agent_status | Check individual agent state |
memory_store | Persist findings to sql.js + HNSW (semantic search) |
memory_search | Retrieve relevant context from prior runs |
task_create | Track task with assignment + status |
task_complete | Mark task done with summary |
| Tool | Purpose |
|---|---|
session_save | Save session state between runs |
session_restore | Resume from prior session |
claims_claim | Lock a file/resource (prevents agent conflicts) |
claims_release | Release lock |
coordination_consensus | Multi-agent agreement |
coordination_sync | Synchronize agent state |
| Tool | Purpose |
|---|---|
analyze_diff | Review code changes |
analyze_diff_risk | Assess risk of changes |
performance_report | Bottleneck detection |
Without autoStart: true, Claude Code has ZERO access to these tools. It runs on prompt intelligence alone (which works, as v1.0 proved). With them enabled, the swarm can:
| Topology | When | Track Record |
|---|---|---|
hierarchical | Test gen, features, refactors | 5/5 green tonight |
ring | Quality loops, pipelines | Proven in Ralph loops |
mesh | Research, exploration | Untested in production |
star | Simple delegation | Untested in production |
Default: hierarchical. It has the strongest anti-drift properties and all production wins used it.
| Module | Lines | Tests | Time | Iterations | Result |
|---|---|---|---|---|---|
| AssetHealthCheck (GenServer) | 288 | 43 | ~120s | 3 | 43/43 ✅ + found 2 real bugs |
| WidgetHotelAssets (data) | 2,017 | 147 | ~90s | 0 | 147/147 ✅ |
| SquadBuilder (config builder) | 889 | 58 | ~90s | 0 | 58/58 ✅ |
| I18n (translations) | 272 | 41 | ~60s | 0 | 41/41 ✅ |
| TravelClick Types (structs) | 637 | 66 | ~90s | 0 | 66/66 ✅ |
Totals: 355 new tests, 5/5 modules green, 4/5 zero-iteration (80%)
--print buffering, not swarm complexity).| Worker | Success Rate | Notes |
|---|---|---|
| map | 100% | Fast (1ms), just indexes project structure |
| consolidate | 100% | Fast (9ms), memory compaction |
| audit | 20% | Timeouts at 300s, falls back to local mode |
| optimize | 33% | Timeouts, deferred on high CPU load |
| testgaps | 50% | Deferred on high CPU load |
| predict | 0% | Disabled by default |
| document | 0% | Disabled by default |
Takeaway: Don't rely on daemon workers for task quality. The prompt pattern does the heavy lifting. Daemon adds marginal background value (map + consolidate work; audit/optimize are unreliable).
You can run multiple swarms simultaneously on different modules. Each gets its own exec session:
# Launch 2 parallel swarms
exec(command='cd /project && claude --mcp-config .mcp.json --print "SWARM: ... TASK: test module_a"', background=True)
exec(command='cd /project && claude --mcp-config .mcp.json --print "SWARM: ... TASK: test module_b"', background=True)
# Poll both
process(action="poll", sessionId="session-a", timeout=120000)
process(action="poll", sessionId="session-b", timeout=120000)
Observed: 2 parallel swarms work cleanly. 3+ may cause CPU load deferrals on daemon workers (irrelevant for prompt-driven value).
FlowSwarm can analyze and improve itself:
SWARM MODE: Initialize meta-analysis hierarchical swarm (maxAgents 4).
Spawn: architect (analyze skill files), analyst (review production data), coder (rewrite), reviewer (validate).
TASK: Analyze the FlowSwarm skill at [path] against production run data.
Review: what worked, what failed, what's missing. Generate v[N+1].
No output from Claude Code:
--print buffers until completion. Use background: true on exec, poll with generous timeout. Never use nohup.
Swarm didn't fire:
.claude/settings.json has hooksruflo doctorclaude mcp list | grep rufloruflo daemon stop && ruflo daemon startGenServer test isolation failures: The global supervised instance conflicts with test instances. Solutions:
name: :"test_#{System.unique_integer()}"async: false for stateful testsDaemon workers timing out:
Normal. Workers like audit and optimize timeout at 300s regularly (20-33% success rate). The swarm's value comes from prompt orchestration, not daemon workers. Ignore worker failures unless you specifically need their output.
Memory shows 0 entries: In v1.0 (autoStart: false), the MCP server never started so memory_store was never called. With v2.0 (autoStart: true), Claude Code can call memory_store directly. Check after a run:
ruflo memory stats
ruflo memory search -q "test results"
CPU load deferrals: Workers defer when system CPU > 8. This is protective. During active swarm runs, expect deferrals. Workers catch up when CPU drops.
skills/flow-swarm/
├── SKILL.md # This file (v2.0)
├── scripts/
│ └── setup-flow-swarm.sh # Install + init + verify
└── references/
└── template-examples.md # Extended templates with context
--print mode doesn't persist swarm memories