Install
`openclaw skills install claw-superpowers`

Disciplined, systematic approach to AI-assisted software development: 13 integrated skills covering the full development lifecycle, from brainstorming and planning through TDD, debugging, and code review.
Apply the relevant section based on your current task:
| Task | Section |
|---|---|
| Starting new work | Section 2 (Brainstorming) → Section 3 (Writing Plans) |
| Implementing a plan (this session) | Section 5 (Subagent-Driven Development) |
| Implementing a plan (separate session) | Section 4 (Executing Plans) |
| Multiple independent problems | Section 6 (Dispatching Parallel Agents) |
| Writing or fixing code | Section 7 (Test-Driven Development) |
| Bug, test failure, unexpected behavior | Section 8 (Systematic Debugging) |
| About to claim work is done | Section 9 (Verification Before Completion) |
| After completing work | Section 10 (Code Review) |
| Need isolated workspace | Section 11 (Using Git Worktrees) |
| Ready to merge/integrate | Section 12 (Finishing a Development Branch) |
| Creating new skills | Section 13 (Writing Skills) |
Skill priority: process skills first (brainstorming, debugging), then implementation skills.
Skill types: process skills and implementation skills.
<EXTREMELY-IMPORTANT> IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.
This is not negotiable. This is not optional. You cannot rationalize your way out of this. </EXTREMELY-IMPORTANT>
Rule: Invoke relevant skills BEFORE any response or action. Even a 1% chance a section might apply means you should check.
Process: before any response or action, check whether a skill applies; if it does, read it and follow it.
Priority order: skill workflows govern HOW you work. User instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows.
| Thought | Reality |
|---|---|
| "This is just a simple question" | Questions are tasks. Check for skills. |
| "I need more context first" | Skill check comes BEFORE clarifying questions. |
| "Let me explore the codebase first" | Skills tell you HOW to explore. Check first. |
| "I can check git/files quickly" | Files lack conversation context. Check for skills. |
| "Let me gather information first" | Skills tell you HOW to gather information. |
| "This doesn't need a formal skill" | If a skill exists, use it. |
| "I remember this skill" | Skills evolve. Read current version. |
| "This doesn't count as a task" | Action = task. Check for skills. |
| "The skill is overkill" | Simple things become complex. Use it. |
| "I'll just do this one thing first" | Check BEFORE doing anything. |
| "This feels productive" | Undisciplined action wastes time. Skills prevent this. |
| "I know what that means" | Knowing the concept ≠ using the skill. Invoke it. |
Use before: Any creative work — creating features, building components, adding functionality, or modifying behavior.
<HARD-GATE> Do NOT write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity. </HARD-GATE>

Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.
Write the validated design to `docs/plans/YYYY-MM-DD-<topic>-design.md`, commit, then follow Section 3 (Writing Plans).
The terminal state is invoking writing-plans. Do NOT invoke any other implementation skill. The ONLY next step after brainstorming is writing-plans.
Use when: You have a spec or requirements for a multi-step task, before touching code.
Write comprehensive implementation plans with bite-sized tasks. Assume the implementer has zero context. Document exact file paths, complete code, exact commands with expected output.
Announce at start: "I'm using the writing-plans skill to create the implementation plan."
Save plans to: docs/plans/YYYY-MM-DD-<feature-name>.md
Every plan MUST start with:
# [Feature Name] Implementation Plan
> **For agent:** REQUIRED SUB-SKILL: Use Section 4 or Section 5 to implement this plan.
**Goal:** [One sentence]
**Architecture:** [2-3 sentences]
**Tech Stack:** [Key technologies]
Each step is one action (2-5 minutes):
### Task N: [Component Name]
**Files:**
- Create: `exact/path/to/file.py`
- Modify: `exact/path/to/existing.py:123-145`
- Test: `tests/exact/path/to/test.py`
**Step 1:** Write failing test (with complete code)
**Step 2:** Run test, verify fails (exact command + expected output)
**Step 3:** Write minimal implementation (complete code)
**Step 4:** Run test, verify passes (exact command + expected output)
**Step 5:** Commit (exact git commands)
After saving plan, offer:
"Plan complete and saved. Two execution options:
1. Subagent-Driven (this session) — fresh subagent per task, review between tasks, fast iteration → follow Section 5
2. Parallel Session (separate) — open new session, batch execution with checkpoints → follow Section 4
Which approach?"
Use when: You have a written implementation plan to execute in a separate session with review checkpoints.
Announce at start: "I'm using the executing-plans skill to implement this plan."
STOP executing immediately when:
Ask for clarification rather than guessing.
Return to Review (Step 1) when:
Don't force through blockers — stop and ask.
Use when: Executing implementation plans with independent tasks in the current session. Fresh subagent per task + two-stage review.
Core principle: Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
Provide: full task text, context (where it fits), working directory. Tell them:
Provide: full task requirements, implementer's report. Tell them:
Use code review template (see Section 10). Provide: BASE_SHA, HEAD_SHA, description.
Use when: 2+ independent tasks that can be worked on without shared state or sequential dependencies.
Don't use when:
Good prompts are: focused (one problem domain), self-contained (all context needed), specific about output (what to return).
Scenario: 6 test failures across 3 files after major refactoring
Dispatch: three parallel agents, one per failing file.
Results: All fixes independent, no conflicts, full suite green. 3 problems solved in time of 1.
Use when: Implementing any feature or bugfix, before writing implementation code.
Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.
Violating the letter of the rules is violating the spirit of the rules.
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
Write code before the test? Delete it. Start over. No exceptions:
Implement fresh from tests. Period.
Write one minimal test showing what should happen.
Good:
```typescript
test('retries failed operations 3 times', async () => {
  let attempts = 0;
  const operation = () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };
  const result = await retryOperation(operation);
  expect(result).toBe('success');
  expect(attempts).toBe(3);
});
```
Clear name, tests real behavior, one thing.
Bad:
```typescript
test('retry works', async () => {
  const mock = jest.fn()
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');
  await retryOperation(mock);
  expect(mock).toHaveBeenCalledTimes(2);
});
```
Vague name, tests mock not code.
Requirements: One behavior. Clear name. Real code (no mocks unless unavoidable).
MANDATORY. Never skip.
Run test. Confirm: fails (not errors), expected failure message, fails because feature missing.
Write simplest code to pass the test. Don't add features, refactor, or "improve" beyond the test.
Good: Just enough to pass. Bad: Over-engineered with options/backoff/callbacks the test doesn't require (YAGNI).
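For the retry example, "just enough" might look like the following sketch: it passes the test above and nothing more (the 3-attempt limit is hard-coded because that is all the test requires).

```typescript
// Just enough to pass 'retries failed operations 3 times': no options,
// no backoff, no callbacks (the test doesn't ask for them).
async function retryOperation<T>(operation: () => T | Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```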
MANDATORY.
Run test. Confirm: passes, other tests still pass, output pristine (no errors, warnings).
After green only: remove duplication, improve names, extract helpers. Keep tests green. Don't add behavior.
"I'll write tests after" — Tests written after code pass immediately. Passing immediately proves nothing: might test wrong thing, might test implementation not behavior, might miss edge cases. Test-first forces you to see the test fail, proving it actually tests something.
"Already manually tested" — Manual testing is ad-hoc. No record, can't re-run, easy to forget cases. Automated tests are systematic.
"Deleting X hours is wasteful" — Sunk cost fallacy. The time is already gone. Keeping unverified code is technical debt.
"TDD is dogmatic" — TDD IS pragmatic: finds bugs before commit, prevents regressions, documents behavior, enables refactoring. "Pragmatic" shortcuts = debugging in production = slower.
Bug: Empty email accepted
RED:
```typescript
test('rejects empty email', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});
```
Verify RED: FAIL — expected 'Email required', got undefined ✓
GREEN:
```typescript
function submitForm(data: FormData) {
  if (!data.email?.trim()) return { error: 'Email required' };
  // ...
}
```
Verify GREEN: PASS ✓ → REFACTOR → extract validation if needed.
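When the REFACTOR step does apply, extracting the validation might look like this sketch (the `validateEmail` helper name and the simplified signature are illustrative, not part of the example above):

```typescript
// Extracted after green; the 'rejects empty email' test stays passing.
function validateEmail(email: string | undefined): string | null {
  return email?.trim() ? null : 'Email required';
}

function submitForm(data: { email?: string }) {
  const error = validateEmail(data.email);
  if (error) return { error };
  // ... rest of submission
}
```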
Before marking work complete:
Can't check all boxes? You skipped TDD. Start over.
| Problem | Solution |
|---|---|
| Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. |
| Test too complicated | Design too complicated. Simplify interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify design. |
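For the "must mock everything" row, a minimal dependency-injection sketch (the `EmailSender` interface and `notifyUser` function are hypothetical names for illustration):

```typescript
interface EmailSender {
  send(to: string, body: string): Promise<void>;
}

// The dependency is a parameter, so tests can pass a real in-memory
// implementation instead of patching modules with a mocking framework.
async function notifyUser(email: string, sender: EmailSender): Promise<void> {
  await sender.send(email, 'Your build finished.');
}

test('notifies the user by email', async () => {
  const sent: Array<{ to: string; body: string }> = [];
  const fake: EmailSender = {
    async send(to, body) { sent.push({ to, body }); },
  };
  await notifyUser('a@example.com', fake);
  expect(sent).toEqual([{ to: 'a@example.com', body: 'Your build finished.' }]);
});
```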
When adding mocks or test utilities, avoid these pitfalls:
| Excuse | Reality |
|---|---|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is debt. |
| "Keep as reference" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. |
| "Existing code has no tests" | You're improving it. Add tests for existing code. |
| "It's about spirit not ritual" | Violating the letter IS violating the spirit. |
Code before test. Test after implementation. Test passes immediately. Can't explain why test failed. Rationalizing "just this once." "I already manually tested it." "Keep as reference." "This is different because..."
All mean: Delete code. Start over with TDD.
Use when: Any bug, test failure, or unexpected behavior — before proposing fixes.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
Use this ESPECIALLY when:
BEFORE attempting ANY fix:
Read error messages carefully — don't skip. Read stack traces completely. Note line numbers, file paths, error codes.
Reproduce consistently — exact steps, every time. Not reproducible → gather more data, don't guess.
Check recent changes — git diff, recent commits, new dependencies, config changes, environmental differences.
Gather evidence in multi-component systems — Log data at each component boundary. Run once to see WHERE it breaks. Then investigate that component.
Example: For CI → build → signing pipeline:
```bash
# Log at each layer boundary:
# Layer 1 (workflow): echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}"
# Layer 2 (build): env | grep IDENTITY
# Layer 3 (signing): security find-identity -v
# Reveals: at which layer the propagation fails
```
Trace data flow — Where does bad value originate? Trace up the call stack to the source. Fix at source, not symptom.
Root cause tracing technique: Observe symptom → find immediate cause → ask "what called this?" → keep tracing up → find original trigger → fix at source.
Example: git init in wrong directory → cwd: '' → createWorktree('') → Session.create() → test accessed tempDir before beforeEach → root cause: top-level variable initialization.
Create failing test reproducing the bug (Section 7: TDD)
Implement single fix addressing root cause
Verify: test passes, no other tests broken
If 3+ fixes failed: STOP. Question the architecture. Discuss with your human partner.
Pattern indicating architectural problem:
This is NOT a failed hypothesis — this is a wrong architecture.
Example: use `waitFor(() => condition())` instead of `setTimeout(check, 500)`; wait on a condition, not a fixed delay (sketch below, after the table).

| Excuse | Reality |
|---|---|
| "Issue is simple" | Simple issues have root causes too. |
| "Emergency, no time" | Systematic is FASTER than thrashing. |
| "Just try this first" | First fix sets the pattern. Do it right. |
| "I see the problem" | Seeing symptoms ≠ understanding root cause. |
| "Multiple fixes saves time" | Can't isolate what worked. Causes new bugs. |
| "One more fix attempt" (after 2+) | 3+ failures = architectural problem. Question pattern. |
| "Reference too long, I'll adapt" | Partial understanding guarantees bugs. Read completely. |
ALL of these mean: STOP. Return to Phase 1.
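The condition-based `waitFor` mentioned above, as a minimal sketch (a hypothetical helper, not any specific library's API):

```typescript
// Resolves once condition() returns true; fails loudly on timeout instead
// of silently racing a fixed 500 ms delay.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 5000,
  pollMs = 20,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() > deadline) {
      throw new Error('waitFor: condition not met within timeout');
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```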
Use when: About to claim work is complete, fixed, or passing — before committing or creating PRs.
Core principle: Evidence before claims, always.
Violating the letter of this rule is violating the spirit of this rule.
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
If you haven't run the verification command in this message, you cannot claim it passes.
Before claiming ANY status:
1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
- If NO: State actual status with evidence
- If YES: State claim WITH evidence
5. ONLY THEN: Make the claim
Skip any step = lying, not verifying
| Claim | Requires | NOT Sufficient |
|---|---|---|
| Tests pass | Test output: 0 failures | Previous run, "should pass" |
| Linter clean | Linter output: 0 errors | Partial check, extrapolation |
| Build succeeds | Build: exit 0 | Linter passing |
| Bug fixed | Test original symptom | Code changed, assumed fixed |
| Regression test works | Red-green cycle verified | Test passes once |
| Agent completed | VCS diff shows changes | Agent reports "success" |
| Requirements met | Line-by-line checklist | Tests passing |
Tests:
✅ [Run test command] [See: 34/34 pass] "All tests pass"
❌ "Should pass now" / "Looks correct"
Regression tests (TDD Red-Green):
✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
❌ "I've written a regression test" (without red-green verification)
Agent delegation:
✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
❌ Trust agent report
Run the command. Read the output. THEN claim the result.
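One way to make that mechanical, as a sketch of scripted verification in Node (assumes `npm test` is the project's full test command): run the command fresh, read the output, check the exit code.

```typescript
import { spawnSync } from 'node:child_process';

// Fresh, complete run; no reuse of a previous run's result.
const run = spawnSync('npm', ['test'], { encoding: 'utf8' });
console.log(run.stdout);   // READ the full output, don't skim
console.log(run.stderr);
if (run.status !== 0) {
  throw new Error(`Verification failed (exit ${run.status}); cannot claim tests pass`);
}
```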
| Excuse | Reality |
|---|---|
| "Should work now" | RUN the verification |
| "I'm confident" | Confidence ≠ evidence |
| "Just this once" | No exceptions |
| "Agent said success" | Verify independently |
| "Partial check is enough" | Partial proves nothing |
| "Different words so rule doesn't apply" | Spirit over letter |
Use when: Completing tasks, implementing major features, before merging, or when receiving review feedback.
Mandatory after: each task in subagent-driven development, major features, before merge.
How to request:
`BASE_SHA=$(git rev-parse HEAD~1)`, `HEAD_SHA=$(git rev-parse HEAD)`

Code Reviewer Template:
You are reviewing code changes for production readiness.
Review {WHAT_WAS_IMPLEMENTED} against {PLAN_OR_REQUIREMENTS}.
Git range: {BASE_SHA}..{HEAD_SHA}
Check: code quality, architecture, testing, requirements, production readiness.
Review Checklist:
- Code: separation of concerns, error handling, type safety, DRY, edge cases
- Architecture: design decisions, scalability, performance, security
- Testing: tests test logic not mocks, edge cases, integration tests, all passing
- Requirements: all met, matches spec, no scope creep, breaking changes documented
Output format:
### Strengths — what's well done (be specific, file:line)
### Issues
#### Critical (Must Fix) — bugs, security, data loss
#### Important (Should Fix) — architecture, missing features, test gaps
#### Minor (Nice to Have) — style, optimization
For each: file:line, what's wrong, why it matters, how to fix
### Assessment — Ready to merge? [Yes/No/With fixes]
Response pattern: READ → UNDERSTAND → VERIFY → EVALUATE → RESPOND → IMPLEMENT
Forbidden responses — NEVER say: "You're absolutely right!", "Thanks for catching that!", or other reflexive agreement.
Instead: Restate the technical requirement, ask clarifying questions, push back if wrong, or just fix it.
Handling unclear feedback:
IF any item is unclear:
STOP — do not implement anything yet
ASK for clarification on ALL unclear items
WHY: Items may be related. Partial understanding = wrong implementation.
Implementation order for multi-item feedback:
YAGNI check: If reviewer suggests "implementing properly," grep codebase for actual usage. If unused → "This endpoint isn't called. Remove it (YAGNI)?"
When to push back:
When feedback IS correct:
✅ "Fixed. [Brief description]"
✅ [Just fix it in code]
❌ "You're absolutely right!" / "Thanks for catching that!"
Actions speak. Just fix it.
Use when: Starting feature work that needs isolation or before executing implementation plans.
Announce at start: "I'm using the using-git-worktrees skill to set up an isolated workspace."
Directory selection, in priority order:
1. Existing directory: `ls -d .worktrees 2>/dev/null` (preferred) or `ls -d worktrees 2>/dev/null`. If both exist, `.worktrees` wins.
2. Documented preference: `grep -i "worktree.*director" AGENTS.md 2>/dev/null`
3. Default: `.worktrees/` (project-local, hidden) or `~/.config/superpowers/worktrees/<project>/` (global)

For project-local directories, verify the directory is ignored before creating the worktree: `git check-ignore -q .worktrees 2>/dev/null`. If NOT ignored: add it to `.gitignore` first.
Why critical: prevents accidentally committing worktree contents to the repository.

Create the worktree, then install dependencies for the project type:

```bash
project=$(basename "$(git rev-parse --show-toplevel)")
git worktree add "$path" -b "$BRANCH_NAME"
if [ -f package.json ]; then npm install; fi
if [ -f Cargo.toml ]; then cargo build; fi
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
if [ -f pyproject.toml ]; then poetry install; fi
if [ -f go.mod ]; then go mod download; fi
```
Use when: Implementation is complete, all tests pass, ready to integrate.
Announce at start: "I'm using the finishing-a-development-branch skill to complete this work."
Determine the base branch: `git merge-base HEAD main 2>/dev/null || git merge-base HEAD master 2>/dev/null`

Implementation complete. What would you like to do?
1. Merge back to <base-branch> locally
2. Push and create a Pull Request
3. Keep the branch as-is
4. Discard this work
| Option | Merge | Push | Keep Worktree | Cleanup Branch |
|---|---|---|---|---|
| 1. Merge locally | Yes | - | - | Yes |
| 2. Create PR | - | Yes | Yes | - |
| 3. Keep as-is | - | - | Yes | - |
| 4. Discard | - | - | - | Yes (force) |
Option 1 (Merge): Checkout base → pull latest → merge feature → verify tests on merged result → delete feature branch → cleanup worktree
Option 2 (Push/PR): Push branch → gh pr create with summary and test plan → cleanup worktree
Option 4 (Discard): Require typed "discard" confirmation. Show what will be deleted (branch, commits, worktree). Then: checkout base → git branch -D → cleanup worktree
Use when: Creating new skills, editing existing skills, or verifying skills work.
A reusable reference guide for proven techniques, patterns, or tools. NOT narratives about how you solved a problem once.
Skills are: reusable techniques, patterns, tools, reference guides.
Skills are NOT: one-off solutions, project-specific conventions, mechanical constraints (automate those instead).
NO SKILL WITHOUT A FAILING TEST FIRST
Test with subagent pressure scenarios. Watch agents fail without skill (RED). Write minimal skill (GREEN). Close loopholes (REFACTOR).
```yaml
---
name: skill-name-with-hyphens
description: "Use when [specific triggering conditions]"
---
```
Sections: Overview → When to Use → Core Pattern → Quick Reference → Common Mistakes
CRITICAL: Description = When to Use, NOT What the Skill Does
Agents read description to decide which skills to load. The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow.
Why: Testing revealed that when a description summarizes workflow, agents follow the description shortcut instead of reading the full skill. A description saying "code review between tasks" caused an agent to do ONE review, even though the skill specified TWO reviews (spec + quality).
```yaml
# ❌ BAD: Summarizes workflow — agent may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review

# ✅ GOOD: Just triggering conditions
description: Use when executing implementation plans with independent tasks
```
Naming: name skills by the technique or problem, not the implementation: `condition-based-waiting`, not `async-test-helpers`; `root-cause-tracing`, not `debugging-techniques`. Use gerund forms like `creating-skills`, `testing-skills`.

Skills that enforce discipline need to resist rationalization:
RED: Create pressure scenarios → run WITHOUT skill → document baseline failures (exact rationalizations agents use).
GREEN: Write skill addressing specific failures → run WITH skill → verify compliance.
REFACTOR: Find new rationalizations → add counters → re-test until bulletproof.
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to agents. Test it. |
| "It's just a reference" | References can have gaps. Test retrieval. |
| "Testing is overkill" | Untested skills always have issues. 15 min saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "No time to test" | Deploying untested skill wastes more time fixing later. |