spec-executor

Workflows

Execution companion for spec-workflow: state navigation, task tracking via tasks.md, incremental delivery, and session recovery. Use after spec-workflow produces the plan.

Install

openclaw skills install spec-executor

Spec Executor

Execution workflow centered on state navigation and state assurance. At any moment, know where you are, what comes next, and how to recover if interrupted.


⚠️ Dependency Notice

This skill CANNOT be used standalone. It depends on spec-workflow to produce the plan and tasks.md tracker before execution begins.

SkillPhaseResponsibility
spec-workflowPlanningRequirements → Design → Task breakdown
spec-executorExecutionState navigation → Task execution → State recovery

Handoff example:

spec-workflow output:
  docs/login-feature/tasks.md
  docs/login-feature/requirements.md
  docs/login-feature/design.md

spec-executor input:
  Read docs/login-feature/tasks.md
  Execute tasks in order
  Update tasks.md after each task

Do NOT use this skill for work that has not been planned by spec-workflow.

Validation: Before executing, confirm tasks.md exists and contains valid task entries (Scope + Verification fields). If the file is missing or malformed, stop and ask the user.


Quick Start

First time using this skill with a confirmed tasks.md:

1. Read tasks.md → find first [ ] task
2. Mark it [~] → execute per Scope description
3. When done: mark [✓], fill Verification, stage changes
4. Show diff → wait for user review and explicit commit approval
5. User says "next" → repeat from step 1

Session compressed? Jump to Session Recovery.


State Navigation

Where Am I?

Received user instruction
│
├─ Is the scope clear and small? ──→ YES → Execute directly (Simple Task)
│                                    NO
├─ Does it need design/architecture? → YES → Plan first → Write tasks.md → Execute
│                                    NO (unclear)
└─ Investigate → Reclassify
Current StateWhat To Do NextKey Output
Just received instructionClassify task (simple / complex / exploratory)Decision: plan or execute directly
Planning in progressWrite design doc + tasks.md → wait for user confirmationConfirmed tasks.md
Ready to implementRead tasks.md, mark first pending task as [~]Task in progress
Just finished a taskUpdate tasks.md to [✓] → verify → commit → next taskUpdated tracker
Hit an errorStop → diagnose → fix → re-verify → resumeDocumented fix
Session interruptedRead tasks.md → verify last done task → resume first pendingRecovered context
User inserts unrelated queryHandle query → return to previous state without losing positionContinuity preserved

State Transition Rules

[Classified] --(simple)--> [Executing] --(task done)--> [Update tracker] --(more tasks)--> [Executing]
      │                              │                                    └─(all done)--> [Complete]
      └─(complex)--> [Planning] --(confirmed)--> [Executing]
                             └─(rejected)--> [Revise plan]

[Executing] --(error)--> [Diagnosing] --(fixed)--> [Update tracker] --> [Executing]
                                    └─(stuck)--> [Ask user]

[Any state] --(unrelated user query)--> [Handle query] --> [Return to prior state]

Critical rule: You cannot transition from [Executing] to the next task without updating the tracker. Updating tasks.md is the definition of task completion.


Task Classification

Before writing code, classify the task:

TypeCriteriaPlanning Required
ComplexNew feature, architecture change, multi-file refactor, API designYes — use spec-workflow to produce design doc + tasks.md
SimpleSingle-file change, config tweak, clear bug fix with known scopeNo — brief explanation, then implement
ExploratoryUnclear scope, needs investigation to understand the problemInvestigate first, then reclassify

Boundary rule: If a "simple" task grows to need design decisions, interface changes, or cross-file coordination, stop immediately and upgrade to "complex" with planning.

Task Type Reference

User IntentTypeAction
New feature, architecture change, large refactorComplexTrigger spec-workflow for full planning
Bug fix with ticket/reference, interface alignmentFix / ComplexTrigger spec-workflow, Phase 1-2 may be simplified
Single-file change, config tweak, clear small fixSimpleBrief explanation, then implement directly
Explain code, check logs, updatecode, info queryRoutineExecute directly, no Spec

Session split rule: If tasks.md has > 8 pending tasks or estimated work > 2 hours, suggest /clean and continue in a new session.

Must Ask the User

Stop and ask before proceeding when:

SituationQuestion to Ask
Unclear requirement background"What problem should this code solve?"
Design conflict (A vs B)"Both approaches have tradeoffs — what's your priority?"
Deleting code"This code appears used in X — are you sure you want to delete it?"
Deleting code — verificationRun grep -r "ClassName" --include="*.java" --include="*.xml" to confirm no downstream references before deleting
Scope expands beyond original"This seems broader than the original request — should we expand the design?"
Better approach found"I see a better refactoring path — can you explain the original design intent?"
Uncommitted user changes exist"I see you have uncommitted changes — should I work from the latest code?"

State Assurance

The Tracker (tasks.md)

tasks.md is the primary state reference. If memory, conversation, and tracker disagree, pause and reconcile with the user — never silently override user intent.

Location: docs/{feature-name}/tasks.md (or project-defined location).

Format:

- [ ] Task name (≤10 words)
  - Scope: files and specific methods/fields changed
  - Affected: files that need sync changes due to this task
  - Verification: how to verify (compile, test, grep, etc.)
  - Commit: the actual commit message used
  - User corrections: 0 | Compression recoveries: 0
  - Notes: dependencies, blockers, scope changes

Critical field — Scope: Must be precise down to method/field level. After session compression, recovery depends entirely on this field to reconstruct what was done and what remains.

Example:

# Implementation Plan

- [✓] 1. Add user authentication endpoint
  - Scope: `AuthController.java` add `login()` method; `AuthService.java` add `authenticate()`
  - Affected: `UserRepository.java` (add `findByUsername`)
  - Verification: `./gradlew :api:compileJava` passes
  - Commit: `feat(auth): add login endpoint`
  - User corrections: 0 | Compression recoveries: 0
  - Notes: depends on task 0 (DB schema)

- [~] 2. Add JWT token generation
  - Scope: `JwtUtil.java` add `generateToken()`; `AuthService.java` inject JwtUtil
  - Affected: `application.yml` (add jwt.secret)
  - Verification: —
  - Commit: —
  - User corrections: 0 | Compression recoveries: 0
  - Notes: —

- [ ] 3. Add auth middleware
  - Scope: ...

Status values:

  • [ ] — Not started
  • [~] — In progress (set this before editing code)
  • [✓] — Done (set this before committing or starting next task)
  • [⏭] — Skipped

Correction tracking:

  • User corrections ≥ 2: Escalate — record the pattern in Notes and adjust approach
  • Compression recoveries ≥ 6: Suggest user /clean and start fresh

Real-Time Update Discipline

EventActionWhy
Start a taskChange [ ][~]Prevents duplicate work
Finish a taskChange [~][✓], fill Verification + CommitDefines completion
User changes scopeAdd note, increment User correctionsPrevents drift
Session compressesMark [~][ ] if uncertainPrevents false progress

This takes precedence over compilation, testing, and starting the next task. A task whose tracker entry is not updated is considered NOT done.

Timeliness Guarantee

tasks.md must be updated in real time — never retroactively:

  • After completing a task → update status, verification, and commit message immediately
  • After starting a task → change [ ] to [~] before editing code
  • After session compression → mark [~][ ] if uncertain about state
  • After user adds/changes scope → record in Notes immediately
  • Never reconstruct progress after compression without re-verifying code state first

Compression Recovery

tasks.md is the operational basis for recovery:

StepHow to use tasks.md
Read and confirm progressScan statuses in order; identify boundary between [✓] and [ ]
Verify code stateOnly verify the last [✓] row (state boundary). Re-run its verification method. If it fails, state is stale — pause and reconcile
Determine next stepFind first [ ] or [~] row; execute from its Scope description

Session Recovery (After Interruption)

When context is compressed and the session continues:

  1. Report count — state "This session has recovered from compression N times"
  2. Read tasks.md — scan statuses, identify the boundary between [✓] and [ ]
  3. Validate tracker — verify tasks.md structure is intact (expected fields, no corruption). If malformed, ask user before proceeding.
  4. Verify boundary — re-run the verification for the last [✓] task. If it fails, the state is stale; pause and reconcile.
  5. Resume — mark the first [ ] task as [~] and continue from its Scope description.
  6. Count recoveries — increment Compression recoveries. If ≥ 6, stop execution, update tasks.md to [~], and wait for user /clean

Context Switching

ScenarioRule
User asks unrelated questionHandle it, do NOT modify tracker, return to prior task afterward
User starts a new featurePause current task (mark [~]), create new tracker for new work
User changes direction mid-taskTreat as error response: stop, document in Notes, wait for confirmation
User corrects design assumptionNot a context switch — follow Error Response Protocol. Update tasks.md Notes and design.md if assumptions changed

Key principle: One tasks.md per feature. Different features must have different tracker files.


Execution Rules

Incremental Delivery

  • Implement one small task at a time
  • Max ~300 lines changed per task
  • Stop for review after each task unless user explicitly authorizes continuous mode
  • Update tasks.md status immediately after each task — before committing or starting the next

Task Size Exemptions

ScenarioRelaxed LimitRequired Condition
User authorizes continuous mode≤ 500 linesUser explicitly says "continuous implement" or "finish the rest"
Pure deletionNo limitOnly removing code, no new logic, verified no downstream references
Batch rename/migration≤ 500 linesMechanical replacement (e.g., package rename), zero logic change
Config/constant extraction≤ 400 linesExtracting scattered magic values to unified constants, no behavior change

If exceeding limit: Must split into smaller tasks or explain split plan to user and get approval.

Continuous Mode Authorization

Only proceed without stopping for review when user explicitly says:

  • "You can continuous implement"
  • "Finish the rest without asking"
  • "Batch execute the remaining tasks"

NOT authorization: "OK", "Continue", "Go ahead", "Sure" — these confirm the current task only. Must ask: "Do you mean I can continuously implement remaining tasks, or just confirming this one?"

When in continuous mode:

  • Each task must still be independently verified (compile/type-check)
  • Each task must still be independently committed
  • tasks.md must be updated before starting the next task
  • User can interrupt at any time — stop immediately when they do

Risk Prediction (Before Starting Each Task)

Spend 30 seconds on pre-flight risk analysis:

  1. Match task type — What category is this? (API alignment / dependency injection / method refactor / DTO addition / deletion)
  2. Extract lessons — What has gone wrong with this type of task before?
  3. Pre-verify — Run the recommended verification before writing code:
    • API alignment: Extract signatures with language-specific tools, diff against interface
    • Deletion: grep -r "targetName" to confirm no downstream references
    • Method refactor: grep -r "oldMethodName" to find all call sites
    • Dependency change: Verify all injection points and mock references
    • DTO addition: Check serialization paths and downstream consumers
    • Mock update: Verify test files reference the correct type (interface vs concrete)
    • Package rename: grep -r "old.package.name" to find all imports

Review Rules

Default: Stop and wait for user review after every task.

Review exemptions (no need to stop, but still run quality gates):

ScenarioCondition
User authorized continuous modeExplicit continuous implement statement
Pure formattingOnly whitespace, line breaks, import sorting — zero logic change
Pure test data adjustmentOnly test inputs or expected values — no production code touched
Emergency hotfixUser explicitly says "urgent fix" and change is ≤ 10 lines

Code Quality Gates

Before marking any task [✓]:

  • Compiles — no build/type errors. Prefer quiet single-module compilation.
    • Build tool available: run ./gradlew :module:compileJava --quiet or equivalent
    • Build tool unavailable: use javap -public to extract signatures, or grep/diff to verify definitions align
  • No dead code — unused imports, helpers, or fields removed
  • No magic values — hardcoded literals (appearing 2+ times) extracted to named constants
  • Size limits — methods ≤ 30 lines, nesting ≤ 2 levels, files ≤ 500 lines
  • Naming — booleans use is/has/should prefixes, names are self-explanatory
  • Task status updatedtasks.md reflects current state with verification and commit message

Pre-Commit Checklist

Before every commit:

  • Change scope matches the task (or fits within an exemption)
  • All quality gates passed
  • No secrets or credentials in diff
  • Diff reviewed — user has seen the changes and approved
  • Commit message provided and follows project convention
  • tasks.md updated with status, commit message, and verification
  • Self-check: Did I follow the risk prediction? Did I avoid patterns known to fail?

Do not commit if self-check fails.


Error Response Protocol

When compilation fails, tests fail, tools error, or user points out a mistake:

  1. Stop — no further edits until resolved
  2. Locate — identify affected files and methods from error output
  3. Verify root cause — use tools (grep, diff, compiler) to confirm. Never rely on reading alone.
  4. Fix — apply minimal corrective change
  5. Document — distinguish the nature of the fix:
    • Code error: Record cause and fix in tasks.md Notes
    • User corrected design assumption: Update tasks.md Notes and sync changes to design.md
  6. Re-verify — run full quality gates and pre-commit checklist
  7. Resume — only after explicit user confirmation

Tool Verification Rules

  • Agent is for exploration (reading code, gathering context) — never for precision verification
  • Bash is for precision (grep, diff, compiler output, line counts) — always cross-check agent conclusions
  • Never judge signature alignment, overload resolution, or parameter types by reading alone
  • Never trust "looks correct" from an agent without independent verification

Core Principles

  • State over speed — document progress before moving on
  • Verify with tools — grep, diff, compiler; never eyeball
  • Stop and ask — when uncertain, pause for clarification
  • Tracker is truth — if memory and tracker disagree, the tracker wins
  • Agent is not evidence — agent conclusions require Bash cross-check
  • Plan before complex work — multi-file or architectural changes need design first