Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Lucky Build Execution Protocol

v1.0.1

Systematic protocol for working through a project build queue (NEXT_TASKS.md). Use when a project has an ordered task list and you need to pick up, execute,...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for rmbell09-lang/lucky-build-protocol.

Prompt Preview: Install & Setup
Install the skill "Lucky Build Execution Protocol" (rmbell09-lang/lucky-build-protocol) from ClawHub.
Skill page: https://clawhub.ai/rmbell09-lang/lucky-build-protocol
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install lucky-build-protocol

ClawHub CLI


npx clawhub@latest install lucky-build-protocol
Security Scan
VirusTotal
Suspicious
OpenClaw
Suspicious (high confidence)
⚠ Purpose & Capability
Most steps (reading NEXT_TASKS.md, running tests, committing checkpoints, selecting models) align with a build-execution protocol. However, hard-coded steps like checking 'Jinx' via ssh -i ~/.ssh/lucky_to_mac luckyai@100.90.7.148 and reading memory/YYYY-MM-DD.md are not justified by the stated purpose and look out-of-band for a generic build protocol.
⚠ Instruction Scope
SKILL.md instructs the agent to read local session/memory files, run session_status and session_status(model=...), run test suites, make git commits, and contact an external host over SSH and HTTP. Reading agent memory files and an explicit private key path (~/.ssh/lucky_to_mac) is sensitive and goes beyond normal build assistance. The auto-trigger behavior (when NEXT_TASKS.md exists) means these instructions could run whenever the file is present.
Install Mechanism
Instruction-only skill with no install spec or code to write to disk; this minimizes install-time risk. There is no installer that downloads or executes third-party code.
⚠ Credentials
Declared requirements list no credentials or config paths, but SKILL.md explicitly references a private SSH key file (~/.ssh/lucky_to_mac), a specific remote host (100.90.7.148), and agent memory files — sensitive resources not declared in the metadata. That mismatch is a red flag for possible credential access or exfiltration.
Persistence & Privilege
The skill's 'always' flag is false and it is user-invocable; it does not request permanent system presence or claim to modify other skills. The auto-trigger on NEXT_TASKS.md is reasonable for a build protocol, but worth noting: it causes the instructions to run whenever the file is present and the user asks to continue.
What to consider before installing
This skill mostly reads like a disciplined build checklist, but it contains unexplained and sensitive actions: it tells the agent to read local agent memory files and to SSH with a specific private key (~/.ssh/lucky_to_mac) to a hard-coded IP (100.90.7.148). Before installing or enabling this skill, ask the author to:

  • Remove or justify any step that requires access to ~/.ssh or other private keys, and declare those as required config paths if truly needed.
  • Explain what 'Jinx' is, why a fixed IP and key are used, and whether network traffic will leave your environment.
  • Declare any access to agent memory or other internal files and limit it to the minimum required.
  • Confirm whether the skill will auto-trigger and whether it will make git commits or modify project files, and provide an opt-out or dry-run mode.

If you can't get clear, constrained answers, run the skill only in an isolated sandbox and audit file and network activity while it runs.

Like a lobster shell, security has layers — review code before you run it.

Tags: autonomous-agent, build, latest, openclaw, project-management, workflow
101 downloads
0 stars
2 versions
Updated 1mo ago
v1.0.1
MIT-0

Build Execution Protocol

Repeatable, enforced process for executing a build queue across sessions. Same discipline as the heartbeat — mandatory checklists, verification gates, no shortcuts.

Auto-Trigger

If a project has NEXT_TASKS.md, follow this protocol automatically. Ray only needs to say "work on [project]" or "keep going."


Phase 0: Session Setup (MANDATORY — COMPLETE IN ORDER, NO EXCEPTIONS)

0. Context Check (HARD STOP)

  • Run session_status FIRST
  • If context > 70%: Wrap up, update files, request fresh session. STOP.
  • If context > 85%: CRITICAL. Save state immediately, do NOT start new work.
  • Log context % for reference
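
A minimal sketch of this threshold logic, assuming a hypothetical helper that extracts the context percentage from session_status output (the helper and its input are stand-ins, not a real OpenClaw API):

# Hypothetical sketch of the context gate; session_status is an agent tool,
# so this function and its argument are illustrative stand-ins only.
def context_action(context_pct: float) -> str:
    if context_pct > 85:
        return "CRITICAL: save state immediately, do NOT start new work"
    if context_pct > 70:
        return "Wrap up, update files, request a fresh session, then STOP"
    return f"OK to proceed (context at {context_pct:.0f}%)"

print(context_action(72.0))  # "Wrap up, update files, request a fresh session, then STOP"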

1. Read Project State (BEFORE model selection — you need to see the work first)

  • Read NEXT_TASKS.md — find the first unchecked [ ] item
  • Read STATUS.md (if exists) — understand current build state
  • Read today's + yesterday's memory/YYYY-MM-DD.md — know what happened recently
  • Skim the last session's work log for any warnings or half-finished items

2. Model Selection (NOW that you've seen the tasks)

  • Review the upcoming tasks and select model based on what they need:
    • Haiku: Simple doc updates, formatting, config changes, trivial fixes
    • Sonnet: Standard coding, implementation, routine bug fixes, test writing
    • Opus: Architecture decisions, complex algorithms, hard debugging, security-critical work
  • If the session has a mix: Start with Sonnet, switch to Opus for complex tasks, back to Sonnet after
  • Switch model now if needed: session_status(model="...")
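
As a rough illustration of that tiering, here is a hypothetical keyword-based router; the protocol itself leaves the call to the agent's judgment, and these keywords are invented for the sketch:

# Hypothetical sketch: rough task-to-model routing based on keywords.
MODEL_TIERS = {
    "haiku": ("doc", "format", "config", "typo"),
    "sonnet": ("implement", "bugfix", "test", "refactor"),
    "opus": ("architecture", "algorithm", "debug", "security"),
}

def pick_model(task: str) -> str:
    text = task.lower()
    for model in ("opus", "sonnet", "haiku"):  # strongest tier wins ties
        if any(kw in text for kw in MODEL_TIERS[model]):
            return model
    return "sonnet"  # sensible default for mixed or unclear tasks

print(pick_model("Fix the security-critical auth algorithm"))  # opus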

3. Verify Previous Session's Work (DO THIS BEFORE NEW WORK)

  • Look at what the last session claimed to complete (from memory files / NEXT_TASKS.md checked items)
  • Actually verify it: Run the code, check the output, read the files
  • If anything was marked done but doesn't work → uncheck it, fix it first
  • This exists because sessions lie to themselves. Trust but verify.

4. Baseline Test Run (NON-NEGOTIABLE)

  • Run the full fast test suite BEFORE touching any code:
    python3 -m pytest -x -q --ignore=tests/integration
    
  • Record the count: N passed, 0 failed
  • If tests are already failing: Fix them FIRST using test-suite-repair skill
  • You must end the session with ≥ this test count
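
A sketch of capturing the baseline automatically, assuming pytest's standard quiet-mode summary line ("N passed"); the exact output format can vary across pytest versions:

import re
import subprocess

# Run the fast suite and parse "N passed" from pytest's summary line.
proc = subprocess.run(
    ["python3", "-m", "pytest", "-x", "-q", "--ignore=tests/integration"],
    capture_output=True, text=True,
)
match = re.search(r"(\d+) passed", proc.stdout)
baseline = int(match.group(1)) if match else 0
print(f"Baseline: {baseline} passed, exit code {proc.returncode}")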

5. Git Checkpoint

  • If the project uses git: git add -A && git commit -m "checkpoint: session start — N tests passing"
  • If no git: copy critical files to a backup location
  • Why: If task 3 breaks something silently, you can diff back to this known-good state
  • Create a new checkpoint after each completed task
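
A small helper sketch for the checkpoint commit; the test count in the example message is invented for illustration:

import subprocess

def checkpoint(message: str) -> None:
    # Stage everything and commit, so a silently broken task can be
    # diffed back to this known-good state.
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

# Example usage; "42" is a placeholder for the baseline run's count:
checkpoint("checkpoint: session start — 42 tests passing")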

6. Check Jinx Status

  • ssh -i ~/.ssh/lucky_to_mac luckyai@100.90.7.148 "curl -s http://localhost:3001/status"
  • If Jinx has completed results → review and verify them first (apply same verification standard)
  • If Jinx is idle → good, we'll queue tasks as we go

7. Plan the Session

  • Look at remaining unchecked tasks in NEXT_TASKS.md
  • Pick 2-4 tasks you can realistically complete this session
  • WIP limit: 1 task active at a time. Finish or park before starting another.
  • Size check: If a task will take >45 min of coding, consider:
    • Breaking it into subtasks
    • Spawning a coding agent via coding-agent skill for the implementation
    • Queuing parts to Jinx

Phase 1: Task Execution Loop

For EACH task, follow this loop without skipping steps:

Step 1: Understand (2-5 minutes)

□ Read the task description in NEXT_TASKS.md
□ Read the relevant source files (use the quick reference table at the bottom of NEXT_TASKS.md)
□ Read the spec section — find the actual requirement in the spec and quote it:

SPEC REQUIREMENT: "The Judge worker evaluates 5-10 variants and selects 
the best with a decision trace including confidence score." 
— master_spec_full.txt, Section 4.2

Why: This is how we avoid building the wrong thing. If you can't find a spec reference, ask Ray before implementing.
□ Define "done" — one sentence: "This task is done when ___"
□ Jinx check: "Can Jinx do this?" If it doesn't need live coding/testing → queue it for Jinx and skip to the next task

Step 2: Implement

Jinx split check: Before coding, break the task into parts. Can any part run on Jinx?

  • Lucky does: Live coding, file system changes, testing, anything needing imports/execution
  • Jinx does: Writing docs from existing code, analyzing patterns, generating test data, reviewing logic, drafting boilerplate
  • Split pattern: Lucky writes the core logic → queues Jinx to write tests/docs/edge-case analysis for it
  • Queue Jinx parts NOW so they run in parallel while you code your parts
□ Write the code
□ Stay focused — one task only. Don't drift into adjacent work.
□ If you discover something broken, note it in NEXT_TASKS.md as a new item, but don't fix it now unless it blocks you
□ Check skills — if this task type matches a skill (judge-worker-pattern, pipeline-debugging, ui-truth-mapping, etc.), read and follow that skill first

Step 3: Test (MANDATORY)

□ Write tests for the new code (if the task warrants tests)
□ Run the full fast test suite:

python3 -m pytest -x -q --ignore=tests/integration

□ Must pass with 0 failures before proceeding
□ Test count must be ≥ baseline — you cannot lose tests
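
The gate boils down to two comparisons; a sketch with illustrative counts:

# Sketch of the no-regression gate: refuse to proceed if tests were lost.
def regression_gate(current_passed: int, current_failed: int, baseline: int) -> None:
    assert current_failed == 0, f"{current_failed} failing tests: fix before proceeding"
    assert current_passed >= baseline, (
        f"test count dropped: {current_passed} < baseline {baseline}"
    )

regression_gate(current_passed=45, current_failed=0, baseline=42)  # passes silently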

Step 4: Verify (NOT THE SAME AS "TESTS PASS" — USE OPUS)

This is the step that gets skipped. Don't skip it.

Switch to Opus model for verification: session_status(model="opus")

  • Opus catches things Sonnet misses — logic gaps, spec mismatches, subtle bugs
  • This is the quality gate. Use the strongest model here.
□ Read back the code you wrote — does it do what the task asked? Not "does it compile" — does it actually implement the requirement?
□ Check the output — call the function/endpoint/CLI and look at what it returns. Print it. Read it.
# Actually run it and look at the output
import json

result = the_thing_you_built(test_input)  # placeholder for the code under review
print(json.dumps(result, indent=2))

□ Compare to spec — re-read the spec requirement you quoted in Step 1. Does the output match?
□ Edge cases — empty input, null, unexpected types, boundary values
□ If UI copy: Read the copy. Does it make sense to a human who isn't a developer?
□ Switch back to Sonnet after verification (don't burn Opus tokens on the next task's implementation)
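
To make the edge-case pass mechanical, a parametrized pytest sketch works well; normalize here is a stand-in for whatever you just built:

import pytest

def normalize(value):  # stand-in for the function under review
    return str(value).strip() if value is not None else ""

# Edge cases from the checklist: empty input, null, unexpected types, boundaries.
@pytest.mark.parametrize("raw, expected", [
    ("", ""),        # empty input
    (None, ""),      # null
    (42, "42"),      # unexpected type
    ("  x  ", "x"),  # boundary whitespace
])
def test_normalize_edges(raw, expected):
    assert normalize(raw) == expected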

Verification failures:

  • If gaps found → fix them before marking done (stay on Opus for the fix if it's complex)
  • If can't fix quickly → mark task [partial] with notes on what's missing, add fix as new task

Step 5: Checkpoint + Mark Done (ENFORCEMENT GATE — CANNOT SKIP)

□ Git checkpoint: git add -A && git commit -m "done: [task description] — N tests passing"
□ Queue Jinx follow-up BEFORE marking done: After EVERY completed task, ask "What can Jinx do now?"

  • Tests? Docs? Code review? Edge case analysis? Performance audit?
  • Jinx should NEVER be idle when follow-up work exists
  • Actually submit the Jinx task via curl NOW — not "later"
□ Update NEXT_TASKS.md: Check the box ([ ] → [x]), add notes if needed
□ RUN THE ENFORCEMENT GATE (MANDATORY — NO EXCEPTIONS):
~/.openclaw/workspace/scripts/enforce.sh "<project_dir>" "<task_name>" "<memory_file>"
  • If it exits non-zero: you are NOT DONE. Fix every failure before proceeding.
  • Do NOT mark the task done, do NOT move to the next task, do NOT pass Go.
  • This gate checks: memory log exists with spec quote + verification + Jinx follow-up, tests pass, git clean, NEXT_TASKS.md updated
  • If the task revealed new work → add it to the appropriate priority section
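
Because the non-zero exit is the whole contract, a thin wrapper can make the gate impossible to skip silently. A sketch, assuming only the script path and argument order shown above (the angle-bracket placeholders are kept from the protocol):

import subprocess
import sys
from pathlib import Path

# Invoke the enforcement gate; any non-zero exit means the task is NOT done.
gate = Path.home() / ".openclaw/workspace/scripts/enforce.sh"
result = subprocess.run([str(gate), "<project_dir>", "<task_name>", "<memory_file>"])
if result.returncode != 0:
    sys.exit("Enforcement gate failed: fix every failure before marking the task done.")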

Step 6: Log (Structured Format)

Update memory/YYYY-MM-DD.md using this format:

### Task: [name from NEXT_TASKS.md]
- **Status:** ✅ Done / ⚠️ Partial / ❌ Blocked
- **What was built:** [1-2 sentences]
- **Files changed:** [list]
- **Tests:** [before] → [after] (added N)
- **Verified:** [how — what you checked beyond tests]
- **Spec match:** [yes/no — quote spec if relevant]
- **Jinx queued:** [what follow-up work]
- **Surprises:** [anything unexpected, lessons learned]

Update STATUS.md if this was a significant milestone. Log to Mission Control if available.

Step 7: Skill Extraction Check

Did this task produce a reusable pattern?

  • New algorithm that could apply to other projects?
  • Debugging technique worth capturing?
  • Architecture pattern worth documenting?
□ If yes → note it in the Learning Capture section of today's memory
□ If it's substantial → create a skill (check existing skills first to avoid duplicates)

Step 8: Context Check + Continue

□ Run session_status — check context usage
□ If context > 60%: 1-2 more tasks max. Be selective.
□ If context > 70%: Finish the current task, then go to Phase 3 (wrap-up). STOP new work.
□ If context OK: Move to the next unchecked task and repeat from Step 1.


Phase 2: Blocker Handling

When you hit a blocker:

Classify:

  • Needs Ray: Decision, approval, info only Ray has
  • Technical: Bug, missing dependency, unclear spec
  • Dependency: Task X depends on unfinished Task Y

Handle by type:

  • Needs Ray: Mark [blocked: waiting on Ray — reason] in NEXT_TASKS.md, message Ray, skip to next task
  • Technical: Attempt to solve (switch to Opus if complex). If stuck >15 min, mark blocked, move on.
  • Dependency: Skip to the dependency task first, or skip both and note the chain

Never let one blocker stop all progress.
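
The same routing as a lookup table, purely as a sketch; the protocol prescribes the handling, not this structure:

# Hypothetical sketch: map blocker type to the prescribed handling.
BLOCKER_ROUTES = {
    "needs_ray": "Mark [blocked: waiting on Ray] in NEXT_TASKS.md, message Ray, skip ahead",
    "technical": "Attempt a fix (Opus if complex); if stuck >15 min, mark blocked and move on",
    "dependency": "Jump to the dependency task first, or skip both and note the chain",
}

def handle_blocker(kind: str) -> str:
    return BLOCKER_ROUTES.get(kind, "Unknown blocker type: classify before acting")

print(handle_blocker("technical"))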

If ALL tasks blocked:

  • Log blockers clearly
  • Check for supporting work (tests, docs, refactoring, skills)
  • Message Ray with the full blocker list and what you did instead

Phase 3: Session Wrap-Up (MANDATORY)

NEXT_TASKS.md current:

  • All completed items checked [x]
  • New discovered items added
  • Blockers noted with reasons
  • "Next session starts here" marker on the first unchecked task

Final test run:

python3 -m pytest -x -q --ignore=tests/integration
  • 0 failures confirmed
  • Test count ≥ starting baseline

Final git checkpoint:

git add -A && git commit -m "session end: [N] tasks done, [M] tests passing"

Structured handoff note in memory/YYYY-MM-DD.md:

## Session Wrap-Up [time]
- **Tasks completed:** N (list them)
- **Tests:** start count → end count
- **Next task:** [first unchecked item]
- **Blockers:** [list or "none"]  
- **Warnings for next session:** [anything tricky, gotchas, or "none"]
- **Jinx tasks queued:** [list or "none"]
- **Context at wrap:** X%

Update STATUS.md with current test count and completion %

Queue Jinx overnight work if applicable

DO NOT leave work half-done. Finish or revert.


Progress Reporting to Ray

When to message Ray:

  • After completing 3+ tasks in a session — batch summary, not per-task spam
  • When hitting a blocker that needs Ray — immediately, with specific question
  • When a milestone completes (e.g., entire Priority section done) — celebration-worthy
  • When something unexpected happens — broken assumptions, spec contradictions, major discoveries
  • At session wrap if significant progress was made

Don't message Ray for: routine task completion, passing tests, minor progress. Save it for batches.


Quality Gates (ENFORCEMENT)

A task is NOT done unless ALL of these are true:

  • Code written and saved
  • Tests written (if applicable)
  • Full fast suite passes with 0 failures
  • Output verified with Opus model (not just "tests pass")
  • Spec requirement quoted and matched
  • Git checkpoint committed
  • NEXT_TASKS.md checkbox marked
  • Jinx follow-up queued (if applicable)
  • No regressions (test count ≥ baseline)
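
These gates are mechanical enough to check in code. A hypothetical validator sketch; the record type and its field names are invented for illustration:

from dataclasses import dataclass

# Hypothetical task-state record; field names are illustrative only.
@dataclass
class TaskState:
    code_saved: bool
    suite_failures: int
    tests_passed: int
    baseline: int
    verified_with_opus: bool
    spec_quoted: bool
    committed: bool
    checkbox_marked: bool

def is_done(t: TaskState) -> bool:
    return all([
        t.code_saved,
        t.suite_failures == 0,
        t.tests_passed >= t.baseline,  # no regressions
        t.verified_with_opus,
        t.spec_quoted,
        t.committed,
        t.checkbox_marked,
    ])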

Red Lines (Never Cross):

  • Never mark a task done without verification
  • Never decrease test count
  • Never start a new task with failing tests
  • Never continue past 70% context without wrapping up
  • Never leave NEXT_TASKS.md out of date
  • Never implement without reading the spec requirement first
  • Never skip verifying the previous session's work

Learning Capture

After completing any task, write one line:

  • What worked — technique, pattern, or approach worth reusing
  • What didn't — mistake, false start, or wasted time
  • Skill candidate? — if this pattern is reusable, note it for potential skill extraction

If it's worth keeping long-term → update MEMORY.md at end of day.
