Install
openclaw skills install qa-pilot
Automatically tests and verifies all features and workflows of a built app against the original spec, fixes bugs, and confirms the project is truly complete...
The problem this solves: Users (especially vibe coders) ask an agent to build something. The agent builds it, says "done," and the user discovers bugs, missing features, broken flows. Then comes the exhausting back-and-forth loop of reporting issues, waiting for fixes, testing again... This skill eliminates that loop by making the agent test its own work before declaring it done.
The core idea: Before telling the user "I'm finished," the agent acts as its own QA tester. It opens the app, clicks through every page, tries every feature, fills every form, and compares what it finds against the original plan. It fixes what's broken, adds what's missing, and only reports completion when the app actually works.
This skill should be triggered automatically whenever:
The agent should NOT skip testing. Testing is part of building. A carpenter doesn't hand you a table with loose legs and say "let me know if it wobbles."
Before testing anything, the agent must know what the finished product should look like.
spec.md, PLAN.md, PRD.md, design-os output)
Based on the gathered info, create a mental (or written) checklist:
PROJECT: Photo Editor App
URL: http://localhost:3000
FEATURES TO TEST:
□ Home page loads with app branding
□ Image upload from device (gallery/file picker)
□ Image upload via drag & drop
□ Basic edits: crop, rotate, flip
□ Filters: at least 5 preset filters
□ Text overlay tool
□ Export/save edited image
□ Undo/redo functionality
□ Mobile responsive layout
□ Dark mode toggle
WORKFLOWS TO VERIFY:
□ Upload → Edit → Save (happy path)
□ Upload → Apply filter → Adjust → Save
□ Upload → Add text → Change font → Save
□ Try to save without uploading (should show error)
EDGE CASES:
□ Very large image (>10MB)
□ Non-image file upload (should reject)
□ Navigate away with unsaved changes
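If the checklist is written down rather than kept in mind, it helps to hold it in a structure the agent can tick off as it goes. The following is a minimal sketch of one way to do that; the class and field names (`QAPlan`, `CheckItem`, `Status`) are illustrative assumptions, not part of the skill.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UNTESTED = "untested"
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class CheckItem:
    description: str
    status: Status = Status.UNTESTED
    notes: str = ""


@dataclass
class QAPlan:
    """Hypothetical container mirroring the three checklist sections."""
    project: str
    url: str
    features: list = field(default_factory=list)
    workflows: list = field(default_factory=list)
    edge_cases: list = field(default_factory=list)

    def all_items(self):
        return self.features + self.workflows + self.edge_cases

    def untested(self):
        # Anything still UNTESTED means the QA pass is not finished.
        return [i for i in self.all_items() if i.status is Status.UNTESTED]


plan = QAPlan(
    project="Photo Editor App",
    url="http://localhost:3000",
    features=[CheckItem("Home page loads with app branding"),
              CheckItem("Dark mode toggle")],
    workflows=[CheckItem("Upload → Edit → Save (happy path)")],
    edge_cases=[CheckItem("Non-image file upload (should reject)")],
)
plan.features[0].status = Status.PASSED
```

Keeping an explicit `UNTESTED` state makes it hard to report completion while items have simply been forgotten.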
Important: If the agent can't find a spec or clear feature list, it should infer the expected features from the original conversation and common patterns for that type of application. Don't ask the user to provide a test plan — that defeats the purpose.
Before testing features, verify the app is running and accessible.
Check if the dev server is running
Open the app in the browser
http://localhost:PORT)
Check the console for errors
→ Stop. Fix the startup issue first. No point testing features if the app is down.
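The "is the app up?" check can be done before opening a browser at all. Here is a small sketch using only the Python standard library; the function name `is_app_up` and the choice to treat any status below 500 as "up" are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def is_app_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if something answers at `url` with a status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 404 or 500).
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: nothing is listening.
        return False
```

For example, `is_app_up("http://localhost:3000")` returning `False` means the startup issue has to be fixed before any feature testing begins.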
Now test every page and every feature, methodically.
Think like a first-time user who is also a QA engineer:
1. Navigate to the page (click link or go to URL)
2. SNAPSHOT → Does it look right? Any obvious visual issues?
3. Read all text → Any placeholder text? Lorem ipsum? Missing content?
4. Find all interactive elements (buttons, forms, links, toggles)
5. Click each button → Does it do something? Any errors?
6. Fill each form → Submit with valid data → Does it work?
7. Submit forms with INVALID data → Does it validate? Show errors?
8. Check all links → Do they go somewhere? 404s?
9. Resize viewport → Does it work on mobile sizes?
10. Check console → Any errors appeared during interaction?
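Step 3 of the list above (looking for placeholder text) is easy to automate once the page HTML is available. The sketch below scans visible text for a few telltale phrases; the pattern list and the function name `find_placeholder_text` are assumptions, and a real project would likely extend the patterns.

```python
import re
from html.parser import HTMLParser

# Assumed patterns; adjust per project.
PLACEHOLDER_PATTERNS = [r"lorem ipsum", r"\bTODO\b", r"\bplaceholder\b", r"coming soon"]


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def find_placeholder_text(html: str) -> list:
    """Return the visible text chunks that look like leftover placeholder copy."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return [
        chunk for chunk in parser.chunks
        if any(re.search(p, chunk, re.IGNORECASE) for p in PLACEHOLDER_PATTERNS)
    ]
```

This only covers text content; placeholder images and empty sections still need a visual check.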
A workflow is a sequence of actions that achieves a goal. Test the complete journey:
Example: "Create and save an edited photo"
1. Open the app
2. Click "Upload" or find the upload area
3. Upload a test image → Does it appear on canvas?
4. Click "Crop" tool → Does crop UI appear?
5. Adjust crop area → Does preview update?
6. Apply crop → Does image update?
7. Click "Save" or "Export" → Does download start?
8. Verify the saved file exists and is valid
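Step 8 is the one most often skipped: a download that starts is not the same as a file that opens. As a rough sketch, the exported file can be checked for existence, non-zero size, and a known image signature. The helper name `detect_image_type` is hypothetical, and this checks only the file header, not whether the whole image decodes.

```python
from pathlib import Path
from typing import Optional

# Leading "magic bytes" for a few common image formats.
IMAGE_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def detect_image_type(path) -> Optional[str]:
    """Return the detected format name, or None if the file is missing,
    empty, or does not start with a known image signature."""
    file = Path(path)
    if not file.is_file() or file.stat().st_size == 0:
        return None
    with file.open("rb") as fh:
        header = fh.read(8)
    for name, signature in IMAGE_SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None
```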
For each step, ask:
The #1 mistake agents make is only checking if pages load. Real testing means:
This is where the magic happens. Compare what exists against what was promised.
| Spec Says | Reality Check | Verdict |
|---|---|---|
| "Image upload from gallery" | Upload button exists and works | ✅ Done |
| "5 preset filters" | Only 3 filters visible | ❌ Incomplete |
| "Dark mode toggle" | No toggle found anywhere | ❌ Missing |
| "Responsive on mobile" | Layout breaks below 768px | ❌ Broken |
| "Undo/redo" | Buttons exist but undo doesn't work | ❌ Buggy |
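The table above can be kept as data so that the fix list falls out of it directly. This is a sketch only: the `Finding` class, the `fix_queue` helper, and in particular the priority order (broken first, incomplete last) are assumptions, since the skill does not define an ordering.

```python
from dataclasses import dataclass

# Assumed ordering: lower number means fix sooner. Not defined by the skill.
FIX_PRIORITY = {"broken": 0, "buggy": 1, "missing": 2, "incomplete": 3}


@dataclass
class Finding:
    spec_says: str
    reality: str
    verdict: str  # "done", "incomplete", "missing", "broken", or "buggy"


def fix_queue(findings):
    """Return every finding that is not done, most urgent first."""
    open_items = [f for f in findings if f.verdict != "done"]
    return sorted(open_items, key=lambda f: FIX_PRIORITY[f.verdict])


findings = [
    Finding("Image upload from gallery", "Upload button exists and works", "done"),
    Finding("5 preset filters", "Only 3 filters visible", "incomplete"),
    Finding("Dark mode toggle", "No toggle found anywhere", "missing"),
    Finding("Responsive on mobile", "Layout breaks below 768px", "broken"),
    Finding("Undo/redo", "Buttons exist but undo doesn't work", "buggy"),
]
```

With the rows above, `fix_queue(findings)` yields the mobile layout first and the missing filters last.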
This is the core innovation. The agent doesn't just report issues — it fixes them.
┌──────────────────────────────────────┐
│           TEST EVERYTHING            │
│          (Phase 1 + 2 + 3)           │
└──────────────┬───────────────────────┘
               │
               ▼
       ┌───────────────┐
       │ Issues found? │
       └──┬───────┬────┘
          │       │
       No │       │ Yes
          │       │
          ▼       ▼
     ┌────────┐ ┌──────────────────┐
     │ DONE   │ │ FIX ISSUES       │
     │ Report │ │ (prioritized)    │
     │ to user│ └────────┬─────────┘
     └────────┘          │
                         ▼
                  ┌─────────────┐
                  │ RE-TEST     │
                  │ (only fixes)│
                  └──────┬──────┘
                         │
                         ▼
                  ┌──────────────┐
                  │ All fixed?   │
                  └──┬───────┬───┘
                     │       │
                  No │       │ Yes
                     │       │
                     ▼       ▼
          ┌─────────────┐ ┌────────┐
          │ back to fix │ │  DONE  │
          └─────────────┘ └────────┘
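In code, the loop in the diagram is short. The sketch below assumes two hypothetical callables supplied by the agent: `run_tests`, which returns a list of open issue descriptions, and `fix_issue`, which attempts one fix. For simplicity it re-runs the whole test callable after each round rather than only the fixed items, and it stops after `max_fix_cycles` rounds so the loop cannot run forever.

```python
from typing import Callable, List, Tuple


def fix_until_clean(
    run_tests: Callable[[], List[str]],
    fix_issue: Callable[[str], None],
    max_fix_cycles: int = 5,
) -> Tuple[List[str], int]:
    """Alternate fixing and re-testing until no issues remain or the cycle
    budget is spent. Returns (remaining_issues, cycles_used)."""
    issues = run_tests()
    cycles = 0
    while issues and cycles < max_fix_cycles:
        for issue in issues:
            fix_issue(issue)
        cycles += 1
        issues = run_tests()  # re-test after every round of fixes
    return issues, cycles
```

Whatever is still in `remaining_issues` when the budget runs out belongs in the report as an open issue, not hidden behind a "done".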
After all testing and fixing, give the user a clear, honest report.
## 🧪 QA Report — [Project Name]
**Tested:** [date/time]
**URL:** [app URL]
**Test Duration:** [how long testing took]
**Fix Cycles:** [number of fix-test loops]
### ✅ Working (X/Y features)
- [Feature 1] — fully working
- [Feature 2] — fully working
- ...
### ⚠️ Working with Caveats
- [Feature] — works but [caveat]
e.g., "Image upload works but files >5MB may be slow"
### ❌ Issues Remaining
- [Feature] — [what's wrong] — [why it couldn't be fixed]
e.g., "Export to PDF — library compatibility issue with the framework version"
### 🔲 Not Tested (explain why)
- [Feature] — [reason]
e.g., "Payment integration — requires live API key"
### 📊 Score: [X/Y features fully working] ([percentage]%)
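The score line at the end of the template is simple arithmetic: fully working features divided by total features, as a percentage. A minimal sketch, assuming a hypothetical helper named `score_line` and rounding to the nearest whole percent:

```python
def score_line(working: int, total: int) -> str:
    """Format the final score line of the QA report."""
    if total <= 0:
        raise ValueError("total must be a positive number of features")
    if not 0 <= working <= total:
        raise ValueError("working must be between 0 and total")
    percentage = round(100 * working / total)
    return f"### 📊 Score: {working}/{total} features fully working ({percentage}%)"
```

Features listed under "Working with Caveats" or "Not Tested" should not be counted in `working`, otherwise the score overstates the result.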
Web App (SPA):
Server-Rendered App:
Mobile-First App:
API/Backend:
❌ Don't just check if the server is running. That's not testing.
❌ Don't skip features you think are "minor". Test everything.
❌ Don't assume "it worked before". Re-test after every change.
❌ Don't report "done" while issues are still present. Fix first, report after.
❌ Don't test only the happy path. Invalid inputs, edge cases, and errors matter.
❌ Don't ignore console errors. They often point to real problems.
❌ Don't fix things without re-testing. Fixes can break other things.
❌ Don't skip mobile testing. Many users are on mobile.
The skill works out of the box, but can be customized per project:
# .qa-pilot.yaml (optional, place in project root)

# Skip certain tests (e.g., payment flows that need live keys)
skip:
  - "Payment integration"
  - "Email sending"

# Custom test data
test_data:
  test_image: "./test-assets/sample-photo.jpg"
  test_user:
    email: "test@example.com"
    password: "TestPass123!"

# Maximum fix cycles before reporting
max_fix_cycles: 5

# Minimum score to auto-report success
pass_threshold: 90  # percent

# Always test these viewports
viewports:
  - desktop: [1920, 1080]
  - tablet: [768, 1024]
  - mobile: [375, 812]
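To show how these settings might be consumed, here is a sketch that works on the config after it has been parsed into a plain dictionary (parsing the YAML itself would need a third-party parser and is left out). The default values are assumptions that simply mirror the example file, and all three helper names are hypothetical.

```python
from typing import Optional

# Assumed defaults, mirroring the example config; the skill does not document them.
DEFAULTS = {
    "skip": [],
    "max_fix_cycles": 5,
    "pass_threshold": 90,  # percent
}


def effective_config(user_config: Optional[dict]) -> dict:
    """Overlay the project's settings on top of the assumed defaults."""
    return {**DEFAULTS, **(user_config or {})}


def should_test(feature: str, config: dict) -> bool:
    """Features listed under `skip` are reported as not tested."""
    return feature not in config["skip"]


def meets_threshold(working: int, total: int, config: dict) -> bool:
    """True when the share of fully working features reaches pass_threshold."""
    return total > 0 and 100 * working / total >= config["pass_threshold"]
```

A skipped feature should still appear in the "Not Tested" section of the report with its reason, so the user knows what was left unverified.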
This skill is designed to be:
The best bug is the one the user never sees because the agent caught it first.