Install
openclaw skills install @h-mascot/design-eval-loop
Use when an existing UI artifact needs to be iterated to a verified 5/5 design score. Enumerate every view, run independent dead-item and design-rubric audits, apply fixes, then re-score with fresh reviewers until every view and the whole app pass. Designed for Enterprise Crew/OpenClaw operators finishing prototypes without false 5/5 claims.
Drive an existing UI artifact to a verified 5/5 on a current design rubric. The engine is a per-view loop: enumerate → audit + score (parallel sub-agents) → fix → re-score, repeated until every view passes, then once more for the whole app. It is a finishing pass: the artifact must already exist.
This skill exists to stop two failure modes: shipping dead controls (links and buttons that go nowhere) and claiming a false 5/5 without evidence.
This skill is safe to publish as a generic workflow. Do not include private product names, customer data, internal screenshots, private URLs, or local filesystem paths in public scorecards. For public release, run the publish-skill sanitizer before sending the bundle to GitHub, SuperAda, or ClawHub.
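A cheap pre-check can catch the most mechanical leaks before the real sanitizer runs. The sketch below is not the publish-skill sanitizer. It is a hypothetical helper, `find_leaks`, built on three assumptions:

- Local paths look like `/Users/…`, `/home/…` or `C:\…`.
- Any URL whose host is not on an allowlist counts as private.
- Product and customer names come from a caller-supplied denylist, because no pattern can infer them.

```python
import re

# Assumed shapes of local filesystem paths (macOS, Linux, Windows).
LOCAL_PATH = re.compile(r"(?:/Users/|/home/|[A-Za-z]:\\)\S*")
# Any http(s) URL; its host is checked against an allowlist below.
URL = re.compile(r"https?://[^\s)>\"']+")


def find_leaks(scorecard: str, denylist: list[str], allowed_hosts: set[str]) -> list[str]:
    """Return substrings of a scorecard that look private (hypothetical helper)."""
    leaks = [m.group(0) for m in LOCAL_PATH.finditer(scorecard)]
    for m in URL.finditer(scorecard):
        host = m.group(0).split("/")[2]  # "https://host/path" -> "host"
        if host not in allowed_hosts:
            leaks.append(m.group(0))
    lowered = scorecard.lower()
    # Product and customer names can't be pattern-matched; the caller lists them.
    leaks += [term for term in denylist if term.lower() in lowered]
    return leaks


card = "home 3 -> 5, see /Users/alice/app.html and https://intranet.example.net/x"
print(find_leaks(card, denylist=["Acme Checkout"], allowed_hosts={"example.com"}))
# ['/Users/alice/app.html', 'https://intranet.example.net/x']
```

An empty result means only that these patterns found nothing. Run the sanitizer anyway.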
Not for: building the artifact from scratch (build it first), or a one-line visual tweak (just make the edit).
Score against the rubric in references/rubric.md. Honor any dimension the user names explicitly (e.g. "2026 best practices", "WCAG AA").
Plan the run with TodoWrite first: one todo per phase, plus the iteration cap.
Phase 1: Enumerate. List every view and every interactive element. A view is any distinct surface: home, drawer, sheet, overlay, settings list, and each detail panel. Detail panels are views too; don't fold them into one line.
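Writing the inventory down as data makes dead items mechanical to spot. This is a minimal sketch that assumes a hypothetical model: each control records the view or named action it leads to, or `None` when it leads nowhere. The class and function names are illustrative, not part of the skill.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Control:
    label: str
    target: Optional[str]  # view name or action it triggers; None = goes nowhere


@dataclass
class View:
    name: str  # home, drawer, sheet, overlay, settings, each detail panel
    controls: list[Control] = field(default_factory=list)


def dead_items(views: list[View], actions: frozenset[str] = frozenset()) -> list[tuple[str, str]]:
    """Return (view, control) pairs whose target is missing or unknown."""
    known = {v.name for v in views} | actions
    return [
        (v.name, c.label)
        for v in views
        for c in v.controls
        if c.target is None or c.target not in known
    ]


inventory = [
    View("home", [Control("Settings", "settings"), Control("Menu", "drawer")]),
    View("settings", [Control("Billing", "detail:billing"), Control("Export", None)]),
    View("detail:billing", [Control("Save", "save")]),
]
print(dead_items(inventory, actions=frozenset({"save"})))
# [('home', 'Menu'), ('settings', 'Export')]
```

"Menu" is flagged because no drawer view was enumerated. That is either a dead control or a missing view, and both need a fix.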
Phase 2: Audit and score. In each round, spawn two sub-agents at once (one message, two Agent tool calls) so they don't collide on the file:
Agent A (dead-item audit): checks every enumerated interactive element for links and buttons that go nowhere.
Agent B (rubric score): scores each view against references/rubric.md, 0–5 per dimension, with a one-line reason per sub-score and the specific blocker keeping each view under the bar.
Use the prompt templates in references/sub-agent-prompts.md verbatim; they force structured, file-grounded output instead of vibes.
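One way to hold Agent B to the structured output described above is to parse its answer into a record and reject anything incomplete. This sketch rests on two assumptions:

- The dimension names are placeholders.
- The pass rule (every dimension at 5) stands in for whatever bar references/rubric.md actually defines, including its weights.

```python
from dataclasses import dataclass


@dataclass
class SubScore:
    dimension: str
    score: int   # 0-5
    reason: str  # one line, grounded in the file


@dataclass
class ViewScore:
    view: str
    subscores: list[SubScore]
    blocker: str  # what keeps the view under the bar; empty only if it passes

    def passes(self) -> bool:
        # Assumed bar: 5 on every dimension. references/rubric.md is authoritative.
        return all(s.score == 5 for s in self.subscores)


def problems(vs: ViewScore) -> list[str]:
    """List the ways an Agent B answer falls short of the required structure."""
    issues = []
    for s in vs.subscores:
        if not 0 <= s.score <= 5:
            issues.append(f"{s.dimension}: score {s.score} is outside 0-5")
        if not s.reason.strip():
            issues.append(f"{s.dimension}: no reason given")
    if not vs.passes() and not vs.blocker.strip():
        issues.append(f"{vs.view}: under the bar with no named blocker")
    return issues


answer = ViewScore("settings", [
    SubScore("hierarchy", 5, "one primary action per section"),
    SubScore("contrast", 4, ""),
], blocker="")
print(answer.passes(), problems(answer))
```

If `problems` returns anything, send the answer back rather than recording it.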
Phase 3: Fix. Apply the highest-impact findings first: real bugs (undefined tokens, occluded panels, wrong z-index), then dead items, then the lowest-scoring rubric dimensions. Make every fix in the canonical file. After editing, run a static integrity check (balanced tags, braces, and parens; every panel defined; every row mapped).
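The static integrity check can be approximated with Python's standard library. This sketch assumes a single HTML file and a hypothetical markup convention:

- Panels carry `data-panel-id`.
- List rows carry `data-row`.
- A row's `data-opens` names the panel it opens.

The check has known limits:

- The bracket count is naive: it also counts brackets inside strings and comments.
- The tag check does not model HTML's optional end tags such as `</li>` or `</p>`.

Treat a failure as a prompt to look, not a verdict.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
CLOSERS = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(text: str) -> bool:
    """Naive check: also counts brackets inside strings and comments."""
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in CLOSERS and (not stack or stack.pop() != CLOSERS[ch]):
            return False
    return not stack


class IntegrityParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []
        self.errors: list[str] = []
        self.panels: set[str] = set()
        self.row_targets: list[str] = []
        self.unmapped_rows = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("data-panel-id"):
            self.panels.add(a["data-panel-id"])
        if "data-row" in a:  # a row must say which panel it opens
            if a.get("data-opens"):
                self.row_targets.append(a["data-opens"])
            else:
                self.unmapped_rows += 1
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def integrity_check(html: str) -> list[str]:
    parser = IntegrityParser()
    parser.feed(html)
    parser.close()
    errors = list(parser.errors)
    if not brackets_balanced(html):
        errors.append("unbalanced brackets, braces, or parens")
    errors += [f"unclosed <{t}>" for t in parser.open_tags]
    errors += [f"row opens undefined panel {t!r}"
               for t in parser.row_targets if t not in parser.panels]
    if parser.unmapped_rows:
        errors.append(f"{parser.unmapped_rows} row(s) not mapped to a panel")
    return errors


page = ('<div data-panel-id="billing"><ul>'
        '<li data-row data-opens="billing">Billing</li>'
        '<li data-row data-opens="privacy">Privacy</li>'
        '</ul></div>')
print(integrity_check(page))
# ["row opens undefined panel 'privacy'"]
```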
Phase 4: Re-score. Re-run Agent B on the changed views as a fresh agent; don't trust the fixer's self-report. Record the delta in a scorecard. Repeat Phases 2–4 for each view until it hits the bar.
Phase 5: Whole-app pass. Once every view passes individually, score the whole app as one experience (cross-view consistency, shared theme, navigation coherence, no orphan surfaces). Iterate until the whole-app score hits the bar.
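The per-view loop and the whole-app pass can be sketched as plain control flow. Every callable here is a hypothetical stand-in:

- `score` and `score_app` represent a fresh Agent B on each call, never the fixer's self-report.
- `fix` and `fix_app` edit the canonical file.

The all-fives pass rule and `max_rounds` are assumptions standing in for the rubric's bar and your planned iteration cap.

```python
from typing import Callable, Optional

Scores = dict[str, int]  # dimension -> 0..5


def passes(scores: Scores) -> bool:
    return all(v == 5 for v in scores.values())  # assumed bar


def drive(
    views: list[str],
    score: Callable[[str], Scores],        # fresh reviewer on every call
    fix: Callable[[str, Scores], None],
    score_app: Callable[[], Scores],
    fix_app: Callable[[Scores], None],
    max_rounds: int = 5,
) -> tuple[dict[str, tuple[Scores, Scores]], Optional[Scores]]:
    scorecard = {}
    for view in views:
        before = current = score(view)
        for _ in range(max_rounds):
            if passes(current):
                break
            fix(view, current)
            current = score(view)  # re-score with a new reviewer
        scorecard[view] = (before, current)
    if not all(passes(after) for _, after in scorecard.values()):
        return scorecard, None  # the whole-app pass waits until every view passes
    app = score_app()
    for _ in range(max_rounds):
        if passes(app):
            break
        fix_app(app)
        app = score_app()
    return scorecard, app


# Toy stand-ins: each fix raises the weakest dimension by one point.
state = {"home": {"contrast": 3, "hierarchy": 5}, "settings": {"contrast": 5, "hierarchy": 5}}
app_state = {"consistency": 4}


def toy_fix(view: str, scores: Scores) -> None:
    state[view][min(scores, key=scores.get)] += 1


def toy_fix_app(scores: Scores) -> None:
    app_state["consistency"] += 1


card, app = drive(list(state), lambda v: dict(state[v]), toy_fix,
                  lambda: dict(app_state), toy_fix_app)
print(card["home"], app)
```

Gating the whole-app pass on every view passing keeps cross-view fixes from masking a view that never reached the bar.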
Phase 6: Report. Emit a compact scorecard table (view → before → after) plus an
<od-card type="verify-scorecard"> covering the dead-item and rubric checks.
End with which views pass, which don't, the named blocker for each view that doesn't,
and the target for the next round.
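The report's table and closing summary are easy to generate from per-view numbers. This minimal sketch makes two assumptions:

- Each view has been reduced to a single before/after score on the 0-5 scale.
- The next round targets the lowest-scoring failing view.

It does not emit the `od-card`, whose format is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Row:
    view: str
    before: float  # 0-5, e.g. a weighted rubric score
    after: float
    blocker: str = ""  # required for any view still under the bar


def report(rows: list[Row], bar: float = 5.0) -> str:
    lines = ["| View | Before | After |", "|---|---|---|"]
    lines += [f"| {r.view} | {r.before:.1f} | {r.after:.1f} |" for r in rows]
    passing = [r.view for r in rows if r.after >= bar]
    failing = [r for r in rows if r.after < bar]
    lines += ["", "Pass: " + (", ".join(passing) or "none")]
    for r in failing:
        lines.append(f"Fail: {r.view} (blocker: {r.blocker or 'NOT NAMED'})")
    if failing:
        # Assumed policy: target the weakest failing view next.
        lines.append(f"Next round target: {min(failing, key=lambda r: r.after).view}")
    return "\n".join(lines)


print(report([
    Row("home", 3.8, 5.0),
    Row("settings", 4.2, 4.6, "toggle labels fail contrast"),
]))
```

Printing "NOT NAMED" rather than skipping a missing blocker keeps an unexplained failure visible.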
references/rubric.md: the scoring dimensions, weights, and 5/5 bar.
references/sub-agent-prompts.md: paste-ready prompts for Agent A and Agent B.