Install
openclaw skills install youtube-thumbnail-coach
Audit existing YouTube thumbnails, design new ones, and run A/B tests to maximize click-through rate (CTR) without sacrificing watch time. Acts as an expert thumbnail designer who knows niche conventions, mobile-first constraints, and the curiosity-gap mechanics that drive clicks.
Invoke this skill when you have a thumbnail (existing or planned) and need it to perform better, or when you want to design one from scratch.
Basic invocation:
Audit this thumbnail: [image or description]
Design a thumbnail for a tutorial video on Postgres indexing
My CTR is 3% on a tech channel. What's wrong with my thumbnails?
With context:
Here are my last 10 thumbnails and CTR data. Find the pattern.
I have 3 thumbnail variants for an A/B test. Which should I publish first?
My title is "I Lost $40k on This Trade". Design a matching thumbnail.
The agent reviews the thumbnail, the niche, the title, and the channel context to produce specific, actionable redesign recommendations.
Before judging a thumbnail as "underperforming", the agent calibrates against niche norms. CTR is highly genre-dependent.
| Niche | Typical CTR Range | Notes |
|---|---|---|
| Gaming (let's plays, walkthroughs) | 8-12% | Bright palettes, exaggerated faces, game logos work hard |
| Tech (tutorials, reviews) | 4-7% | Lower because audience is searchy, not browsing |
| Finance / Trading | 3-6% | Skeptical audience; clickbait tanks watch time fast |
| Vlogs / Lifestyle | 5-9% | Personality-driven, face-forward |
| Kids / Family | 12-18% | Maximum saturation, characters, big reactions |
| Education / Documentary | 4-8% | Curiosity gap is the entire game |
| Music | 2-5% | Browse traffic mostly bypasses thumbnail |
If the channel sits below the floor of its niche range, the thumbnail (or title) is the most likely culprit. If it sits comfortably inside the range, optimization is incremental, not corrective.
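The calibration rule can be written as a small lookup. This is a minimal sketch, not the skill's actual implementation: the range values are copied from the table above, while the function name `calibrate_ctr`, the dictionary keys, and the returned labels are hypothetical names chosen for illustration.

```python
# Typical CTR ranges in percent, copied from the niche table above.
NICHE_CTR_RANGES = {
    "gaming": (8.0, 12.0),
    "tech": (4.0, 7.0),
    "finance": (3.0, 6.0),
    "vlogs": (5.0, 9.0),
    "kids": (12.0, 18.0),
    "education": (4.0, 8.0),
    "music": (2.0, 5.0),
}


def calibrate_ctr(niche: str, ctr_percent: float) -> str:
    """Classify a channel CTR against the typical range for its niche."""
    low, high = NICHE_CTR_RANGES[niche]
    if ctr_percent < low:
        # Below the floor: thumbnail or title is the most likely culprit.
        return "corrective"
    if ctr_percent <= high:
        # Inside the range: optimization is incremental.
        return "incremental"
    # The text does not describe this case; the label is an assumption.
    return "above_range"


print(calibrate_ctr("tech", 3.0))  # corrective
```

A tech channel at 3% sits below the 4% floor, so the audit treats the thumbnail as something to fix rather than fine-tune.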
The agent grades the thumbnail on six axes. Each axis gets one of three grades: pass, marginal, or fail.
1. Visual hierarchy - Does the eye know where to land first, second, third?
2. Contrast - Does the subject pop from the background at 90px height?
3. Emotion - Is there a clear, exaggerated feeling (≠ neutral, ≠ mild smile)?
4. Scale - Is the focal subject ≥40% of the frame?
5. Curiosity gap - Does it ask a question the title doesn't answer?
6. Mobile readability - Does it survive when shrunk to a phone preview?
Example audit output:
Visual hierarchy: FAIL — three competing focal points (face, logo, background text)
Contrast: MARGINAL — face blends into mid-tone background
Emotion: FAIL — neutral expression, no story telegraphed
Scale: PASS — subject is ~50% of frame
Curiosity gap: MARGINAL — title and thumbnail say the same thing
Mobile readability: FAIL — text drops below 90px and becomes illegible
Diagnosis: thumbnail competes with itself. Rebuild with single focal point,
exaggerated expression, and text that contradicts or extends the title.
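One way to hold an audit like the example above is a mapping from axis to grade and note. The sketch below is illustrative only: `Grade`, `AXES`, `format_audit`, and `failing_axes` are hypothetical names, and the output format is a plain-text approximation of the example rather than the agent's real output.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Grade(Enum):
    PASS = "PASS"
    MARGINAL = "MARGINAL"
    FAIL = "FAIL"


# The six axes of the audit framework, in the order the document lists them.
AXES = (
    "Visual hierarchy",
    "Contrast",
    "Emotion",
    "Scale",
    "Curiosity gap",
    "Mobile readability",
)

Audit = Dict[str, Tuple[Grade, str]]


def format_audit(audit: Audit) -> str:
    """Render one line per axis: name, grade, and a short note."""
    return "\n".join(
        f"{axis}: {audit[axis][0].value} - {audit[axis][1]}" for axis in AXES
    )


def failing_axes(audit: Audit) -> List[str]:
    """Return the axes graded FAIL, in framework order."""
    return [axis for axis in AXES if audit[axis][0] is Grade.FAIL]


example: Audit = {
    "Visual hierarchy": (Grade.FAIL, "three competing focal points"),
    "Contrast": (Grade.MARGINAL, "face blends into mid-tone background"),
    "Emotion": (Grade.FAIL, "neutral expression, no story telegraphed"),
    "Scale": (Grade.PASS, "subject is ~50% of frame"),
    "Curiosity gap": (Grade.MARGINAL, "title and thumbnail say the same thing"),
    "Mobile readability": (Grade.FAIL, "text illegible at preview size"),
}

print(format_audit(example))
print(failing_axes(example))
```

Keeping the grades structured makes it easy to compare audits across a channel's last several thumbnails and spot which axis fails repeatedly.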
The agent doesn't design from scratch every time. Each niche has formulas with proven CTR. Pick one, then customize.
| Niche | Formula | Example |
|---|---|---|
| Gaming let's-play | Face + game text/logo + key object | Face reaction (left) + "FINAL BOSS" (top right) + boss silhouette |
| Tutorial / How-to | End result + face + arrow/circle | Finished UI screenshot + small face corner + red arrow at the magic part |
| Transformation | Before/after split | Left half "before" desaturated, right half "after" vivid |
| Commentary | Question + face | "Why did he do this?" + reaction face |
| Versus / Duel | Subject A vs Subject B | Two faces or logos with "VS" between, contrasting colors |
| Listicle | Number + best item teased | Big "7" + the most intriguing item from the list |
| Documentary | Single iconic image + 1-2 word hook | Lone subject + "BURIED" |
Faces drive CTR more than any other element on YouTube. But the wrong face hurts.
DO:
- Eye direction points at the text or object (viewers follow gaze)
- Mouth open >50% (shock, awe, laughter, fear) — telegraphs energy
- Exaggerated expression that does NOT default to "smile"
- Face occupies 30-50% of frame
- Eyes at the upper third (rule of thirds)
DON'T:
- Resting face / mild smile (reads as "nothing to see here")
- Eye contact with camera unless it's a confessional/serious topic
- Face cropped at the chin or forehead (looks accidental)
- Face hidden behind text or props (defeats the purpose)
- Sunglasses or hats covering eyes (no emotional read)
A face with mouth closed and a mild smile is the single most common failure pattern across struggling channels. The agent flags this on every audit.
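The geometric parts of these rules (face size, eye line, cropping) can be checked from a face bounding box. The following is a rough sketch under several assumptions: the box comes from some external face detector not shown here, "30-50% of frame" is read as bounding-box area, the frame is 1280x720, and the tolerance around the upper-third line is an arbitrary 10% of frame height. `Box` and `face_checks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Box:
    """Axis-aligned bounding box in pixels, origin at the top-left corner."""
    x: float
    y: float
    w: float
    h: float


def face_checks(face: Box, eye_y: float,
                frame_w: int = 1280, frame_h: int = 720) -> Dict[str, bool]:
    """Check the measurable face rules against a detected face box."""
    area_fraction = (face.w * face.h) / (frame_w * frame_h)
    upper_third = frame_h / 3
    tolerance = 0.10 * frame_h  # assumption: how close counts as "at" the third
    return {
        "face_30_to_50_percent": 0.30 <= area_fraction <= 0.50,
        "eyes_near_upper_third": abs(eye_y - upper_third) <= tolerance,
        # A box touching or crossing the frame edge suggests a cropped face.
        "not_cropped": (face.x >= 0 and face.y >= 0
                        and face.x + face.w <= frame_w
                        and face.y + face.h <= frame_h),
    }


print(face_checks(Box(x=60, y=80, w=560, h=600), eye_y=260))
print(face_checks(Box(x=900, y=500, w=200, h=200), eye_y=560))
```

Expression, gaze direction, and mouth openness need a human or a vision model; this sketch only covers what can be measured from coordinates.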
- 3-4 words MAXIMUM (if any). Often 0 words is correct.
- Text must NOT duplicate the title — that wastes both surfaces.
- Stroke / outline (3-6px) for legibility on any background.
- Sans-serif, heavy weight (Impact, Bebas, Anton, Montserrat Black).
- Readable at 90px height — if it's not, it's decoration not communication.
- Color must contrast both background AND any face skin tones.
- One text element only. Two text blocks fight each other.
The agent often recommends removing text entirely. A strong face + object combination beats a busy thumbnail in nearly every test.
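The countable text rules (word limit, single text element, stroke width) lend themselves to a quick lint. This is a minimal sketch with a hypothetical function name, `check_thumbnail_text`; the thresholds come from the list above, and legibility, font choice, and color contrast are deliberately left out because they cannot be judged from strings alone.

```python
from typing import List, Optional

MAX_WORDS = 4            # "3-4 words MAXIMUM"
STROKE_RANGE_PX = (3, 6)  # recommended outline width


def check_thumbnail_text(text_blocks: List[str],
                         stroke_px: Optional[float] = None) -> List[str]:
    """Return a list of rule violations; an empty list means no issues found."""
    problems: List[str] = []
    blocks = [b for b in text_blocks if b.strip()]
    if not blocks:
        return problems  # zero words is often the correct answer

    if len(blocks) > 1:
        problems.append(f"{len(blocks)} text elements; use one")

    word_count = sum(len(b.split()) for b in blocks)
    if word_count > MAX_WORDS:
        problems.append(f"{word_count} words; the limit is {MAX_WORDS}")

    low, high = STROKE_RANGE_PX
    if stroke_px is not None and not low <= stroke_px <= high:
        problems.append(f"stroke {stroke_px}px is outside {low}-{high}px")

    return problems


print(check_thumbnail_text(["FIRST TRY?"], stroke_px=4))
print(check_thumbnail_text(["FIX SLOW POSTGRES QUERIES NOW"]))
print(check_thumbnail_text(["200x", "FASTER"], stroke_px=1))
```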
Niche tendencies:
- Gaming: saturated complementary (orange/teal, magenta/yellow)
- Tech: cool blues/grays + one warm accent
- Finance: green/red avoided (color-blind unsafe + cliché); use blue/orange
- Vlogs: warm skin-tone-friendly palettes
- Kids: maximum saturation across the rainbow
- Documentary: desaturated + one saturated accent
Universal rules:
- Use complementary contrast (opposite hues) for the focal point.
- Avoid red/green as the only contrast pair (~8% of men are color-blind).
- The background should be 1-2 hues, not a photo with full hue variance.
- If the channel has a brand color, it should appear but not dominate.
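A hue-independent way to sanity-check a color pair is to compare luminance, since a pair that differs in brightness still separates for viewers who cannot distinguish the hues. The sketch below borrows the relative luminance and contrast ratio formulas from WCAG 2.x as a stand-in; the skill does not say it uses WCAG, and the function names are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(color: RGB) -> float:
    """Relative luminance in [0, 1], where 0 is black and 1 is white."""
    r, g, b = (_linearize(v) for v in color)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: RGB, b: RGB) -> float:
    """Contrast ratio from 1 (identical luminance) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(a), relative_luminance(b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


pairs = {
    "white / black": ((255, 255, 255), (0, 0, 0)),
    "pure red / pure green": ((255, 0, 0), (0, 255, 0)),
    "orange / blue": ((255, 165, 0), (0, 0, 255)),
}
for name, (fg, bg) in pairs.items():
    print(f"{name}: {contrast_ratio(fg, bg):.2f}")
```

What ratio counts as "enough" for a thumbnail is a judgment call; any cutoff applied to these numbers would be an assumption, not something the document specifies.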
70%+ of YouTube traffic is mobile. The agent always tests at 90x53px preview size — the size YouTube actually displays in the mobile feed.
Mobile-first checklist:
[ ] Subject still recognizable at 90x53px
[ ] Text (if any) still legible at 90x53px
[ ] Color contrast survives the resize and compression
[ ] Focal point is unambiguous — no competing elements
[ ] Faces don't become "blob with eyes" at small size
If any check fails, the thumbnail does not survive contact with the actual feed.
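Before rendering anything, simple scaling arithmetic shows how small an element becomes in the preview. This sketch assumes a 1280x720 source thumbnail and the 90x53px preview quoted above; the 8px legibility floor in the usage line is an illustrative assumption, and the function names are hypothetical.

```python
SOURCE_HEIGHT_PX = 720   # assumption: thumbnail designed at 1280x720
PREVIEW_HEIGHT_PX = 53   # preview height quoted in this document


def preview_height(source_px: float,
                   source_h: int = SOURCE_HEIGHT_PX,
                   preview_h: int = PREVIEW_HEIGHT_PX) -> float:
    """Height of an element, in preview pixels, after the thumbnail is shrunk."""
    return source_px * preview_h / source_h


def min_source_height(target_preview_px: float,
                      source_h: int = SOURCE_HEIGHT_PX,
                      preview_h: int = PREVIEW_HEIGHT_PX) -> float:
    """Height an element needs in the source to reach a target preview height."""
    return target_preview_px * source_h / preview_h


# Text that is 120px tall in the source design:
print(round(preview_height(120), 1))
# Source height needed for text to stay at least 8px tall (assumed floor):
print(round(min_source_height(8), 1))
```

The numbers make the checklist concrete: text that looks large on a design canvas shrinks to under a tenth of its height in the feed.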
The curiosity gap is the difference between what the thumbnail/title imply and what the viewer needs to watch to learn. Too little gap = no clicks. Too much = clickbait, audience retention craters, algorithm punishes.
TOO VAGUE (no clicks):
Title: "My day"
Thumbnail: vlogger face, neutral
Problem: zero promise, zero curiosity
CALIBRATED (clicks AND retention):
Title: "The day I almost quit YouTube"
Thumbnail: vlogger face, hand on forehead, dim background
Promise: emotional story; payoff: actual story exists in the video
TOO CLICKBAIT (clicks but tanks watch time):
Title: "I QUIT YOUTUBE FOREVER!!!"
Thumbnail: face screaming, "GOODBYE" in red
Problem: viewer feels lied to in 30 seconds, retention dies, algorithm down-ranks
The agent grades curiosity gap on a 1-5 scale and warns when it crosses into clickbait territory.
YouTube Studio's built-in thumbnail test feature is the ground truth. Don't trust subjective preference.
Setup:
- Upload up to 3 thumbnail variants in YouTube Studio.
- YouTube rotates them and measures CTR + watch time, not just CTR.
- Default test window: ~2 weeks. The agent recommends a minimum 7-day window.
Statistical significance:
- Need ≥1,000 impressions per variant for an early read.
- Need ≥5,000 impressions per variant for confidence on small (<1pp) deltas.
- If no variant wins by 14 days, they're effectively tied — pick on retention.
What YouTube actually picks:
- It optimizes for "click + watch", not click alone.
- A higher-CTR thumbnail can LOSE if its viewers bounce faster.
- This is why clickbait loses long-term.
Variant design strategy:
- Variant A: current best guess (the safe formula)
- Variant B: same composition, different face/expression
- Variant C: structurally different concept (e.g., different formula entirely)
Don't test 3 near-identical variants — you learn nothing.
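The impression thresholds above can be sanity-checked with a standard two-proportion z-test on CTR. This is a sketch for intuition only: it looks at clicks alone, whereas YouTube's test also weighs watch time, and the function name and sample numbers are illustrative assumptions.

```python
import math
from typing import Tuple


def two_proportion_z(clicks_a: int, impressions_a: int,
                     clicks_b: int, impressions_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in CTR between B and A."""
    p_a = clicks_a / impressions_a
    p_b = clicks_b / impressions_b
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b)
    )
    z = (p_b - p_a) / std_err
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Same 5% vs 6% CTR (a 1pp delta) at two sample sizes per variant.
z_small, p_small = two_proportion_z(50, 1000, 60, 1000)
z_large, p_large = two_proportion_z(250, 5000, 300, 5000)
print(f"1,000 impressions each: z={z_small:.2f}, p={p_small:.3f}")
print(f"5,000 impressions each: z={z_large:.2f}, p={p_large:.3f}")
```

With these sample numbers, the 1pp gap is not distinguishable from noise at 1,000 impressions per variant but clears the conventional 5% significance level at 5,000, which is consistent with the thresholds listed above.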
The agent screens for these failure patterns on every audit:
1. Face hidden by text overlay — defeats the entire point of including a face
2. Low contrast — subject blends into background, especially at mobile size
3. No focal point — eye doesn't know where to land, viewer scrolls past
4. Photoshop overload — too many cutouts, glows, layers; reads as desperate
5. Generic stock images — viewers' eyes are trained to skip these
6. Inconsistent style across channel — no brand recognition in the feed
7. Thumbnail = title — wastes the second surface; should add information
8. All-caps title fatigue — pair with a non-text thumbnail for relief
9. Small face — face below ~30% of frame loses emotional read at mobile size
10. Background photo with full hue variance — competes with the subject
Individual thumbnails fight for the click. The channel as a whole fights for recognition.
Brand consistency elements:
- Color palette: 2-3 colors used across all thumbnails (one dominant + accents)
- Font system: ONE display font across all thumbnails with text
- Recurring element: a logo corner, an arrow style, a frame, a stroke color
- Subject consistency: if the creator is the brand, they're always present
Test: open the channel page. Do the thumbnails feel like a set, or 50 random images?
A set wins subscribers. Random images don't.
Counter-rule: don't sacrifice CTR for branding. If a video genuinely needs to break
the pattern (e.g., a serious topic on a comedy channel), break it deliberately.
Sketch → 30-second pencil/whiteboard sketch of the layout (no detail)
Draft → built thumbnail at full resolution
Review → does it survive the 90x53px test? does it score on the audit framework?
Test → upload as A/B variant, run 7-14 days
Iterate → keep winners, retire losers, codify the pattern into a formula
Skipping the sketch step is the most common failure. Designers commit to a layout in Photoshop and become anchored — sketching forces multiple options before any time is sunk.
The thumbnail and title are two surfaces. They should COMBINE for curiosity, not duplicate.
DUPLICATION (wasted surface):
Title: "How to fix slow Postgres queries"
Thumbnail: text "FIX SLOW POSTGRES" + face
Problem: thumbnail tells viewer what title already tells them
COMBINATION (compounding curiosity):
Title: "How to fix slow Postgres queries"
Thumbnail: face shocked + "200x faster" + clock graphic
Effect: title sets the topic, thumbnail adds the unstated promise
GENERAL RULE:
- Title carries the SUBJECT.
- Thumbnail carries the EMOTION + STAKES.
- Together they form the click decision.
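Duplication between the two surfaces can be roughly estimated by word overlap. The sketch below is a crude heuristic, not the agent's method: the stopword list, the function names, and the choice to measure the share of thumbnail words already present in the title are all assumptions.

```python
import re
from typing import Set

# Small illustrative stopword list; a real one would be longer.
STOPWORDS = {"a", "an", "the", "to", "of", "in", "on", "for", "and", "my", "how", "i"}


def content_words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def duplication_ratio(title: str, thumbnail_text: str) -> float:
    """Share of thumbnail words that already appear in the title (0.0 to 1.0)."""
    thumb = content_words(thumbnail_text)
    if not thumb:
        return 0.0  # no text on the thumbnail, so nothing is duplicated
    return len(thumb & content_words(title)) / len(thumb)


title = "How to fix slow Postgres queries"
print(duplication_ratio(title, "FIX SLOW POSTGRES"))  # 1.0
print(duplication_ratio(title, "200x faster"))        # 0.0
```

A ratio near 1.0 flags the wasted-surface case; a low ratio only says the words differ, so whether the thumbnail text actually adds stakes still needs a human read.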
Weak concept:
Game: Elden Ring
Title: "Elden Ring Episode 12"
Thumbnail: gameplay screenshot with the game's HUD visible
Audit: no face, no emotion, no curiosity gap, episode number signals
"you must have seen 1-11 to care", reads as personal vlog not content
Strong rewrite:
Title: "I Beat the Hardest Boss in Elden Ring"
Thumbnail formula: face + game text + key object
- Left 40%: streamer's face, mouth open in shock, eyes pointing right
- Right 60%: silhouette of the boss with red glow + small "FIRST TRY?" text
- Color: warm orange face vs cool dark background (complementary contrast)
Reasoning:
- Episode number replaced with a specific stakes-laden hook
- Face provides emotion the gameplay screenshot can't
- Eye direction guides viewer to the boss silhouette
- Text adds doubt ("FIRST TRY?") that the title doesn't answer
- Mobile test: face still readable at 90px, boss silhouette still recognizable
Weak concept:
Title: "Postgres Index Tutorial"
Thumbnail: screenshot of a SQL editor with text "POSTGRES INDEXES" overlaid
Audit: text duplicates title, no face, no emotion, no result shown,
no reason to click vs the other 50 Postgres tutorials in search
Strong rewrite:
Title: "I Made My Postgres Query 200x Faster"
Thumbnail formula: end result + face + arrow
- Center: split screen showing "12s" (red, struck through) and "60ms" (green)
- Bottom-right corner: small face (creator) with raised eyebrow expression
- Red arrow drawn from "12s" to "60ms"
- Color: red/green is risky for color-blind viewers, so add a value contrast
(red is dark, green is bright) that survives without color
Reasoning:
- Result is the hook, not the topic
- Numbers are concrete and skimmable at mobile size
- Face adds personality to a normally dry topic
- Title says WHAT, thumbnail says HOW MUCH — combination not duplication
- Search audience sees a specific outcome, not generic "tutorial"
Weak concept:
Title: "Stock Market Update"
Thumbnail: stock chart screenshot with red and green candles
Audit: every finance channel uses this exact image, so viewers scroll right past it;
no face, no emotional stakes, no specific claim, no curiosity gap
Strong rewrite:
Title: "The Fed Just Made a $2 Trillion Mistake"
Thumbnail formula: question + face + iconic object
- Left 50%: creator's face, hand on forehead, mouth slightly open
- Right 50%: Fed building or Powell silhouette + "$2T?" in heavy text
- Color: cool desaturated background + warm face (separation)
- Background: dim, almost monochrome — eyes go to the face, then text
Reasoning:
- Specific dollar figure beats generic "update"
- Face reaction telegraphs that something is wrong (emotional stakes)
- "?" in the text adds doubt — invites click to learn the answer
- Avoided red/green chart cliché — visually distinct in finance feed
- Curiosity gap: title makes the claim, thumbnail confirms the emotion,
video must explain WHY — this is the calibrated click contract
- WARNING: only works if the video actually delivers the analysis;
same thumbnail with a thin video tanks retention and the algorithm punishes
The agent produces:
Provide the last 10 thumbnails and titles. The agent finds the pattern (usually faces, contrast, or curiosity gap) and proposes a formula change.
The agent runs each through the audit framework and recommends the strongest, plus suggests modifications. If two are close, the agent recommends an A/B test rather than guessing.
The agent proposes a 2-3 color palette, font, recurring element, and template formula based on the niche.
Almost always a clickbait calibration problem. The agent grades the curiosity gap and recommends pulling the thumbnail back toward honest signaling.
Default answer: no text, unless text adds specific information the face/object can't (a number, a date, a question). The agent explains the tradeoff.
This skill is for standard YouTube video thumbnails (16:9, browse + search surfaces). Don't use it for: