YouTube Thumbnail Coach

v1.0.0

License: MIT-0

by @charlie-morrison

Install

openclaw skills install youtube-thumbnail-coach

YouTube Thumbnail Coach

Audit existing YouTube thumbnails, design new ones, and run A/B tests to maximize click-through rate (CTR) without sacrificing watch time. Acts as an expert thumbnail designer who knows niche conventions, mobile-first constraints, and the curiosity-gap mechanics that drive clicks.

Usage

Invoke this skill when you have a thumbnail (existing or planned) and need it to perform better, or when you want to design one from scratch.

Basic invocation:

Audit this thumbnail: [image or description]
Design a thumbnail for a tutorial video on Postgres indexing
My CTR is 3% on a tech channel — what's wrong with my thumbnails?

With context:

Here are my last 10 thumbnails and CTR data — find the pattern
I have 3 thumbnail variants for an A/B test, which should I publish first?
My title is "I Lost $40k on This Trade" — design a matching thumbnail

The agent reviews the thumbnail, the niche, the title, and the channel context to produce specific, actionable redesign recommendations.

How It Works

Step 1: Establish CTR Baseline by Niche

Before judging a thumbnail as "underperforming", the agent calibrates against niche norms. CTR is highly genre-dependent.

Niche	Typical CTR Range	Notes
Gaming (let's plays, walkthroughs)	8-12%	Bright palettes, exaggerated faces, game logos work hard
Tech (tutorials, reviews)	4-7%	Lower because audience is searchy, not browsing
Finance / Trading	3-6%	Skeptical audience; clickbait tanks watch time fast
Vlogs / Lifestyle	5-9%	Personality-driven, face-forward
Kids / Family	12-18%	Maximum saturation, characters, big reactions
Education / Documentary	4-8%	Curiosity gap is the entire game
Music	2-5%	Browse traffic mostly bypasses thumbnail

If the channel sits below the floor of its niche range, the thumbnail (or title) is the most likely culprit. If it sits comfortably inside the range, optimization is incremental, not corrective.
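The baseline check above can be sketched as a small lookup. This is a minimal illustration, not the skill's actual implementation: the ranges are copied from the table, while the function name `classify_ctr`, the dictionary keys, and the "above range" branch (which the text does not define) are assumptions made for the example.

```python
# Niche CTR ranges in percent, copied from the table above.
# The short keys are hypothetical labels chosen for this sketch.
NICHE_CTR_RANGES = {
    "gaming": (8.0, 12.0),
    "tech": (4.0, 7.0),
    "finance": (3.0, 6.0),
    "vlogs": (5.0, 9.0),
    "kids": (12.0, 18.0),
    "education": (4.0, 8.0),
    "music": (2.0, 5.0),
}


def classify_ctr(niche: str, ctr_percent: float) -> str:
    """Compare a channel's CTR with its niche range (hypothetical helper)."""
    low, high = NICHE_CTR_RANGES[niche]
    if ctr_percent < low:
        return "below range: thumbnail or title is the likely culprit"
    if ctr_percent <= high:
        return "inside range: optimization is incremental"
    # Assumption: the text does not say what to do above the range.
    return "above range: not covered by the baseline rule"


# The "3% on a tech channel" prompt from the Usage section.
print(classify_ctr("tech", 3.0))
```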

Step 2: Run the Audit Framework

The agent grades the thumbnail on six axes. Each axis gets one of three grades: pass, marginal, or fail.

Visual hierarchy   - Does the eye know where to land first, second, third?
Contrast           - Does the subject pop from the background at 90x53px preview size?
Emotion            - Is there a clear, exaggerated feeling (≠ neutral, ≠ mild smile)?
Scale              - Is the focal subject ≥40% of the frame?
Curiosity gap      - Does it ask a question the title doesn't answer?
Mobile readability - Does it survive when shrunk to a phone preview?

Example audit output:

Visual hierarchy:   FAIL — three competing focal points (face, logo, background text)
Contrast:           MARGINAL — face blends into mid-tone background
Emotion:            FAIL — neutral expression, no story telegraphed
Scale:              PASS — subject is ~50% of frame
Curiosity gap:      MARGINAL — title and thumbnail say the same thing
Mobile readability: FAIL — text becomes illegible at 90x53px preview size

Diagnosis: thumbnail competes with itself. Rebuild with single focal point,
exaggerated expression, and text that contradicts or extends the title.
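One way to hold the six grades in code and turn them into a worklist is sketched below. The `Grade` enum, the `rank_problems` helper, and the "fails before marginals" ordering are assumptions for illustration; the text only says that each axis is graded pass, marginal, or fail.

```python
from enum import Enum


class Grade(Enum):
    PASS = "pass"
    MARGINAL = "marginal"
    FAIL = "fail"


# The six audit axes, in the order listed above.
AXES = (
    "visual hierarchy",
    "contrast",
    "emotion",
    "scale",
    "curiosity gap",
    "mobile readability",
)


def rank_problems(grades: dict[str, Grade]) -> list[str]:
    """Return axes needing work: fails first, then marginals (assumed ordering)."""
    missing = [axis for axis in AXES if axis not in grades]
    if missing:
        raise ValueError(f"ungraded axes: {missing}")
    fails = [axis for axis in AXES if grades[axis] is Grade.FAIL]
    marginals = [axis for axis in AXES if grades[axis] is Grade.MARGINAL]
    return fails + marginals


# The grades from the example audit output above.
example = {
    "visual hierarchy": Grade.FAIL,
    "contrast": Grade.MARGINAL,
    "emotion": Grade.FAIL,
    "scale": Grade.PASS,
    "curiosity gap": Grade.MARGINAL,
    "mobile readability": Grade.FAIL,
}
print(rank_problems(example))
# ['visual hierarchy', 'emotion', 'mobile readability', 'contrast', 'curiosity gap']
```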

Step 3: Apply a Thumbnail Formula by Niche

The agent doesn't design from scratch every time. Each niche has formulas with proven CTR. Pick one, then customize.

Niche	Formula	Example
Gaming let's-play	Face + game text/logo + key object	Face reaction (left) + "FINAL BOSS" (top right) + boss silhouette
Tutorial / How-to	End result + face + arrow/circle	Finished UI screenshot + small face corner + red arrow at the magic part
Transformation	Before/after split	Left half "before" desaturated, right half "after" vivid
Commentary	Question + face	"Why did he do this?" + reaction face
Versus / Duel	Subject A vs Subject B	Two faces or logos with "VS" between, contrasting colors
Listicle	Number + best item teased	Big "7" + the most intriguing item from the list
Documentary	Single iconic image + 1-2 word hook	Lone subject + "BURIED"

Step 4: Apply Face Strategy

Faces drive CTR more than any other element on YouTube. But the wrong face hurts.

DO:
  - Eye direction points at the text or object (viewers follow gaze)
  - Mouth open >50% (shock, awe, laughter, fear) — telegraphs energy
  - Exaggerated expression that does NOT default to "smile"
  - Face occupies 30-50% of frame
  - Eyes at the upper third (rule of thirds)

DON'T:
  - Resting face / mild smile (reads as "nothing to see here")
  - Eye contact with camera unless it's a confessional/serious topic
  - Face cropped at the chin or forehead (looks accidental)
  - Face hidden behind text or props (defeats the purpose)
  - Sunglasses or hats covering eyes (no emotional read)

A face with mouth closed and a mild smile is the single most common failure pattern across struggling channels. The agent flags this on every audit.
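The size and cropping rules are easy to check once a face bounding box is known. The sketch below assumes the box comes from some external face detector and is given as (left, top, width, height) in pixels on a 1280x720 canvas; it also reads "30-50% of frame" as a share of frame area. All of these are assumptions for the example, as is the helper name `face_checks`.

```python
def face_checks(face_box, frame_size=(1280, 720)):
    """Check face size and cropping against the DO/DON'T list.

    face_box is (left, top, width, height) in pixels. The input format and
    the area-based reading of "30-50% of frame" are assumptions of this sketch.
    """
    left, top, width, height = face_box
    frame_w, frame_h = frame_size
    area_fraction = (width * height) / (frame_w * frame_h)
    return {
        "area_fraction": area_fraction,
        # Face should occupy roughly 30-50% of the frame.
        "size_ok": 0.30 <= area_fraction <= 0.50,
        # A box that leaves the frame means the face is cut off at an edge.
        "cropped": (
            left < 0
            or top < 0
            or left + width > frame_w
            or top + height > frame_h
        ),
    }


print(face_checks((80, 40, 560, 640)))
```

Expression, gaze direction, and mouth openness still need a human or a vision model; this only covers the geometric rules.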

Step 5: Apply Text Rules

- 3-4 words MAXIMUM (if any). Often 0 words is correct.
- Text must NOT duplicate the title — that wastes both surfaces.
- Stroke / outline (3-6px) for legibility on any background.
- Sans-serif, heavy weight (Impact, Bebas, Anton, Montserrat Black).
- Readable at 90x53px preview size — if it's not, it's decoration not communication.
- Color must contrast both background AND any face skin tones.
- One text element only. Two text blocks fight each other.

The agent often recommends removing text entirely. A strong face + object combination beats a busy thumbnail in nearly every test.
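The countable text rules (word limit, single text element) can be checked mechanically. The helper below is a hypothetical sketch: `check_text_rules` and its input format (one string per text block on the thumbnail) are assumptions, and it does not cover stroke, font, or color.

```python
def check_text_rules(overlay_blocks: list[str], max_words: int = 4) -> list[str]:
    """Return rule violations for a thumbnail's text overlays (hypothetical helper).

    overlay_blocks holds one string per separate text element on the thumbnail.
    An empty list means a text-free thumbnail, which the rules allow.
    """
    problems = []
    if len(overlay_blocks) > 1:
        problems.append("more than one text element")
    for block in overlay_blocks:
        if len(block.split()) > max_words:
            problems.append(f"too many words: {block!r}")
    return problems


print(check_text_rules([]))              # []
print(check_text_rules(["FIRST TRY?"]))  # []
print(check_text_rules(["HOW I FIXED MY SLOW QUERY", "WATCH NOW"]))
```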

Step 6: Apply Color Theory

Niche tendencies:
  - Gaming:        saturated complementary (orange/teal, magenta/yellow)
  - Tech:          cool blues/grays + one warm accent
  - Finance:       green/red avoided (color-blind unsafe + cliché); use blue/orange
  - Vlogs:         warm skin-tone-friendly palettes
  - Kids:          maximum saturation across the rainbow
  - Documentary:   desaturated + one saturated accent

Universal rules:
  - Use complementary contrast (opposite hues) for the focal point.
  - Avoid red/green as the only contrast pair (~8% of men are color-blind).
  - The background should be 1-2 hues, not a photo with full hue variance.
  - If the channel has a brand color, it should appear but not dominate.
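A color pair that differs only in hue can collapse for color-blind viewers, so it helps to confirm that the two colors also differ in lightness. The sketch below borrows the WCAG 2.x relative-luminance and contrast-ratio formulas for that purpose. This is a swapped-in technique, not something the skill is stated to use, and the sample colors are arbitrary.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG relative luminance of an sRGB color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
    """WCAG contrast ratio between two colors, from 1.0 (identical) to 21.0."""
    lum_a, lum_b = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Arbitrary sample colors: a dark red against a bright green.
print(round(contrast_ratio((150, 20, 20), (90, 230, 110)), 2))
```

A higher ratio means the pair still separates when hue information is lost; which threshold to require for thumbnails is a judgment call the text does not specify.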

Step 7: Mobile-First Verification

70%+ of YouTube traffic is mobile. The agent always tests at 90x53px preview size — the size YouTube actually displays in the mobile feed.

Mobile-first checklist:
  [ ] Subject still recognizable at 90x53px
  [ ] Text (if any) still legible at 90x53px
  [ ] Color contrast survives the resize and compression
  [ ] Focal point is unambiguous — no competing elements
  [ ] Faces don't become "blob with eyes" at small size

If any check fails, the thumbnail does not survive contact with the actual feed.
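Before rendering anything, simple scaling arithmetic shows how large an element will be in the preview. The sketch assumes the thumbnail is designed on a 1280x720 canvas and uses the 90x53px preview size named above; the function names are hypothetical.

```python
FULL_W, FULL_H = 1280, 720     # assumed design canvas
PREVIEW_W, PREVIEW_H = 90, 53  # preview size named in the text


def preview_height_px(full_height_px: float) -> float:
    """Height of an element after the canvas is shrunk to the preview size."""
    return full_height_px * PREVIEW_H / FULL_H


def min_full_height_px(target_preview_px: float) -> float:
    """Design-canvas height needed to reach a target height in the preview."""
    return target_preview_px * FULL_H / PREVIEW_H


# Text drawn 120px tall on the design canvas shrinks to under 9px in the preview.
print(round(preview_height_px(120), 1))
```

The arithmetic only bounds the problem. Compression and anti-aliasing still call for viewing an actual downscaled image.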

Step 8: Calibrate the Curiosity Gap

The curiosity gap is the difference between what the thumbnail/title imply and what the viewer needs to watch to learn. Too little gap = no clicks. Too much = clickbait, audience retention craters, algorithm punishes.

TOO VAGUE (no clicks):
  Title: "My day"
  Thumbnail: vlogger face, neutral
  Problem: zero promise, zero curiosity

CALIBRATED (clicks AND retention):
  Title: "The day I almost quit YouTube"
  Thumbnail: vlogger face, hand on forehead, dim background
  Promise: emotional story; payoff: actual story exists in the video

TOO CLICKBAIT (clicks but tanks watch time):
  Title: "I QUIT YOUTUBE FOREVER!!!"
  Thumbnail: face screaming, "GOODBYE" in red
  Problem: viewer feels lied to in 30 seconds, retention dies, algorithm down-ranks

The agent grades curiosity gap on a 1-5 scale and warns when it crosses into clickbait territory.

Step 9: A/B Test Methodology

YouTube Studio's built-in thumbnail test feature is the ground truth. Don't trust subjective preference.

Setup:
  - Upload up to 3 thumbnail variants in YouTube Studio.
  - YouTube rotates them and measures CTR + watch time, not just CTR.
  - Default test window: ~2 weeks. The agent recommends a minimum 7-day window.

Statistical significance:
  - Need ≥1,000 impressions per variant for early read.
  - Need ≥5,000 impressions per variant for confidence on small (<1pp) deltas.
  - If no variant wins by 14 days, they're effectively tied — pick on retention.

What YouTube actually picks:
  - It optimizes for "click + watch", not click alone.
  - A higher-CTR thumbnail can LOSE if its viewers bounce faster.
  - This is why clickbait loses long-term.

Variant design strategy:
  - Variant A: current best guess (the safe formula)
  - Variant B: same composition, different face/expression
  - Variant C: structurally different concept (e.g., different formula entirely)
  Don't test 3 near-identical variants — you learn nothing.
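To see why the impression thresholds matter, a standard two-proportion z-test can be run on the click counts. This is a generic statistics sketch, not YouTube's own method (which the text says also weighs watch time), and the click and impression numbers are made up for illustration.

```python
import math


def ctr_z_test(clicks_a: int, impressions_a: int,
               clicks_b: int, impressions_b: int) -> tuple[float, float]:
    """Two-proportion z-test on CTR. Returns (z, two-sided p-value)."""
    p_a = clicks_a / impressions_a
    p_b = clicks_b / impressions_b
    # Pooled CTR under the null hypothesis that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: the same 4% vs 5% CTR at two sample sizes.
_, p_small = ctr_z_test(40, 1000, 50, 1000)
_, p_large = ctr_z_test(200, 5000, 250, 5000)
print(p_small > 0.05, p_large < 0.05)  # True True
```

With these made-up numbers, a one-point CTR gap is indistinguishable from noise at 1,000 impressions per variant but clears the conventional 0.05 level at 5,000, which matches the direction of the thresholds above.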

Step 10: Catch Common Mistakes

The agent screens for these failure patterns on every audit:

1. Face hidden by text overlay — defeats the entire point of including a face
2. Low contrast — subject blends into background, especially at mobile size
3. No focal point — eye doesn't know where to land, viewer scrolls past
4. Photoshop overload — too many cutouts, glows, layers; reads as desperate
5. Generic stock images — viewers' eyes are trained to skip these
6. Inconsistent style across channel — no brand recognition in the feed
7. Thumbnail = title — wastes the second surface; should add information
8. All-caps title fatigue — pair with a non-text thumbnail for relief
9. Small face — face below ~30% of frame loses emotional read at mobile size
10. Background photo with full hue variance — competes with the subject

Step 11: Maintain Channel Branding Consistency

Individual thumbnails fight for the click. The channel as a whole fights for recognition.

Brand consistency elements:
  - Color palette: 2-3 colors used across all thumbnails (one dominant + accents)
  - Font system: ONE display font across all thumbnails with text
  - Recurring element: a logo corner, an arrow style, a frame, a stroke color
  - Subject consistency: if the creator is the brand, they're always present

Test: open the channel page. Do the thumbnails feel like a set, or 50 random images?
A set wins subscribers. Random images don't.

Counter-rule: don't sacrifice CTR for branding. If a video genuinely needs to break
the pattern (e.g., a serious topic on a comedy channel), break it deliberately.

Step 12: Run the Idea-to-Thumbnail Pipeline

Sketch    → 30-second pencil/whiteboard sketch of the layout (no detail)
Draft     → built thumbnail at full resolution
Review    → does it survive the 90x53px test? does it score on the audit framework?
Test      → upload as A/B variant, run 7-14 days
Iterate   → keep winners, retire losers, codify the pattern into a formula

Skipping the sketch step is the most common failure. Designers commit to a layout in Photoshop and become anchored — sketching forces multiple options before any time is sunk.

Step 13: Coordinate Thumbnail with Title

The thumbnail and title are two surfaces. They should COMBINE for curiosity, not duplicate.

DUPLICATION (wasted surface):
  Title:     "How to fix slow Postgres queries"
  Thumbnail: text "FIX SLOW POSTGRES" + face
  Problem: thumbnail tells viewer what title already tells them

COMBINATION (compounding curiosity):
  Title:     "How to fix slow Postgres queries"
  Thumbnail: face shocked + "200x faster" + clock graphic
  Effect: title sets the topic, thumbnail adds the unstated promise

GENERAL RULE:
  - Title carries the SUBJECT.
  - Thumbnail carries the EMOTION + STAKES.
  - Together they form the click decision.
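Duplication between title and thumbnail text can be approximated with a word-overlap score. The sketch below is a rough heuristic under stated assumptions: the helper names are hypothetical, matching is on exact lowercase words, and paraphrases or synonyms are not detected.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercase alphanumeric words in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_ratio(title: str, thumbnail_text: str) -> float:
    """Share of thumbnail words that already appear in the title (0.0 to 1.0)."""
    thumb_words = word_set(thumbnail_text)
    if not thumb_words:
        return 0.0  # a text-free thumbnail cannot duplicate the title
    return len(thumb_words & word_set(title)) / len(thumb_words)


title = "How to fix slow Postgres queries"
print(overlap_ratio(title, "FIX SLOW POSTGRES"))  # 1.0
print(overlap_ratio(title, "200x faster"))        # 0.0
```

A score near 1.0 matches the duplication case above; a score near 0.0 means the thumbnail text is adding words the title does not contain, although whether those words add curiosity is still a judgment call.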

Worked Examples

Example 1: Gaming Let's-Play

Weak concept:
  Game: Elden Ring
  Title: "Elden Ring Episode 12"
  Thumbnail: gameplay screenshot with the game's HUD visible
  Audit: no face, no emotion, no curiosity gap, episode number signals
         "you must have seen 1-11 to care", reads as personal vlog not content

Strong rewrite:
  Title: "I Beat the Hardest Boss in Elden Ring"
  Thumbnail formula: face + game text + key object
    - Left 40%: streamer's face, mouth open in shock, eyes pointing right
    - Right 60%: silhouette of the boss with red glow + small "FIRST TRY?" text
    - Color: warm orange face vs cool dark background (complementary contrast)
  Reasoning:
    - Episode number replaced with a specific stakes-laden hook
    - Face provides emotion the gameplay screenshot can't
    - Eye direction guides viewer to the boss silhouette
    - Text adds doubt ("FIRST TRY?") that the title doesn't answer
    - Mobile test: face still readable at 90x53px, boss silhouette still recognizable

Example 2: Tech Tutorial

Weak concept:
  Title: "Postgres Index Tutorial"
  Thumbnail: screenshot of a SQL editor with text "POSTGRES INDEXES" overlaid
  Audit: text duplicates title, no face, no emotion, no result shown,
         no reason to click vs the other 50 Postgres tutorials in search

Strong rewrite:
  Title: "I Made My Postgres Query 200x Faster"
  Thumbnail formula: end result + face + arrow
    - Center: split screen showing "12s" (red, struck through) and "60ms" (green)
    - Bottom-right corner: small face (creator) with raised eyebrow expression
    - Red arrow drawn from "12s" to "60ms"
    - Color: red/green is risky for color-blind, so add a value contrast
            (red is dark, green is bright) — survives without color
  Reasoning:
    - Result is the hook, not the topic
    - Numbers are concrete and skimmable at mobile size
    - Face adds personality to a normally dry topic
    - Title says WHAT, thumbnail says HOW MUCH — combination not duplication
    - Search audience sees a specific outcome, not generic "tutorial"

Example 3: Finance Commentary

Weak concept:
  Title: "Stock Market Update"
  Thumbnail: stock chart screenshot with red and green candles
  Audit: every finance channel uses this exact image, so viewers scroll straight past;
         no face, no emotional stakes, no specific claim, no curiosity gap

Strong rewrite:
  Title: "The Fed Just Made a $2 Trillion Mistake"
  Thumbnail formula: question + face + iconic object
    - Left 50%: creator's face, hand on forehead, mouth slightly open
    - Right 50%: Fed building or Powell silhouette + "$2T?" in heavy text
    - Color: cool desaturated background + warm face (separation)
    - Background: dim, almost monochrome — eyes go to the face, then text
  Reasoning:
    - Specific dollar figure beats generic "update"
    - Face reaction telegraphs that something is wrong (emotional stakes)
    - "?" in the text adds doubt — invites click to learn the answer
    - Avoided red/green chart cliché — visually distinct in finance feed
    - Curiosity gap: title makes the claim, thumbnail confirms the emotion,
      video must explain WHY — this is the calibrated click contract
    - WARNING: only works if the video actually delivers the analysis;
      same thumbnail with a thin video tanks retention and the algorithm punishes

Output

The agent produces:

Audit grade: pass/marginal/fail across the six-axis framework
Diagnosis: ranked list of what's wrong, biggest impact first
Formula recommendation: which proven niche formula to apply
Specific redesign: layout, face direction, text, color, focal point
Mobile verification: confirmation that the design survives at 90x53px
Curiosity-gap calibration: 1-5 score with warning if it crosses into clickbait
A/B test plan: variants to test and the minimum window for significance
Branding check: how the new thumbnail fits the channel's existing set
Title coordination notes: whether title and thumbnail combine or duplicate

Common Scenarios

"My CTR is below my niche baseline"

Provide the last 10 thumbnails and titles. The agent finds the pattern (usually faces, contrast, or curiosity gap) and proposes a formula change.

"I have 3 thumbnail drafts — which is best?"

The agent runs each through the audit framework and recommends the strongest, plus suggests modifications. If two are close, the agent recommends an A/B test rather than guessing.

"I'm starting a new channel — design the visual system"

The agent proposes a 2-3 color palette, font, recurring element, and template formula based on the niche.

"My CTR is high but watch time is low"

Almost always a clickbait calibration problem. The agent grades the curiosity gap and recommends pulling the thumbnail back toward honest signaling.

"Should this video have text on the thumbnail or not?"

Default answer: no text, unless text adds specific information the face/object can't (a number, a date, a question). The agent explains the tradeoff.

Tips for Best Results

Share the actual image when possible — descriptions miss compositional issues
Include the title alongside the thumbnail; they must be analyzed together
Share niche and channel size so the agent calibrates baseline expectations
Provide CTR data from YouTube Studio if available — gives a measurable target
Mention any channel branding constraints (existing palette, font, logo)
For A/B tests, include the impressions count per variant — small samples mislead
Remember that thumbnail optimization is iterative; the agent's first redesign is a hypothesis, not a verdict

When NOT to use

This skill is for standard YouTube video thumbnails (16:9, browse + search surfaces). Don't use it for:

Livestreams in progress — live thumbnails are auto-generated mid-stream and follow a different attention system; design the pre-stream thumbnail with this skill, but mid-stream optimization is a different problem
YouTube Shorts — vertical 9:16 aspect ratio, played in a swipe feed, no thumbnail-driven click decision; the first 1-2 seconds of video are the equivalent of the thumbnail and require a different skill
Premieres before scheduled time — countdown UI overlays the thumbnail and changes the click mechanics
Unlisted / private videos — no algorithmic surface, thumbnail does no work
Channel trailers and channel art — different composition rules, different goals (subscribe, not click)
Music videos relying on browse from external links — most traffic bypasses the thumbnail entirely; effort better spent elsewhere
