Openclaw Thumbnail Forge

Local thumbnail generator for videos. Picks the best candidate frames using brightness, sharpness, and scene-change scores, composes professional thumbnails with text overlays, gradient bars, and watermarks, and ranks A/B variants on objective click-likelihood metrics. Exports at YouTube, Shorts, Instagram, X, and LinkedIn sizes. Pure ffmpeg + Pillow, no AI APIs, no remote calls.

Install

openclaw skills install openclaw-thumbnail-forge

openclaw-thumbnail-forge

v0.2.0

A practical thumbnail generator for videos. Builds the kind of professional-looking thumbnails creators normally make in Photoshop or Canva, but as a local CLI workflow with no API keys, no online services, and no AI dependencies.

What this skill does

  • scripts/check_deps.sh — verify ffmpeg, ffprobe, python3 (and the Pillow Python package) are installed.
  • scripts/pick_frames.py — extract candidate frames from a video and rank them by a composite score combining sharpness, brightness, contrast, and ffmpeg scene-change scores. Outputs the top-N frames as PNG files plus a JSON report.
  • scripts/compose_thumbnail.py — turn one source frame into a finished thumbnail with bold title text, subtitle, gradient bar, optional logo overlay, and auto contrast boost. Supports custom fonts and color schemes.
  • scripts/export_sizes.py — re-export a finished thumbnail to all common platform sizes in one command (YouTube, Shorts, Instagram square, X/Twitter, LinkedIn).
  • scripts/make_variants.py — generate four A/B-testable variants of the same thumbnail (different color schemes, text placements, contrast levels) for split-testing.
  • scripts/score_thumbnail.py (NEW in v0.2.0) — score one or more finished thumbnails on six objective visual metrics and pick the most likely click-winner. Gives a numeric click-likelihood score (0-100) per thumbnail and an explanation of which metrics drove the result.

What this skill does not do

To set expectations honestly:

  • It does not use AI subject detection or face recognition. Frame ranking and click-likelihood scoring are statistical, not semantic.
  • It does not download fonts, stock photos, or any remote asset. You provide your own font path or use the system default.
  • It does not perform OCR, transcription, or generative editing.
  • It does not write outside the directory you provide.
  • The click-likelihood scorer is a deterministic heuristic, not a trained click-through-rate (CTR) model. It encodes widely cited thumbnail design rules as six metrics (punch, focal pop, color punch, text band, brightness, edge density). Treat its output as a tie-breaker, not a guarantee.

Required dependencies

bash scripts/check_deps.sh

Verifies that ffmpeg, ffprobe, and python3 are installed and that Pillow (imported as PIL) is importable. Pillow is the only Python dependency:

pip install Pillow

Workflows

1. Pick the best candidate frames from a video

python3 scripts/pick_frames.py input.mp4 ./frames/ \
  --top 10 --interval 2.0

Extracts a frame every 2 seconds, scores each one, and writes the top 10 as frames/frame_001.png through frames/frame_010.png plus a frames/report.json with per-frame scores.

Tunable flags:

  • --interval <seconds> — sampling interval (default 2.0)
  • --top <N> — how many top frames to keep (default 10)
  • --min-brightness <0-255> / --max-brightness <0-255> — reject frames that are too dark or blown out
  • --min-sharpness <float> — reject blurry frames
  • --relax-on-empty (NEW in v0.2.0) — if no frame passes the filters (very short clip, very dark video, single-subject still), retry once with very loose thresholds so you still get at least one candidate
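The filter flags and the --relax-on-empty retry could be sketched roughly as follows. This is an illustration under assumptions, not the script's actual code: the Frame record, the function names, and every threshold value (including the relaxed ones) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical per-frame measurements (not the script's real data model)."""
    index: int
    brightness: float  # mean luminance, 0-255
    sharpness: float   # any non-negative sharpness measure


def passes(frame, min_brightness, max_brightness, min_sharpness):
    """True when the frame is not too dark, not blown out, and not too blurry."""
    return (min_brightness <= frame.brightness <= max_brightness
            and frame.sharpness >= min_sharpness)


def filter_frames(frames, min_brightness=40.0, max_brightness=220.0,
                  min_sharpness=50.0, relax_on_empty=False):
    """Apply the filters; optionally retry once with very loose thresholds."""
    kept = [f for f in frames
            if passes(f, min_brightness, max_brightness, min_sharpness)]
    if not kept and relax_on_empty:
        # Assumed "very loose" values: accept nearly any frame that is
        # not almost pure black or pure white, regardless of sharpness.
        kept = [f for f in frames if passes(f, 5.0, 250.0, 0.0)]
    return kept


# A very dark clip: nothing survives the strict pass.
dark_clip = [Frame(0, 20.0, 10.0), Frame(1, 30.0, 80.0)]
strict = filter_frames(dark_clip)
relaxed = filter_frames(dark_clip, relax_on_empty=True)
print(len(strict), len(relaxed))
```

The retry happens at most once, so a clip with no acceptable frames still ends with an empty result instead of looping.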

For a video shorter than 2 * interval, the script now automatically falls back to 3 evenly-spaced samples instead of returning zero candidates.
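One way to picture the sampling rule and its short-clip fallback is the sketch below. The function name is hypothetical, and the exact placement of the three fallback samples (25%, 50%, and 75% of the duration) is an assumption; the script may space them differently.

```python
import math


def sample_timestamps(duration, interval=2.0):
    """Return the timestamps (in seconds) at which frames would be sampled."""
    if duration <= 0 or interval <= 0:
        raise ValueError("duration and interval must be positive")
    if duration < 2 * interval:
        # Short clip: three evenly spaced interior points (assumed layout).
        return [duration * k / 4 for k in (1, 2, 3)]
    # Regular case: 0, interval, 2 * interval, ... strictly below duration.
    count = math.ceil(duration / interval)
    return [i * interval for i in range(count)]


print(sample_timestamps(10.0))  # regular sampling every 2 seconds
print(sample_timestamps(3.0))   # shorter than 2 * interval: fallback
```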

2. Compose a finished thumbnail from a frame

python3 scripts/compose_thumbnail.py frames/frame_003.png thumb.png \
  --title "10 ffmpeg Tricks I Wish I Knew Sooner" \
  --subtitle "A practical tour" \
  --color-scheme bold-yellow \
  --position bottom

Color schemes shipped: bold-yellow, clean-white, red-alert, cool-blue, tech-green. Each scheme defines title color, outline color, shadow, and gradient bar opacity.

Position options: top, bottom, center. The script auto-fits the title size to the available width and adds a readable gradient bar behind the text so the thumbnail reads at small sizes too.
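Auto-fitting a title to the available width can be done with a binary search over font sizes. The sketch below is a generic illustration, not the script's implementation: fit_font_size, its size bounds, and the fake_measure stand-in are all hypothetical.

```python
def fit_font_size(text, max_width, measure, min_size=12, max_size=200):
    """Largest integer size in [min_size, max_size] whose width fits.

    measure(text, size) must return the rendered width in pixels and be
    non-decreasing in size. Returns min_size if even that does not fit.
    """
    lo, hi = min_size, max_size
    best = min_size
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure(text, mid) <= max_width:
            best = mid       # mid fits; try something larger
            lo = mid + 1
        else:
            hi = mid - 1     # mid is too wide; try something smaller
    return best


def fake_measure(text, size):
    # Stand-in metric: pretend every character is 0.6 * size pixels wide.
    return 0.6 * size * len(text)


title = "10 ffmpeg Tricks I Wish I Knew Sooner"
print(fit_font_size(title, max_width=1180, measure=fake_measure))
```

With Pillow, measure could wrap ImageFont.truetype(font_path, size).getlength(text), which returns the rendered text width in pixels for a given size.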

Optional logo overlay:

python3 scripts/compose_thumbnail.py frames/frame_003.png thumb.png \
  --title "Your Title" \
  --logo logo.png --logo-corner top-right --logo-scale 0.12

In v0.2.0, the script now rejects an empty --title "" (instead of silently producing a textless thumbnail) and prints a clean error if the input image is corrupt or unreadable (instead of leaking a Python traceback).

3. Export to all platform sizes at once

python3 scripts/export_sizes.py thumb.png ./out/

Writes:

  • out/youtube_1280x720.png
  • out/shorts_1080x1920.png
  • out/instagram_1080x1080.png
  • out/x_1200x675.png
  • out/linkedin_1200x627.png
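The target sizes have different aspect ratios, so each export has to crop or pad the source. The sketch below assumes a centered crop followed by a resize; whether export_sizes.py crops, pads, or stretches is not stated here, and center_crop_box is a hypothetical helper.

```python
# Target sizes taken from the output filenames listed above.
PLATFORM_SIZES = {
    "youtube": (1280, 720),
    "shorts": (1080, 1920),
    "instagram": (1080, 1080),
    "x": (1200, 675),
    "linkedin": (1200, 627),
}


def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Largest centered box in the source that has the target aspect ratio.

    Returns (left, top, right, bottom) in source pixel coordinates.
    """
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: trim the left and right edges.
        crop_w, crop_h = src_h * dst_w // dst_h, src_h
    else:
        # Source is taller than (or equal to) the target: trim top and bottom.
        crop_w, crop_h = src_w, src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


for name, (w, h) in PLATFORM_SIZES.items():
    print(f"{name}_{w}x{h}.png", center_crop_box(1280, 720, w, h))
```

With Pillow, the returned box could be passed to Image.crop(box), followed by Image.resize((w, h)).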

4. Generate A/B variants

python3 scripts/make_variants.py frames/frame_003.png ./variants/ \
  --title "10 ffmpeg Tricks" \
  --subtitle "A practical tour"

Writes four variants that differ in color scheme and text position, suitable for click-rate split testing.

5. Score finished thumbnails on click-likelihood (NEW in v0.2.0)

python3 scripts/score_thumbnail.py variants/*.png

Scores every thumbnail on six objective metrics and prints a ranked list with the winner highlighted:

Sub-score     What it measures
punch         Global luminance contrast
focal_pop     Variance of per-tile mean luminance; high when there is one obvious focal area
color_punch   Saturation mean + saturation stddev (combined)
text_band     Presence of a high-contrast horizontal text band (long run of high-edge-density rows)
brightness    Distance from the optimal mid-tone (penalty for too dark or washed-out)
edge_density  Mean edge magnitude; peaks at mid values, penalised at extremes
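To make the focal_pop row concrete, here is a minimal sketch of "variance of per-tile mean luminance" on a plain grid of luminance values. The 4 x 4 tile grid and the function names are assumptions, and the normalisation to [0, 100] is left out because its formula is not given here.

```python
from statistics import pvariance


def tile_means(luma, tiles=4):
    """Mean luminance of each cell in a tiles x tiles grid.

    luma is a list of rows of 0-255 values; its height and width must
    each be at least `tiles` so that no cell is empty.
    """
    height, width = len(luma), len(luma[0])
    means = []
    for ty in range(tiles):
        y0, y1 = ty * height // tiles, (ty + 1) * height // tiles
        for tx in range(tiles):
            x0, x1 = tx * width // tiles, (tx + 1) * width // tiles
            cell = [luma[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(cell) / len(cell))
    return means


def focal_pop_raw(luma, tiles=4):
    """Population variance of the tile means (raw, un-normalised)."""
    return pvariance(tile_means(luma, tiles))


# A uniformly grey image versus the same image with one bright tile.
flat = [[128] * 8 for _ in range(8)]
spot = [row[:] for row in flat]
for y in range(2):
    for x in range(2):
        spot[y][x] = 255

print(focal_pop_raw(flat), focal_pop_raw(spot))
```

The uniform image has zero variance across tiles, while a single bright region raises it, which matches the "one obvious focal area" description.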

Each sub-score is normalised to [0, 100] and combined using the weights 0.18 / 0.22 / 0.15 / 0.20 / 0.12 / 0.13, which sum to 1.00. The final click_score is therefore also in [0, 100].
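The combination step can be read as a weighted sum. The sketch below assumes that reading and also assumes the weights apply in the order the sub-scores are listed (punch first, edge_density last); neither is confirmed by the script itself, and click_score here is a hypothetical function, not the script's API.

```python
# Assumed mapping: weights paired with sub-scores in listing order.
WEIGHTS = {
    "punch": 0.18,
    "focal_pop": 0.22,
    "color_punch": 0.15,
    "text_band": 0.20,
    "brightness": 0.12,
    "edge_density": 0.13,
}


def click_score(sub_scores):
    """Weighted sum of six sub-scores, each expected in [0, 100]."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)
    # Clamp to guard against floating-point drift just outside [0, 100].
    return round(min(100.0, max(0.0, total)), 2)


example = {
    "punch": 80, "focal_pop": 60, "color_punch": 50,
    "text_band": 90, "brightness": 70, "edge_density": 40,
}
print(click_score(example))
```

Because the weights sum to 1.00, six sub-scores of 100 give a click_score of 100 and six sub-scores of 0 give 0.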

When given two or more thumbnails, the script also prints an explanation: for each thumbnail that lost to the winner, which metric drove the gap and by how much.
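A pairwise explanation of this kind can be derived from per-metric weighted differences. The following is a sketch under assumptions: the weight-to-metric mapping follows the listing order, and explain_gap and its output shape are hypothetical rather than the script's real report format.

```python
# Assumed mapping: weights paired with sub-scores in listing order.
WEIGHTS = {
    "punch": 0.18,
    "focal_pop": 0.22,
    "color_punch": 0.15,
    "text_band": 0.20,
    "brightness": 0.12,
    "edge_density": 0.13,
}


def explain_gap(winner, other):
    """Per-metric contribution to (winner - other), largest magnitude first.

    Positive values favour the winner; negative values are metrics on
    which the other thumbnail actually did better.
    """
    gaps = {name: WEIGHTS[name] * (winner[name] - other[name])
            for name in WEIGHTS}
    return sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)


variant_a = {"punch": 80, "focal_pop": 60, "color_punch": 50,
             "text_band": 90, "brightness": 70, "edge_density": 40}
variant_b = {"punch": 75, "focal_pop": 55, "color_punch": 50,
             "text_band": 40, "brightness": 70, "edge_density": 45}

for name, gap in explain_gap(variant_a, variant_b):
    print(f"{name:12s} {gap:+.2f}")
```

In this made-up pair, the text_band difference dominates the gap, so the explanation would name it as the deciding metric.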

JSON mode:

python3 scripts/score_thumbnail.py variants/*.png --output ranking.json --json

Full pipeline example

# 1) Find the best candidate frames
python3 scripts/pick_frames.py my_video.mp4 ./frames/ --top 5 --interval 1.5

# 2) Generate four variants from the top frame
python3 scripts/make_variants.py frames/frame_001.png ./variants/ \
  --title "Your Title Here" --subtitle "Optional subtitle"

# 3) Score the variants and pick the click-winner
python3 scripts/score_thumbnail.py variants/*.png --output ranking.json

# 4) Export the chosen variant to every platform size
python3 scripts/export_sizes.py variants/variant_b_clean_white_top.png ./out/

Exit codes

Code  Meaning
0     success
1     partial failure (no frames passed filters; no scorable images among inputs)
2     error (bad arguments, unsafe path, missing or corrupt input, ffmpeg/ffprobe failure)
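A pipeline driver can branch on these codes. The sketch below shows one way to do that from Python; run_step and should_continue are hypothetical helpers, the stop-on-anything-but-0 policy is an assumption, and a child Python process stands in for a real script so the example runs without ffmpeg.

```python
import subprocess
import sys

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {0: "success", 1: "partial failure", 2: "error"}


def run_step(argv):
    """Run one pipeline step and return (exit_code, meaning)."""
    completed = subprocess.run(argv, check=False)
    code = completed.returncode
    return code, EXIT_MEANINGS.get(code, "unknown")


def should_continue(code):
    """Continue the pipeline only on full success (an assumed policy)."""
    return code == 0


# Stand-in for a real step: a child process that exits with code 1.
code, meaning = run_step([sys.executable, "-c", "import sys; sys.exit(1)"])
print(code, meaning, should_continue(code))
```

In a real pipeline the argv list would be something like ["python3", "scripts/pick_frames.py", "input.mp4", "./frames/"].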

Safety properties

  • All Python helpers use subprocess.run with argument lists (never shell=True) and reject input/output paths containing shell metacharacters via a strict regex allowlist.
  • The skill never reads or writes outside the input/output paths the user provides.
  • No environment variables are read for credentials. No tokens, secrets, or API keys are required.
  • No remote calls of any kind. The skill only invokes locally installed ffmpeg and the Python Pillow library.
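The first property combines two ideas: validate paths against an allowlist, then pass them to subprocess as list elements so no shell ever parses them. The sketch below illustrates that pattern only. The regex, the function names, and the specific ffprobe arguments are assumptions and may differ from what the scripts actually use.

```python
import re

# Hypothetical allowlist: letters, digits, underscore, dot, slash, hyphen.
SAFE_PATH = re.compile(r"[A-Za-z0-9_./-]+")


def is_safe_path(path):
    """True if the path uses only allowlisted characters.

    A leading hyphen is also rejected so the path cannot be mistaken
    for a command-line option.
    """
    return bool(SAFE_PATH.fullmatch(path)) and not path.startswith("-")


def build_ffprobe_argv(video_path):
    """Build an argument list for reading a video's duration with ffprobe."""
    if not is_safe_path(video_path):
        raise ValueError(f"unsafe path: {video_path!r}")
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video_path,
    ]


print(build_ffprobe_argv("clips/my_video.mp4"))
print(is_safe_path("a.mp4; rm -rf ~"), is_safe_path("$(whoami).mp4"))
```

The resulting list would be passed to subprocess.run(argv) without shell=True, so the path reaches ffprobe as a single argument even if validation were ever loosened.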

Known limitations

  • Frame scoring and thumbnail click-likelihood scoring are heuristic, not AI-based. They cannot tell whether, for example, the subject's face is visible; they only maximise objective image-quality signals and proxies for visual hierarchy.
  • If --font is not provided, the script uses the system default font. If no usable font is found, it falls back to Pillow's built-in bitmap font, which looks plain. Pass --font for better typography.
  • compose_thumbnail.py does not do automatic background removal. If you want isolated subjects, do the subject-cutout step in a different tool first.

v0.2.0 changes

New feature

  • scripts/score_thumbnail.py — deterministic local click-likelihood scorer. Scores one or more finished thumbnails on six visual metrics (punch, focal pop, color punch, text band, brightness, edge density) and ranks them. Pure Pillow + standard library, no ML, no remote calls.

Bug fixes

  • compose_thumbnail.py and make_variants.py now reject empty --title "" with a clear error instead of silently producing a textless thumbnail.
  • compose_thumbnail.py now catches PIL.UnidentifiedImageError on a corrupt or non-image input and prints a clean one-line error instead of leaking a Python traceback.
  • pick_frames.py now correctly returns exit code 2 (not 0) when ffprobe fails on a non-video input, when the video duration is zero, or when the input path contains shell metacharacters. Pipelines that key off exit codes will work correctly now.
  • pick_frames.py no longer silently produces zero frames on a very short clip (< 2 * interval). It now falls back to 3 evenly-spaced samples for short clips, and the new --relax-on-empty flag retries once with very loose thresholds when the default thresholds produce no candidates.
  • Removed a redundant double-ffprobe call in probe_duration.

No breaking changes: existing CLI flags, output filenames, scoring formulas, and verdict thresholds are unchanged. v0.1.0 scripts and pipelines continue to work.

License

MIT. See LICENSE.