YouTube Thumbnail Generator with Nano Banana

v0.1.1

Create high-converting YouTube thumbnail concepts, overlay text, image prompts, and optional AI-generated cover images from raw titles, hooks, scripts, or ma...

by Magiclight @luo-2q
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the implementation: scripts build a thumbnail plan with a Gemini text model and optionally call Gemini's image endpoint. Requested binaries (python3) and environment variables (GEMINI_API_KEY / GOOGLE_API_KEY) are appropriate and expected for this functionality.
Instruction Scope
SKILL.md instructs the agent to analyze input copy, build prompts, and optionally call local scripts that POST to Google's generativelanguage endpoint. The instructions reference only the provided scripts, output paths under outputs/, and the declared API keys — they do not instruct reading unrelated system files or other credentials.
Install Mechanism
There is no external install spec (instruction-only / local scripts), so nothing is downloaded from remote URLs. The included Python scripts are self-contained and use only the standard library for network calls; no third-party package installs or external archives are pulled.
Credentials
Only GEMINI_API_KEY (primary) or GOOGLE_API_KEY is required, and it is used to authenticate with Google's generativelanguage API. No unrelated secrets or excess credentials are requested. The credential usage is proportional to the described purpose.
Persistence & Privilege
The skill is not forced always-on (always: false) and does not modify other skills or global agent settings. It writes outputs to local files under outputs/, which is expected behavior for a generator script.
Assessment
This package is internally consistent with its stated purpose, but review before running: (1) The scripts will send whatever you pass as copy to Google's generativelanguage API using your GEMINI_API_KEY/GOOGLE_API_KEY — avoid sending private or sensitive data. (2) Ensure the API key you provide has appropriate billing/usage controls and is stored securely. (3) The scripts write files to an outputs/ directory; run them in a workspace you control (or a sandbox) if you are cautious. (4) Run the included selftest (scripts/selftest.py) and inspect the scripts if you want to confirm behavior; the code is small and uses only standard library network calls to the official generativelanguage endpoint. (5) If you need tighter assurance, restrict the key scope, rotate keys after testing, or run the code in an isolated environment.


Runtime requirements

🖼️ Clawdis
Bins: python3
Env: GEMINI_API_KEY, GOOGLE_API_KEY
Primary env: GEMINI_API_KEY
Tags: image-generation, latest, prompt-generation, thumbnail, youtube
263 downloads
1 star
2 versions
Updated 1mo ago
MIT-0

YouTube Cover Nano Banana

Overview

Analyze the user's text first. Then turn it into a thumbnail concept that is visually simple, emotionally obvious, and readable at small sizes.

Generate English image prompts for Nano Banana unless the user explicitly asks for another language. Keep reasoning grounded in YouTube thumbnail performance rather than generic poster design.

Use scripts/create_thumbnail.py for the full workflow when local script execution is available. It first calls Gemini text generation to turn source copy into a thumbnail plan, then optionally calls the official Gemini Nano Banana image endpoint. The scripts expect GEMINI_API_KEY or GOOGLE_API_KEY.
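The key lookup described above can be sketched as follows. This is a minimal illustration, not the code in the bundled scripts: the function name `resolve_api_key` is hypothetical, and the preference order (GEMINI_API_KEY first) is an assumption based on GEMINI_API_KEY being listed as the primary variable.

```python
import os


def resolve_api_key(env=None):
    """Return the first non-empty key, preferring GEMINI_API_KEY.

    Hypothetical helper; the real scripts may resolve the key differently.
    """
    env = os.environ if env is None else env
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set GEMINI_API_KEY or GOOGLE_API_KEY before running the scripts.")
```

Passing a plain dictionary as `env` makes the lookup easy to exercise without touching the real environment.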

Workflow

1. Extract the message

Pull out:

  • Core topic
  • Intended audience
  • Main promise or tension
  • Emotional direction such as urgency, surprise, authority, fear, curiosity, or excitement
  • Best visual subject
  • Any non-negotiable details such as product, person, brand color, or forbidden elements

If the user only gives raw copy, infer the thumbnail angle from the strongest claim instead of mirroring the entire text.

2. Choose the thumbnail strategy

Prefer one dominant idea. Use one of these visual strategies:

  • Expressive face plus short text
  • Single object or product in dramatic close-up
  • Before versus after contrast
  • Threat, mistake, or warning framing
  • Curiosity gap with an incomplete reveal
  • Authority or proof framing with a clear focal object

Reject cluttered multi-idea compositions unless the user explicitly wants a collage.

3. Compress the on-image text

Write overlay text that is:

  • 2 to 6 words when possible
  • Readable in one second
  • Stronger than the original copy
  • Different from the full video title if needed

Do not place paragraphs, subtitles, or detailed bullet points inside the image prompt.
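The length guidance above can be checked mechanically. The sketch below is only illustrative: `check_overlay_text` is a hypothetical helper, and because the 2 to 6 word target is a guideline ("when possible"), it reports problems rather than rejecting the text.

```python
def check_overlay_text(text, min_words=2, max_words=6):
    """Return a list of problems with a proposed overlay text (empty if none)."""
    problems = []
    stripped = text.strip()
    words = stripped.split()
    if "\n" in stripped:
        problems.append("multiple lines")
    if not (min_words <= len(words) <= max_words):
        problems.append(f"{len(words)} words (target {min_words}-{max_words})")
    return problems
```

For example, `check_overlay_text("HE FOUGHT A TIGER")` returns an empty list, while a full sentence copied from the video description would be flagged for length.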

4. Build the Nano Banana prompt

Produce a prompt with these properties:

  • English language
  • 16:9 YouTube thumbnail composition
  • One dominant subject
  • Bold focal point
  • High contrast lighting and color separation
  • Clean background with supporting elements only
  • Space reserved for large headline text
  • A specific rendering style (photorealistic or stylized) only if the user requests it

Explicitly describe:

  • Subject appearance and pose
  • Camera framing
  • Emotion
  • Background environment
  • Color palette
  • Lighting
  • Text placement area
  • Thumbnail style cues such as cinematic, glossy, creator-economy, tech, finance, fitness, education

Use the template and examples in youtube-thumbnail-patterns.md when you need help selecting the structure.
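One way to keep every item from the checklist above in the prompt is to assemble it from named fields. This is a sketch under assumptions: `ThumbnailBrief` and `build_prompt` are hypothetical names, the sentence layout is not taken from youtube-thumbnail-patterns.md, and the default style string simply restates the skill's stated default aesthetic.

```python
from dataclasses import dataclass


@dataclass
class ThumbnailBrief:
    """Hypothetical container for the elements the prompt must describe."""
    subject: str      # subject appearance and pose
    framing: str      # camera framing
    emotion: str
    background: str
    palette: str
    lighting: str
    text_area: str    # where the headline text will sit
    style: str = "modern YouTube thumbnail, bold contrast, clean hierarchy"


def build_prompt(brief):
    """Join the brief into a single English prompt string."""
    parts = [
        "16:9 YouTube thumbnail composition with one dominant subject",
        f"Subject: {brief.subject}",
        f"Camera framing: {brief.framing}",
        f"Emotion: {brief.emotion}",
        f"Background: {brief.background}",
        f"Color palette: {brief.palette}",
        f"Lighting: {brief.lighting}",
        f"Leave clean empty space for large headline text: {brief.text_area}",
        f"Style: {brief.style}",
    ]
    return ". ".join(parts) + "."
```

Keeping the fields separate makes it harder to forget the reserved text area, which is the element most often missing from a hand-written prompt.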

5. Generate the image

Call Nano Banana with the final prompt after the concept is coherent.

For the full automated workflow, run:

python3 scripts/create_thumbnail.py \
  --copy "Man fights tiger" \
  --generate-image \
  --output-json "outputs/thumbnail-plan.json" \
  --image-output "outputs/generated-thumbnail.png"

This script:

  • Analyzes the source copy
  • Returns structured JSON with angle, overlay_text, prompt, and generation_notes
  • Optionally renders the image with Nano Banana
  • Writes a stable result envelope for integration use
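A caller that consumes the plan file may want to confirm the four fields are present before using them. The sketch below assumes the fields sit at the top level of the JSON written to `--output-json`; the actual envelope layout is defined in publishing-contract.md and may nest them differently. The helper names are hypothetical.

```python
import json

# Field names as listed in the skill description; their placement in the
# real result envelope is an assumption.
REQUIRED_FIELDS = ("angle", "overlay_text", "prompt", "generation_notes")


def missing_plan_fields(plan):
    """Return the required fields that are absent or blank in a plan dict."""
    return [name for name in REQUIRED_FIELDS if not str(plan.get(name, "")).strip()]


def load_plan(path):
    """Load a plan file and raise ValueError if required fields are missing."""
    with open(path, encoding="utf-8") as handle:
        plan = json.load(handle)
    missing = missing_plan_fields(plan)
    if missing:
        raise ValueError(f"plan is missing fields: {', '.join(missing)}")
    return plan
```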

To call Nano Banana directly with a finished prompt, run the following when local script execution is available:

python3 scripts/generate_image.py \
  --prompt "<final english prompt>" \
  --angle "<angle>" \
  --overlay-text "<overlay text>" \
  --output "outputs/generated-thumbnail.png"

The script calls Gemini's official gemini-2.5-flash-image endpoint and saves:

  • The generated PNG
  • A sidecar JSON file with prompt, model, overlay text, and any text returned by the API
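For readers who want to see roughly what such a call involves, here is a standard-library sketch of building the request and decoding the response. It is not the code in generate_image.py. The URL, the `x-goog-api-key` header, and the `inlineData` response field reflect the public Gemini REST API as commonly documented and should be verified against the current reference. The function names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed REST base path for the Gemini API; verify against current docs.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_image_request(prompt, api_key, model="gemini-2.5-flash-image"):
    """Build (but do not send) a generateContent POST request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{API_ROOT}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_image_and_text(response):
    """Return (first image bytes or None, joined text parts) from a response dict."""
    image, texts = None, []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData") or part.get("inline_data")
            if inline and image is None:
                image = base64.b64decode(inline["data"])
            elif "text" in part:
                texts.append(part["text"])
    return image, "\n".join(texts)
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen`, parsing the body with `json.load`, and writing the returned bytes to the PNG path.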

If tool calling or script execution is not available, still return the exact prompt plus a short note on what to generate.

6. Self-critique once

Before finalizing, check for the common failure modes:

  • Too many subjects
  • Text area too small
  • Weak contrast
  • Busy background
  • Vague emotion
  • No obvious click-driving angle
  • Prompt accidentally describes a poster instead of a thumbnail
  • Prompt includes tiny text or multiple lines of copy that image models render poorly

If a failure mode is present, revise the prompt once before returning it.
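The "revise once" rule can be expressed as a small control-flow sketch. Everything here is hypothetical: `critique_once`, the shape of `checks` (failure name mapped to a predicate), and the `revise` callback are illustrative. In practice the checks are judgments made by the agent, not string tests.

```python
def critique_once(prompt, checks, revise):
    """Run each check once; if any fail, revise the prompt a single time.

    checks: dict mapping a failure-mode name to a function(prompt) -> bool
            that returns True when the failure mode is present.
    revise: function(prompt, failures) -> revised prompt.
    Returns (final_prompt, failures_found).
    """
    failures = [name for name, is_failing in checks.items() if is_failing(prompt)]
    if failures:
        return revise(prompt, failures), failures
    return prompt, failures
```

A toy usage, with a single string-based check standing in for the real review:

```python
checks = {"describes a poster": lambda p: "poster" in p.lower()}
revise = lambda p, failures: p.replace("poster", "YouTube thumbnail")
final, found = critique_once("Movie poster of a tiger", checks, revise)
```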

Output Format

Return four blocks in this order:

  1. Angle: one sentence describing the thumbnail idea
  2. Overlay Text: short text for the cover
  3. Nano Banana Prompt: the exact English prompt
  4. Generation Notes: one short sentence with any critical instruction or fallback
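When the result is produced programmatically, the four blocks can be emitted in the required order with a small formatter. This is a sketch: `format_result` is a hypothetical helper, and the "Label: value" layout separated by blank lines is an assumption about presentation.

```python
def format_result(angle, overlay_text, prompt, notes):
    """Render the four output blocks in the order the skill specifies."""
    blocks = [
        ("Angle", angle),
        ("Overlay Text", overlay_text),
        ("Nano Banana Prompt", prompt),
        ("Generation Notes", notes),
    ]
    return "\n\n".join(f"{label}: {value.strip()}" for label, value in blocks)
```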

Constraints

  • Optimize for clickability and legibility on mobile.
  • Favor one subject over many.
  • Favor strong emotion over neutral expression.
  • Favor simple composition over descriptive completeness.
  • Do not invent celebrity likenesses, trademarks, or brand assets the user did not request.
  • Do not promise exact text rendering quality from the image model.
  • If the user supplies Chinese copy, analyze in Chinese if helpful but output the final image prompt in English.
  • If the user gives no style direction, default to modern YouTube thumbnail aesthetics with bold contrast and clean hierarchy.
  • If the user gives a niche that implies a visual style, reflect it in the prompt. Example: finance should feel sharp and credible; gaming can be more exaggerated; education should feel clear and authoritative.

Missing Information

Ask a brief follow-up only when a missing detail would materially change the output, such as:

  • Whether a specific person must appear
  • Whether a real product image is required
  • Whether the thumbnail should use photorealistic, 3D, illustrated, or anime style
  • Whether there are strict brand colors or banned visual elements

Otherwise, make reasonable assumptions and proceed.

Resources

scripts/

Use create_thumbnail.py for end-to-end copy-to-thumbnail generation.

Use generate_image.py to call Nano Banana directly and save output files.

references/

Use youtube-thumbnail-patterns.md for prompt scaffolds, angle selection rules, and example transformations from raw copy to thumbnail prompt.

Use publishing-contract.md as the integration contract for callers that need stable command behavior, output JSON, and exit codes.
