Slideshow Video


Generate TikTok-style slideshow assets and MP4 exports from local images, GPT Image 2 visuals, remote image URLs, or lightweight image queries plus structured copy. Use when creating 9:16 slideshow posts, turning hooks plus image sources into PNG slides, exporting those slides into a short vertical video, or building a low-cost short-form content pipeline with reusable JSON configs. Also use when producing shorts with sentence-level voice sync, tighter TikTok-style captions, per-line audio aligned to specific slides, or GPT Image 2 + ElevenLabs voice-led TikTok slideshows with explicit CTA endings.

Install

openclaw skills install slideshow-video


Build a repeatable short-form slideshow pipeline from a JSON project file plus image sources: local images, GPT Image 2 outputs, remote image URLs, or lightweight image queries. This skill covers query resolution, PNG slide generation, MP4 export, optional background music, remote image caching, sentence-level sync exports, and a simple project wrapper that saves output metadata for downstream scheduling.

Default production preferences

For TikTok/shorts builds in this workspace, default to these choices unless the requester says otherwise:

  • visuals: GPT Image 2 / image_generate scenes instead of flat synthetic gradients
  • voice: ElevenLabs
  • voice style: male, professional; when requested, push toward more personality and human emotion
  • structure: one spoken sentence per slide, with shorter on-screen copy than the narration
  • CTA: make the final slide and the final voice line an explicit call to action; the current preferred short CTA is "Visit Clawlite.ai"

If a fast placeholder visual pass is used, do not present it as the final quality bar. Replace placeholder backgrounds with GPT Image 2 scenes before calling the slideshow ready.

Quick start

  1. Prepare 5 to 8 local images, GPT Image 2 outputs, remote image URLs, or image queries for one slideshow.
  2. Copy references/pipeline.example.json to a working JSON file, then replace the image sources and on-screen copy.
  3. For production TikTok slideshows, generate or collect your GPT Image 2 scenes first, then write one voice line per slide for ElevenLabs.
  4. Run the full pipeline:
python3 ~/.openclaw/skills/slideshow-video/scripts/run_pipeline.py your-project.json --output-root build --overwrite

To process a whole directory of project files instead, use:

python3 ~/.openclaw/skills/slideshow-video/scripts/batch_pipeline.py /path/to/projects --output-root build --overwrite
  5. Review the generated slides and MP4 on a phone-sized canvas.
  6. Pass build/<slug>/summary.json into your posting workflow for the caption and hashtag handoff.
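If you call the pipeline from another Python script rather than a shell, a minimal sketch like the one below builds the same two commands as argument lists. It assumes the skill is installed at ~/.openclaw/skills/slideshow-video and uses only the --output-root and --overwrite flags shown above; pipeline_command and batch_command are hypothetical helper names, not part of the skill.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Assumed install location, matching the commands above.
SKILL_SCRIPTS = Path.home() / ".openclaw" / "skills" / "slideshow-video" / "scripts"


def pipeline_command(project: str, output_root: str = "build", overwrite: bool = True) -> list[str]:
    """Argument list for run_pipeline.py, using only the documented flags."""
    cmd = ["python3", str(SKILL_SCRIPTS / "run_pipeline.py"), project, "--output-root", output_root]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def batch_command(project_dir: str, output_root: str = "build", overwrite: bool = True) -> list[str]:
    """Argument list for batch_pipeline.py over a directory of project files."""
    cmd = ["python3", str(SKILL_SCRIPTS / "batch_pipeline.py"), project_dir, "--output-root", output_root]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(pipeline_command("your-project.json")))
```

Passing a list to subprocess.run avoids shell quoting problems with paths that contain spaces.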

Core resources

  • scripts/resolve_images.py: resolve imageQuery values into usable remote image URLs
  • scripts/generate_slides.py: generate 1080x1920 PNG slides from local images, remote image URLs, and text blocks
  • scripts/export_mp4.py: convert ordered slide PNGs into an H.264 vertical MP4, with optional background music
  • scripts/export_sync_mp4.py: export a voice-synced MP4 from slide PNGs plus per-line audio files, holding each slide for that line's measured duration
  • scripts/run_pipeline.py: run one project and emit summary.json
  • scripts/batch_pipeline.py: run multiple JSON project files from a directory
  • references/pipeline.example.json: starter project file with slide, caption, hashtag, and video settings
  • references/slides-config.example.json: simpler slide-only config when you do not need project metadata
  • references/workflow.md: structure, command examples, shorts sync workflow, and practical caveats

Project JSON format

At the top level, use:

  • slug: identifier used for the output folder and the MP4 filename
  • caption: final post caption
  • hashtags: list of hashtags
  • defaultImageQuery: optional fallback query for image sourcing
  • video: export options
  • audio: optional background music options
  • slides: the slide array

Inside video:

  • enabled: set false to skip MP4 export
  • secondsPerSlide: hold time per slide
  • fps: output FPS, usually 30
  • zoom: enable a light Ken Burns style zoom
  • fade: optional fade-in duration per slide
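With fixed timing, video length is plain arithmetic: slide count times secondsPerSlide, and total frames are that duration times fps. The sketch below assumes slides play back to back with no overlapping transitions; its fallback values (3 seconds, 30 fps) come from the suggestions in this document, not from confirmed script defaults, and estimate_video is a hypothetical helper.

```python
from __future__ import annotations


def estimate_video(slide_count: int, video: dict) -> dict:
    """Rough length of a fixed-timing export, assuming no transition overlap."""
    seconds_per_slide = float(video.get("secondsPerSlide", 3))  # 3 s is this doc's suggested start
    fps = int(video.get("fps", 30))
    duration = slide_count * seconds_per_slide
    return {
        "durationSeconds": duration,
        "framesPerSlide": round(seconds_per_slide * fps),
        "totalFrames": round(duration * fps),
    }


video_options = {"enabled": True, "secondsPerSlide": 3, "fps": 30, "zoom": True}
print(estimate_video(6, video_options))
```

Six slides at 3 seconds each give an 18-second clip, or 540 frames at 30 fps.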

Inside audio:

  • path: local audio file
  • url: remote audio URL if ffmpeg can read it in your environment
  • volume: optional background music volume multiplier; defaults to about 0.22
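Because volume is a linear multiplier, it helps to know what it means in decibels: 20 * log10(volume), so 0.22 is roughly 13 dB below the original track. The sketch also shows a generic ffmpeg command that lays a music bed under a silent slideshow MP4 using the volume filter. That command illustrates standard ffmpeg usage and is not necessarily what export_mp4.py runs internally; both helper names are hypothetical.

```python
from __future__ import annotations

import math


def volume_to_db(multiplier: float) -> float:
    """Express a linear volume multiplier as a gain in decibels."""
    if multiplier <= 0:
        raise ValueError("volume multiplier must be positive")
    return 20 * math.log10(multiplier)


def music_mix_command(video: str, music: str, out: str, volume: float = 0.22) -> list[str]:
    """Generic ffmpeg invocation that adds a quieter music bed to a silent MP4 (illustrative only)."""
    return [
        "ffmpeg", "-y",
        "-i", video,                 # input 0: slideshow video with no audio
        "-i", music,                 # input 1: background music
        "-map", "0:v", "-map", "1:a",
        "-filter:a", f"volume={volume}",
        "-c:v", "copy",              # keep the already-encoded H.264 stream
        "-shortest",                 # finish when the shorter input ends
        out,
    ]


for v in (0.18, 0.22, 0.25):
    print(f"{v:.2f} -> {volume_to_db(v):.1f} dB")
```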

For shorts that need strict voice sync, keep the project JSON focused on slide images plus on-screen text, then generate one audio file per spoken line outside the project JSON and export with scripts/export_sync_mp4.py.

Each slide accepts:

  • imagePath: local source image
  • imageUrl: remote source image
  • imageQuery: short sourcing query, such as "minimal finance desk"
  • overlay: optional black overlay opacity from 0 to 255
  • blur: optional Gaussian blur radius
  • brightness: optional brightness multiplier, for example 0.9
  • output: optional output filename
  • text: array of text blocks
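A small project linter can catch missing image sources before rendering. The sketch below is hypothetical: the precedence order (imagePath, then imageUrl, then imageQuery, then the project-level defaultImageQuery) is an assumption, as are the helper names; only the field names and the 0 to 255 overlay range come from the list above.

```python
from __future__ import annotations


def image_source(slide: dict, default_query: str | None = None) -> tuple[str, str]:
    """Pick the image source a slide will use.

    Assumed precedence (not confirmed by the scripts): imagePath, imageUrl,
    imageQuery, then the project-level defaultImageQuery.
    """
    path = slide.get("imagePath")
    if path:
        # An http(s) imagePath is downloaded and cached like an imageUrl.
        kind = "remote" if str(path).startswith(("http://", "https://")) else "local"
        return kind, path
    if slide.get("imageUrl"):
        return "remote", slide["imageUrl"]
    if slide.get("imageQuery"):
        return "query", slide["imageQuery"]
    if default_query:
        return "query", default_query
    raise ValueError(f"slide {slide.get('output', '?')} has no image source")


def check_slide(slide: dict) -> list[str]:
    """Sanity-check a slide's fields against the documented ranges."""
    problems = []
    overlay = slide.get("overlay")
    if overlay is not None and not 0 <= overlay <= 255:
        problems.append("overlay must be between 0 and 255")
    if slide.get("blur", 0) < 0:
        problems.append("blur radius cannot be negative")
    if not isinstance(slide.get("text", []), list):
        problems.append("text must be an array of text blocks")
    return problems
```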

Each text block accepts:

  • text: required displayed text
  • size: font size in pixels
  • bold: boolean shortcut for selecting a heavier font
  • weight: optional weight string; "bold" also works
  • x: horizontal anchor, defaults to center
  • y: vertical anchor
  • align: left, center, or right
  • maxWidth: wrapping width in pixels
  • color: hex color, defaults to white
  • lineSpacing: defaults to 1.2
  • shadow: defaults to true
  • strokeWidth and strokeFill: optional text outline
  • fontPath: optional absolute or local font path
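The text-block defaults above can be applied in one place before rendering. This sketch fills in the stated defaults and folds the bold shortcut into weight. Treating "center" as the 540 px midpoint of the 1080 px canvas, and treating bold: true as equivalent to weight: "bold", are assumptions; normalize_text_block is a hypothetical name.

```python
from __future__ import annotations

CANVAS_WIDTH = 1080  # slides are 1080x1920

# Defaults listed above. Treating "center" as the pixel midpoint (540) is an assumption.
TEXT_DEFAULTS = {
    "x": CANVAS_WIDTH // 2,
    "color": "#FFFFFF",  # white
    "lineSpacing": 1.2,
    "shadow": True,
}


def normalize_text_block(block: dict) -> dict:
    """Return a copy with documented defaults filled in and bold folded into weight (assumed equivalence)."""
    if not block.get("text"):
        raise ValueError("each text block needs non-empty text")
    merged = {**TEXT_DEFAULTS, **block}
    if merged.pop("bold", False) and "weight" not in merged:
        merged["weight"] = "bold"
    return merged


hook = normalize_text_block(
    {"text": "Stop posting every day", "size": 92, "bold": True, "y": 620, "maxWidth": 900}
)
```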

Dependencies

Install Pillow for slide generation:

python3 -m pip install pillow

Install ffmpeg for MP4 export if it is not already present.
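A quick preflight check can confirm both dependencies before a long batch run. This minimal sketch uses only the standard library: it looks for Pillow's PIL package and for an ffmpeg binary on PATH. The preflight function is hypothetical and is not shipped with the skill.

```python
from __future__ import annotations

import importlib.util
import shutil


def preflight() -> dict[str, bool]:
    """Report whether the two documented dependencies are available."""
    return {
        "pillow": importlib.util.find_spec("PIL") is not None,  # Pillow installs as the PIL package
        "ffmpeg": shutil.which("ffmpeg") is not None,            # needed only for MP4 export
    }


if __name__ == "__main__":
    status = preflight()
    for name, ok in status.items():
        print(f"{name}: {'ok' if ok else 'missing'}")
    if not status["pillow"]:
        print("install with: python3 -m pip install pillow")
```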

Remote images are downloaded and cached automatically when you use imageUrl or when imagePath is itself an http/https URL.
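Caching remote images usually means mapping each URL to a stable local filename so repeat runs skip the download. The layout below (a truncated SHA-256 of the URL plus the original extension under build/cache) is a hypothetical scheme chosen for illustration; the skill's own cache naming is not documented here.

```python
from __future__ import annotations

import hashlib
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def is_remote(source: str) -> bool:
    """imageUrl values, and imagePath values that start with http(s), count as remote."""
    return urlparse(source).scheme in ("http", "https")


def cache_path(url: str, cache_dir: str = "build/cache") -> Path:
    """Hypothetical cache layout: first 16 hex chars of SHA-256(url) plus the URL's extension."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    suffix = Path(urlparse(url).path).suffix.lower() or ".img"
    return Path(cache_dir) / f"{digest}{suffix}"


def fetch_cached(url: str, cache_dir: str = "build/cache") -> Path:
    """Download once, then reuse the cached file on later runs."""
    target = cache_path(url, cache_dir)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, target)
    return target
```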

When a slide only has imageQuery, the pipeline first resolves it into a remote image URL, writes resolved-project.json, and then continues normally. Review the resolved images before posting: query-based sourcing optimizes for convenience, not quality.
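The query resolution step can be modeled as a pure function over the project dict: every slide without a concrete image gets an imageUrl, and the result is saved for review. In this sketch the resolver callback stands in for whatever lookup resolve_images.py performs, and using defaultImageQuery for slides with no query at all is an assumption; both function names are hypothetical.

```python
from __future__ import annotations

import copy
import json
from pathlib import Path
from typing import Callable


def resolve_project(project: dict, resolver: Callable[[str], str]) -> dict:
    """Return a copy where query-only slides gain an imageUrl from resolver."""
    resolved = copy.deepcopy(project)  # leave the original project untouched
    fallback = project.get("defaultImageQuery")
    for slide in resolved.get("slides", []):
        if slide.get("imagePath") or slide.get("imageUrl"):
            continue  # already has a concrete image
        query = slide.get("imageQuery") or fallback
        if query:
            slide["imageUrl"] = resolver(query)
    return resolved


def write_resolved(project: dict, resolver: Callable[[str], str], out_dir: str) -> Path:
    """Write resolved-project.json so the sourced images can be reviewed before posting."""
    out = Path(out_dir) / "resolved-project.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(resolve_project(project, resolver), indent=2), encoding="utf-8")
    return out
```

Keeping the resolver injectable lets you swap in a stub while editing copy, so no network lookups happen until you are ready.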

TikTok production defaults

  • Prefer GPT Image 2 visuals with realistic or cinematic tech/product scenes.
  • Prefer ElevenLabs for narration. Start with a male professional voice, then lower stability and raise style modestly when the user wants more character.
  • Make slide 1 a strong contrarian or curiosity hook that lands inside the first 3 seconds.
  • Keep final CTA visible on-screen and also spoken in the last voice line.
  • When shortening for retention, cut slide count before shrinking text legibility.

Good defaults

  • Keep slide 1 to one strong hook and one supporting line.
  • Start hooks around 84 to 96 px.
  • Start body lines around 48 to 60 px.
  • Keep most text blocks within 820 to 940 px max width.
  • Use one visual subject per slide when possible.
  • Start with 3 seconds per slide and zoom: true for a more alive MP4.
  • Start background music around 0.18 to 0.25 volume so it does not overpower on-screen text.
  • For TikTok-native shorts, shorten on-screen text until each slide only carries one core idea.
  • For voice-led shorts, prefer one spoken sentence per slide and use synced export instead of fixed secondsPerSlide.
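The sizing ranges above are easy to check mechanically. This hypothetical linter warns when a text block drifts outside them. It assumes the first text block on slide 1 is the hook and treats every other block as a body line, which will not fit every layout, so treat its output as warnings rather than errors.

```python
from __future__ import annotations

# Ranges copied from "Good defaults"; which block counts as the hook is an assumption.
HOOK_SIZE = (84, 96)
BODY_SIZE = (48, 60)
MAX_WIDTH = (820, 940)


def lint_slides(slides: list[dict]) -> list[str]:
    """Flag text blocks that drift outside the suggested starting ranges."""
    warnings = []
    for i, slide in enumerate(slides, start=1):
        blocks = slide.get("text", [])
        if i == 1 and len(blocks) > 2:
            warnings.append("slide 1: keep to one hook and one supporting line")
        for j, block in enumerate(blocks, start=1):
            low, high = HOOK_SIZE if (i == 1 and j == 1) else BODY_SIZE
            size = block.get("size")
            if size is not None and not low <= size <= high:
                warnings.append(f"slide {i} block {j}: size {size} outside {low}-{high} px")
            width = block.get("maxWidth")
            if width is not None and not MAX_WIDTH[0] <= width <= MAX_WIDTH[1]:
                warnings.append(f"slide {i} block {j}: maxWidth {width} outside 820-940 px")
    return warnings
```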

Editing guidance

Adjust readability in this order:

  1. raise overlay
  2. reduce maxWidth
  3. lower font size slightly
  4. move the y positions away from busy background areas
  5. add strokeWidth if the image is still noisy

If the MP4 feels too static, enable zoom. If it then feels too synthetic, turn zoom off, or skip the MP4 entirely (video.enabled set to false) and use the PNG slides instead.
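The readability order above can be applied one step at a time. In this sketch only the order comes from the list; the step sizes (overlay +40, maxWidth -60 px, font size about 8 percent smaller, a 3 px black outline) are hypothetical starting points, and step 4 is left as a manual adjustment because where text should move depends on the image.

```python
from __future__ import annotations

import copy

# Order comes from the list above; every step size here is a hypothetical starting point.
READABILITY_STEPS = ("overlay", "maxWidth", "size", "y", "strokeWidth")


def apply_readability_step(slide: dict, step: int) -> dict:
    """Return a copy of slide with readability fix number step (0-based) applied."""
    s = copy.deepcopy(slide)
    name = READABILITY_STEPS[step]
    if name == "overlay":
        s["overlay"] = min(255, s.get("overlay", 0) + 40)  # darker overlay, capped at 255
    elif name == "y":
        return s  # moving text away from busy areas is a visual call; choose new y values by eye
    for block in s.get("text", []):
        if name == "maxWidth" and "maxWidth" in block:
            block["maxWidth"] = max(600, block["maxWidth"] - 60)
        elif name == "size" and "size" in block:
            block["size"] = max(1, round(block["size"] * 0.92))  # "slightly": about 8 percent smaller
        elif name == "strokeWidth":
            block.setdefault("strokeWidth", 3)
            block.setdefault("strokeFill", "#000000")  # hypothetical dark outline
    return s
```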


Shorts sync workflow

Use this when voice, image, and on-screen text must stay aligned.

  1. Write one spoken sentence per target slide.
  2. Generate one numbered audio file per sentence, for example line_01.mp3, line_02.mp3.
  3. Build slide PNGs with matching numbered order.
  4. Export with scripts/export_sync_mp4.py so each slide duration is based on the matching line audio length.
  5. Keep captions shorter than the spoken line. Treat the slide text as reinforcement, not a transcript.

Example:

python3 ~/.openclaw/skills/slideshow-video/scripts/generate_slides.py project.json --output-dir build/slides --cache-dir build/cache
python3 ~/.openclaw/skills/slideshow-video/scripts/export_sync_mp4.py build/slides ./line-audio build/post-sync.mp4 --overwrite

The sync export also writes <output>.sync.json with per-slide measured durations.
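The sync workflow depends on slides and line audio pairing up in order. Here is a minimal sketch, assuming pairing by sorted filename as the numbered naming suggests: zero-padded numbers such as line_01 keep alphabetical order equal to numeric order, whereas line_1 and line_10 would sort incorrectly. Durations are measured with ffprobe, which ships with ffmpeg. The helper names are hypothetical and this is not the actual logic of export_sync_mp4.py.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a"}


def pair_slides_with_audio(slide_dir: str, audio_dir: str) -> list[tuple[Path, Path]]:
    """Match slide PNGs to line audio files by sorted filename."""
    slides = sorted(Path(slide_dir).glob("*.png"))
    audio = sorted(p for p in Path(audio_dir).iterdir() if p.suffix.lower() in AUDIO_SUFFIXES)
    if len(slides) != len(audio):
        raise ValueError(f"{len(slides)} slides but {len(audio)} audio lines; counts must match")
    return list(zip(slides, audio))


def audio_duration(path: Path) -> float:
    """Measure a clip's length in seconds with ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())
```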

Output expectations

The pipeline writes:

  • build/<slug>/resolved-project.json
  • build/<slug>/slides/*.png
  • build/<slug>/<slug>.mp4
  • build/<slug>/summary.json
  • build/<slug>/cache/* for downloaded remote images

summary.json includes audio metadata when present.

Keep generated outputs outside the skill folder unless you are intentionally updating bundled examples.
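A post-run check can confirm that each project produced the documented files before handing off to scheduling. This sketch uses only the paths listed above. It skips cache/ because that folder exists only when remote images were downloaded, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def expected_outputs(slug: str, output_root: str = "build", video_enabled: bool = True) -> list[Path]:
    """Files the pipeline is documented to write for one project."""
    base = Path(output_root) / slug
    paths = [base / "resolved-project.json", base / "summary.json"]
    if video_enabled:  # video.enabled: false skips the MP4
        paths.append(base / f"{slug}.mp4")
    return paths


def missing_outputs(slug: str, output_root: str = "build", video_enabled: bool = True) -> list[str]:
    """List absent documented files, and flag a slides folder with no PNGs."""
    missing = [str(p) for p in expected_outputs(slug, output_root, video_enabled) if not p.exists()]
    slides_dir = Path(output_root) / slug / "slides"
    if not any(slides_dir.glob("*.png")):
        missing.append(str(slides_dir / "*.png"))
    return missing
```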