Slideshow Video
Generate a repeatable short-form slideshow pipeline from local images, remote image URLs, or lightweight image queries and a JSON project file. This skill covers query resolution, PNG slide generation, MP4 export, optional background music, remote image caching, sentence-level sync exports, and a simple project wrapper that saves output metadata for downstream scheduling.
Image queries can resolve in three ways:
- stock-image lookup via Pinterest or Unsplash
- native GPT image generation via
openai/gpt-image-2
- Kie-hosted GPT image generation via
kie/gpt-image-2-text-to-image
Quick start
- Prepare 5 to 8 local images, remote image URLs, or image queries for one slideshow.
- Copy
references/pipeline.example.json to a working JSON file and replace the image sources and copy.
- Run the full pipeline:
python3 ~/.openclaw/skills/slideshow-video/scripts/run_pipeline.py your-project.json --output-root build --overwrite
To process a directory of project files, use:
python3 ~/.openclaw/skills/slideshow-video/scripts/batch_pipeline.py /path/to/projects --output-root build --overwrite
- Review the generated slides and MP4 on a phone-sized canvas.
- Use
summary.json for caption and hashtag handoff into your posting workflow.
Core resources
scripts/resolve_images.py: resolve imageQuery values into usable remote image URLs or generated local image files
scripts/generate_slides.py: generate 1080x1920 PNG slides from local images, remote image URLs, and text blocks
scripts/export_mp4.py: convert ordered slide PNGs into an H.264 vertical MP4, with optional background music
scripts/export_sync_mp4.py: export a voice-synced MP4 from slide PNGs plus per-line audio files, holding each slide for that line's measured duration
scripts/run_pipeline.py: run one project and emit summary.json
scripts/batch_pipeline.py: run multiple JSON project files from a directory
references/pipeline.example.json: starter project file with slide, caption, hashtag, and video settings
references/slides-config.example.json: simpler slide-only config when you do not need project metadata
references/workflow.md: structure, command examples, shorts sync workflow, and practical caveats
Project JSON format
At the top level, use:
slug: identifier for output folders and the mp4 name
caption: final post caption
hashtags: list of hashtags
defaultImageQuery: optional fallback query for image sourcing
video: export options
audio: optional background music options
slides: the slide array
Inside video:
enabled: set false to skip MP4 export
secondsPerSlide: hold time per slide
fps: output FPS, usually 30
zoom: enable a light Ken Burns style zoom
fade: optional fade in duration per slide
Inside audio:
path: local audio file
url: remote audio URL if ffmpeg can read it in your environment
volume: optional background music volume multiplier, defaults around 0.22
For shorts that need strict voice sync, keep the project JSON focused on slide images plus on-screen text, then generate one audio file per spoken line outside the project JSON and export with scripts/export_sync_mp4.py.
Each slide accepts:
imagePath: local source image
imageUrl: remote source image
imageQuery: short sourcing query such as minimal finance desk
overlay: optional black overlay opacity from 0 to 255
blur: optional Gaussian blur radius
brightness: optional brightness multiplier, for example 0.9
output: optional output filename
text: array of text blocks
Each text block accepts:
text: required displayed text
size: font size in pixels
bold: boolean shortcut for heavier font selection
weight: optional string, bold also works
x: horizontal anchor, defaults to center
y: vertical anchor
align: left, center, or right
maxWidth: wrapping width in pixels
color: hex color, defaults to white
lineSpacing: defaults to 1.2
shadow: defaults to true
strokeWidth and strokeFill: optional text outline
fontPath: optional absolute or local font path
Dependencies
Install Pillow for slide generation:
python3 -m pip install pillow
Install ffmpeg for MP4 export if it is not already present.
Remote images are downloaded and cached automatically when you use imageUrl or when imagePath is itself an http/https URL.
When a slide only has imageQuery, the pipeline can resolve it into either a remote image URL or a generated local image file first, writes resolved-project.json, then continues normally. Review resolved images before posting because query-based sourcing is convenience-first, not quality-safe, and model-generated imagery should also be checked for brand fit.
GPT Image 2 support
Use imageQuery with either --image-source openai or --image-source kie when you want the slideshow pipeline to generate slide art instead of searching the web.
Examples:
python3 ~/.openclaw/skills/slideshow-video/scripts/resolve_images.py project.json --source openai --image-size 1024x1536 --output build/resolved-project.json
python3 ~/.openclaw/skills/slideshow-video/scripts/resolve_images.py project.json --source kie --image-size 1024x1536 --output build/resolved-project.json
python3 ~/.openclaw/skills/slideshow-video/scripts/run_pipeline.py project.json --image-source kie --image-size 1024x1536 --output-root build --overwrite
Notes:
openai maps to openai/gpt-image-2
kie maps to kie/gpt-image-2-text-to-image
- GPT image resolution requires an active OpenClaw session runtime so
resolve_images.py can call the image_generate tool through the local session API
- generated slide assets are written into the resolved project as
imagePath values
Good defaults
- Keep slide 1 to one strong hook and one supporting line.
- Start hooks around
84 to 96 px.
- Start body lines around
48 to 60 px.
- Keep most text blocks within
820 to 940 px max width.
- Use one visual subject per slide when possible.
- Start with
3 seconds per slide and zoom: true for a more alive MP4.
- Start background music around
0.18 to 0.25 volume so it does not overpower on-screen text.
- For TikTok-native shorts, shorten on-screen text until each slide only carries one core idea.
- For voice-led shorts, prefer one spoken sentence per slide and use synced export instead of fixed
secondsPerSlide.
Editing guidance
Adjust readability in this order:
- raise
overlay
- reduce
maxWidth
- lower font size slightly
- move the
y positions away from busy background areas
- add
strokeWidth if the image is still noisy
If the MP4 feels too static, enable zoom. If it feels too synthetic, disable it and keep the PNG slideshow output instead.
Output expectations
Shorts sync workflow
Use this when voice, image, and on-screen text must stay aligned.
- Write one spoken sentence per target slide.
- Generate one numbered audio file per sentence, for example
line_01.mp3, line_02.mp3.
- Build slide PNGs with matching numbered order.
- Export with
scripts/export_sync_mp4.py so each slide duration is based on the matching line audio length.
- Keep captions shorter than the spoken line. Treat the slide text as reinforcement, not a transcript.
Example:
python3 ~/.openclaw/skills/slideshow-video/scripts/generate_slides.py project.json --output-dir build/slides --cache-dir build/cache
python3 ~/.openclaw/skills/slideshow-video/scripts/export_sync_mp4.py build/slides ./line-audio build/post-sync.mp4 --overwrite
The sync export also writes <output>.sync.json with per-slide measured durations.
Output expectations
The pipeline writes:
build/<slug>/resolved-project.json
build/<slug>/slides/*.png
build/<slug>/<slug>.mp4
build/<slug>/summary.json
build/<slug>/cache/* for downloaded remote images
summary.json includes audio metadata when present.
Keep generated outputs outside the skill folder unless you are intentionally updating bundled examples.