Research to Music Video Pipeline

Create 60-second music videos from research articles using a storyboard PDF, image/video/audio generation, and ffmpeg assembly.

Install

openclaw skills install @bunsdev/research-to-music-video-pipeline

Research-to-Music-Video Pipeline

Turn a research article into a finished 60-second music video and a structured storyboard PDF.

Prime Directive

Run this as a resumable production pipeline, not as a chat-only brainstorm. The final deliverables are:

final.mp4: stitched 60-second music video with generated song underneath.
storyboard.pdf: structured PDF with narrative storyboard, image prompts, video prompts, music direction, source notes, and provider/job metadata.

Required Flow

Accept article input as pasted text, local file, PDF URL, or article URL.
Run research extraction and prompt generation.
Produce a structured storyboard document and render it to PDF.
Generate one image still per scene.
Generate one video clip per scene from the scene still plus video prompt.
Generate one original song/audio bed based on the article theme and mood.
Stitch clips in storyboard order and mix the song underneath.
Return final.mp4, storyboard.pdf, and a concise run summary.

Approval Gates

Do not spend paid credits or launch long-running generation jobs until the user has approved:

Provider choices for image, video, and audio.
Scene count and target duration.
Whether to proceed from prompt pack/storyboard into media generation.

You may generate the research brief, prompt pack, and storyboard draft before that approval when no paid or external generation is required.
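
One way to make the gate explicit is to record each approval in the run state and check it before any paid call. This is a minimal sketch; the key names (`providers`, `scene_plan`, `proceed_to_media`) and both helpers are hypothetical, not part of the skill.

```python
# Hypothetical approval keys, one per item in the list above.
REQUIRED_APPROVALS = ("providers", "scene_plan", "proceed_to_media")


def missing_approvals(approvals: dict) -> list:
    """Return the approval keys that are absent or not explicitly True."""
    return [key for key in REQUIRED_APPROVALS if approvals.get(key) is not True]


def can_start_paid_generation(approvals: dict) -> bool:
    """Paid or long-running jobs may start only when nothing is missing."""
    return not missing_approvals(approvals)
```

Treating anything other than an explicit `True` as "not approved" keeps a half-filled state file from unlocking generation by accident.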

Provider Wiring

Use available runtime tools exactly as exposed. If a tool is deferred, discover and load the exact callable spec first.

Target providers from the user request:

Orchestration and prompt rewriting: GPT-4o or the best available current reasoning/orchestration model.
Video: SkyReels V3 Reference-to-Video via apifree.ai, Kling 3.0, or another explicitly configured reference-to-video provider.
Audio: Suno, MiniMax audio, or another explicitly configured music generation provider.
Image: the configured image generation tool/model suitable for cinematic stills.
Assembly: local ffmpeg/ffprobe.

If a target provider is unavailable or missing credentials, produce the storyboard/prompt pack and ask for the missing provider/credential. Do not silently substitute providers for final generation unless the user approves.

Run Directory

Create one timestamped run directory:

research-music-video-runs/<safe-title>-YYYYMMDD-HHMMSS/

Use this structure:

source/
  article.pdf | article.html | article.txt
storyboard.md
storyboard.pdf
prompts.json
manifest.json
images/
  scene-01.png
videos/
  scene-01.mp4
audio/
  song.mp3 | song.wav
assembly/
  concat.txt
  normalized/
final.mp4

Update manifest.json after every completed step so interrupted runs can resume.
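
The layout and the resumable manifest could be set up roughly as follows. This is a sketch under assumptions: the helper names (`safe_title`, `create_run_dir`, `save_manifest`) and the slug rule are illustrative, and the manifest schema is left to the caller.

```python
import json
import os
import re
import tempfile
from datetime import datetime
from pathlib import Path


def safe_title(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def create_run_dir(root: Path, title: str, now: datetime) -> Path:
    """Create <root>/research-music-video-runs/<safe-title>-YYYYMMDD-HHMMSS/ with its subfolders."""
    run_dir = root / "research-music-video-runs" / f"{safe_title(title)}-{now:%Y%m%d-%H%M%S}"
    for sub in ("source", "images", "videos", "audio", "assembly/normalized"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir


def save_manifest(run_dir: Path, manifest: dict) -> None:
    """Write to a temp file, then rename, so an interrupted write never truncates manifest.json."""
    fd, tmp_path = tempfile.mkstemp(dir=run_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2)
    os.replace(tmp_path, run_dir / "manifest.json")
```

Calling `save_manifest` after each completed step means a resumed run can read the last good manifest and skip finished work.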

Research Extraction

Extract or infer:

Title, authors, URL/path, date/venue if available.
Abstract or short article summary.
Core thesis.
3-6 key ideas.
Methods/evidence if present.
Limitations or uncertainties.
Useful visual metaphors.
Terms that must remain scientifically accurate.

If extraction is partial, state that in the storyboard and preserve the source file.

Storyboard Contract

Create storyboard.md with these sections:

Title page metadata: article title, source, date, generated video title, duration, aspect ratio.
Research brief: thesis, key ideas, findings, caveats.
Music direction: genre, mood, tempo, energy arc, instrumentation, vocal/lyric preference.
Scene table with one row per scene:
- scene id
- duration seconds
- narrative beat
- visual metaphor or literal representation
- image prompt
- video prompt
- transition notes
- music cue
Provider plan: chosen image/video/audio providers and pending credentials if any.
Source notes and caveats.

Render storyboard.md to storyboard.pdf using available local tooling. Use pandoc, playwright print-to-PDF, or another reliable renderer already available in the environment.
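
The scene table can be emitted as a Markdown pipe table before rendering. The sketch below assumes hypothetical dictionary keys for each column; the only real requirement from the contract is one row per scene with the eight listed fields.

```python
# (key, column label) pairs; the keys are illustrative names, not a fixed schema.
COLUMNS = [
    ("scene_id", "Scene"),
    ("duration_seconds", "Seconds"),
    ("narrative_beat", "Narrative beat"),
    ("visual", "Visual (metaphor or literal)"),
    ("image_prompt", "Image prompt"),
    ("video_prompt", "Video prompt"),
    ("transition", "Transition"),
    ("music_cue", "Music cue"),
]


def cell(value) -> str:
    """Escape pipes and flatten newlines so a long prompt cannot break its row."""
    return str(value).replace("|", "\\|").replace("\n", " ")


def scene_table(scenes: list) -> str:
    """Render one Markdown table row per scene, in storyboard order."""
    lines = [
        "| " + " | ".join(label for _, label in COLUMNS) + " |",
        "|" + "|".join(" --- " for _ in COLUMNS) + "|",
    ]
    for scene in scenes:
        lines.append("| " + " | ".join(cell(scene.get(key, "")) for key, _ in COLUMNS) + " |")
    return "\n".join(lines) + "\n"
```

Long prompts make wide tables; if the chosen PDF renderer wraps them badly, listing prompts under each scene instead of in cells is a reasonable alternative.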

Scene Defaults

Unless the user specifies otherwise:

Total duration: 60 seconds.
Scene count: 6.
Scene duration: 10 seconds each.
Aspect ratio: 16:9.
Style: cinematic, research-faithful, visually legible, emotionally engaging.
Audio: original instrumental song with no vocals.

Adjust the scene count only when a provider's per-clip duration limit requires it, and keep the total at the target duration.
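
As a worked example of that adjustment, the sketch below keeps the total duration fixed and raises the scene count only when the default clip length exceeds a provider limit. The function name and the `max_clip_seconds` parameter are hypothetical; real provider limits must come from the provider's own documentation.

```python
import math
from typing import Optional, Tuple


def plan_scenes(total_seconds: float = 60.0, default_scenes: int = 6,
                max_clip_seconds: Optional[float] = None) -> Tuple[int, float]:
    """Return (scene_count, seconds_per_scene) without changing the total duration."""
    count = default_scenes
    if max_clip_seconds is not None and total_seconds / count > max_clip_seconds:
        # Smallest scene count whose equal-length clips fit under the limit.
        count = math.ceil(total_seconds / max_clip_seconds)
    return count, total_seconds / count
```

With the defaults this yields 6 scenes of 10 seconds; with an 8-second clip limit it yields 8 scenes of 7.5 seconds.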

Prompt Generation

For every scene, generate:

image_prompt: still-image prompt with subject, composition, style, lighting, camera/lens, aspect ratio, and negative constraints.
video_prompt: reference-to-video prompt with motion, camera movement, temporal progression, transformation, continuity requirements, and mood.

Rules:

Keep scientific claims faithful to the article.
Use metaphor for abstract concepts, but label the metaphor in the storyboard.
Avoid copyrighted artist/style imitation. Use descriptive genre and visual language.
Keep prompts provider-ready and self-contained.
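
A prompts.json entry might look like the record below, with a small completeness check run before any generation. The field names other than `image_prompt` and `video_prompt` are assumptions, and the angle-bracket values are placeholders rather than real prompts.

```python
# Fields every scene needs before media generation; names other than the
# two prompt fields are illustrative.
REQUIRED_FIELDS = ("scene_id", "duration_seconds", "image_prompt", "video_prompt")

example_scene = {
    "scene_id": "scene-01",
    "duration_seconds": 10,
    "image_prompt": "<subject, composition, style, lighting, camera/lens, 16:9, negative constraints>",
    "video_prompt": "<motion, camera movement, temporal progression, continuity, mood>",
    "metaphor_label": "<which article concept the visual stands for>",
}


def validate_prompt_pack(scenes: list) -> list:
    """Return human-readable problems; an empty list means every scene is complete."""
    problems = []
    for index, scene in enumerate(scenes, start=1):
        for field in REQUIRED_FIELDS:
            if not scene.get(field):
                problems.append(f"scene {index}: missing {field}")
    return problems
```

This only checks that the fields are present; fidelity to the article still needs a human or model review.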

Image Generation

For each scene:

Send image_prompt to the selected image generator.
Save still as images/scene-XX.<ext>.
Check that the file exists, has nonzero size, and roughly matches the target aspect ratio.
Record provider, request id/job id, prompt, and path in manifest.json.

If an image is blank, corrupt, wrong subject, or unusable, regenerate that scene only.
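
The structural part of that check can be done without an imaging library when the still is a PNG, by reading the width and height from the IHDR chunk. This is a sketch: the 5% tolerance is an assumed value, other formats would need their own reader, and it cannot detect a wrong subject or a blank frame.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> tuple:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    header = path.read_bytes()[:24]
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG")
    return struct.unpack(">II", header[16:24])


def still_is_usable(path: Path, target_ratio: float = 16 / 9, tolerance: float = 0.05) -> bool:
    """True if the file exists, is non-empty, and is within tolerance of the target ratio."""
    if not path.is_file() or path.stat().st_size == 0:
        return False
    try:
        width, height = png_size(path)
    except ValueError:
        return False
    if height == 0:
        return False
    return abs(width / height - target_ratio) / target_ratio <= tolerance
```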

Video Generation

For each scene:

Send video_prompt plus matching still to the selected reference-to-video provider.
Prefer SkyReels V3 via apifree.ai or Kling 3.0 when available and approved.
Save clip as videos/scene-XX.mp4.
Verify with ffprobe that the clip exists, is readable, and has nonzero duration.
Record provider, request id/job id, prompt, still path, clip path, status, and duration.

If a video job is pending, report the job id and do not claim final completion. If one scene fails, retry or substitute only that scene with user approval.
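
The per-clip check could be scripted around ffprobe's JSON output. The sketch below assumes ffprobe is on PATH; the function names are hypothetical, and the parsing step is separated from the subprocess call so it can be exercised without a real clip.

```python
import json
import subprocess


def ffprobe_report(path: str) -> dict:
    """Run ffprobe and return its JSON report; raises CalledProcessError on unreadable files."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def clip_duration(report: dict) -> float:
    """Return the container duration in seconds, or 0.0 if there is no video stream or no usable duration."""
    has_video = any(s.get("codec_type") == "video" for s in report.get("streams", []))
    if not has_video:
        return 0.0
    try:
        return float(report.get("format", {}).get("duration", 0.0))
    except (TypeError, ValueError):
        return 0.0
```

A clip passes when `clip_duration(ffprobe_report(path)) > 0`; the measured duration is also the value to record in manifest.json.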

Music Generation

Create an audio prompt from the research brief and storyboard:

Genre.
Mood.
Tempo.
Instrumentation.
Energy arc.
Target duration: at least 60 seconds.
Vocals/lyrics only if the user asks.

Generate original music with Suno, MiniMax audio, or another approved provider. Save under audio/ and record prompt, provider, job id, and file path in manifest.json.
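
The audio prompt can be assembled mechanically from the storyboard's music direction. This is a sketch with hypothetical field keys; the wording each provider responds to best will differ, so treat the output as a starting point.

```python
# (key, label) pairs mirroring the list above; the keys are illustrative.
DIRECTION_FIELDS = [
    ("genre", "Genre"),
    ("mood", "Mood"),
    ("tempo", "Tempo"),
    ("instrumentation", "Instrumentation"),
    ("energy_arc", "Energy arc"),
]


def build_audio_prompt(direction: dict, min_seconds: int = 60, vocals: bool = False) -> str:
    """Join the provided music-direction fields into one provider-ready prompt string."""
    parts = [f"{label}: {direction[key]}" for key, label in DIRECTION_FIELDS if direction.get(key)]
    parts.append(f"Duration: at least {min_seconds} seconds")
    # Vocals stay off unless the user asked for them.
    parts.append("Vocals: as requested by the user" if vocals else "Instrumental only, no vocals")
    return ". ".join(parts) + "."
```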

Assembly

Use ffmpeg to create final.mp4:

Normalize clips to a common resolution, frame rate, codec, and pixel format.
Concatenate clips in storyboard order.
Trim or fade audio to match exact video duration.
Mix the song underneath.
Export H.264/AAC MP4 unless user requests another format.

Use simple cuts by default. Add crossfades only when they are easy to verify and do not introduce timing problems.
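
The steps above can be sketched as three ffmpeg invocations: normalize each clip, concatenate with the concat demuxer, then add the song. The 1920x1080, 30 fps, and 2-second fade-out values are assumptions, as is dropping any audio the generated clips carry (`-an`); the function names are hypothetical.

```python
from typing import List


def normalize_cmd(src: str, dst: str, width: int = 1920, height: int = 1080, fps: int = 30) -> List[str]:
    """Scale and pad to a common frame size, fix frame rate and pixel format, drop clip audio."""
    vf = (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,fps={fps},format=yuv420p"
    )
    return ["ffmpeg", "-y", "-i", src, "-vf", vf, "-an", "-c:v", "libx264", dst]


def concat_list(clip_names: List[str]) -> str:
    """Contents of assembly/concat.txt; paths are relative to that file's directory."""
    return "".join(f"file 'normalized/{name}'\n" for name in clip_names)


def concat_cmd(list_path: str, dst: str) -> List[str]:
    """Join already-normalized clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", dst]


def mux_cmd(video: str, song: str, dst: str, duration: float, fade: float = 2.0) -> List[str]:
    """Copy the video stream, encode the song as AAC, fade it out, and cut at the video duration."""
    fade_start = max(duration - fade, 0.0)
    return [
        "ffmpeg", "-y", "-i", video, "-i", song,
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "copy", "-c:a", "aac",
        "-af", f"afade=t=out:st={fade_start:g}:d={fade:g}",
        "-t", f"{duration:g}", dst,
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Stream copy in the concat step is only safe because every clip was normalized to identical parameters first.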

Verification Before Completion

Before reporting success:

Verify storyboard.pdf exists and has nonzero size.
Verify every scene has an image and clip path in manifest.json.
Run ffprobe on every clip or at least confirm nonzero duration for each clip.
Run ffprobe on final.mp4 and confirm video and audio streams exist.
Confirm final duration is close to 60 seconds, or explain the exact duration.
Scan manifests/logs/summaries for accidentally stored secrets.

Useful commands:

ffprobe -v error -show_entries format=duration -of default=nw=1:nk=1 final.mp4
ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height -of default=nw=1 final.mp4
ffprobe -v error -select_streams a:0 -show_entries stream=codec_name -of default=nw=1 final.mp4
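
For the secrets scan, a heuristic pass over the run directory's text files is a reasonable start. The patterns below are assumptions about what common credentials look like, not an exhaustive or provider-specific list, and the file extensions scanned are illustrative.

```python
import re
from pathlib import Path

# Heuristic patterns only; extend them for the providers actually in use.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]{20,}"),
    re.compile(r'(?i)(api[_-]?key|secret|token|password)"?\s*[:=]\s*"?[A-Za-z0-9._-]{16,}'),
]


def find_secrets(text: str) -> list:
    """Return every substring that matches one of the heuristic patterns."""
    return [match.group(0) for pattern in SECRET_PATTERNS for match in pattern.finditer(text)]


def scan_run_dir(run_dir: Path) -> dict:
    """Map each text-like file under run_dir to the suspicious strings found in it."""
    hits = {}
    for path in run_dir.rglob("*"):
        if path.is_file() and path.suffix in {".json", ".md", ".txt", ".log"}:
            found = find_secrets(path.read_text(encoding="utf-8", errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

An empty result does not prove the run is clean, so request and job ids should be recorded without their authorization headers in the first place.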

Final Response

Return only high-signal status:

Final video path.
Storyboard PDF path.
Source article title.
Provider choices used.
Final duration.
Any failed, retried, substituted, or pending jobs.

Do not claim the video is complete unless the verification gate passed.

Failure Modes

Article extraction fails: ask for pasted text or a local file.
Provider credential missing: stop at storyboard/prompt pack and ask for credential/provider setup.
Video provider times out: preserve job id and ask whether to wait, retry, or substitute.
Audio generation fails: ask before assembling a silent draft.
ffmpeg missing: return storyboard, prompts, generated assets, and exact assembly plan.