Install
openclaw skills install yummy-gen-video
Use when the user wants to generate a video with Gemini Veo through yummycli, including text-to-video, image-to-video (single starting frame), and reference-image-guided generation (up to 3 images).
Create videos with yummycli gemini veo using Google Veo.
Load this skill when the user asks to generate, create, or animate a video using AI — including text-to-video, animating a still image, or generating a video guided by reference images.
Prerequisite: Apply the yummy-shared skill first.
This skill covers three generation modes with a single command: text-to-video, image-to-video, and reference-guided generation.
Two equivalent entry points are available:
| Entry point | When to use |
|---|---|
| yummycli gemini veo | Default: human-friendly, Gemini Veo presets applied |
| yummycli video generate --provider gemini | Scripting / automation: explicit, provider-agnostic form |
Both share the same flags and defaults. Prefer gemini veo unless the task explicitly requires the provider-agnostic form.
Basic usage:
yummycli gemini veo --prompt "<prompt>"
With one or more input images:
yummycli gemini veo \
--prompt "<prompt>" \
--input-image ./frame.png \
--input-image ./style.jpg
Optional output controls:
--output <file.mp4>
--model <model>
--aspect-ratio <ratio>
--duration <seconds>
--resolution <resolution>
Default values when omitted: --model veo-3.1-fast-generate-preview, --aspect-ratio 16:9, --duration 8, --resolution 1080p.
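The flags above can be assembled programmatically. The following is a minimal sketch, not part of yummycli: `build_veo_command` is a hypothetical helper name, and it only uses the flags listed in this document. Options left as `None` are omitted so that yummycli applies its own defaults.

```python
from typing import List, Optional, Sequence


def build_veo_command(
    prompt: str,
    input_images: Sequence[str] = (),
    output: Optional[str] = None,
    model: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    duration: Optional[int] = None,
    resolution: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `yummycli gemini veo` (hypothetical helper)."""
    argv = ["yummycli", "gemini", "veo", "--prompt", prompt]
    # One --input-image flag per file, in the order the caller supplied.
    for image in input_images:
        argv += ["--input-image", image]
    # Skip any option the caller did not set so the CLI defaults apply.
    optional_flags = [
        ("--output", output),
        ("--model", model),
        ("--aspect-ratio", aspect_ratio),
        ("--duration", duration),
        ("--resolution", resolution),
    ]
    for flag, value in optional_flags:
        if value is not None:
            argv += [flag, str(value)]
    return argv
```

Returning a list rather than a single string lets the result be passed to `subprocess.run` without shell quoting concerns.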
The number of --input-image flags determines the API path automatically:
| Count | Behaviour |
|---|---|
| 0 | Text-to-video. Prompt drives the entire generation. |
| 1 | Image-to-video. The image is used as the starting frame. |
| 2–3 | Reference-guided. Images are passed as ASSET reference images; the prompt describes the motion and content. |
Never pass more than 3 --input-image flags — the API rejects it.
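The count-to-mode rule above can be sketched as a small function. This is an illustration only: `generation_mode` is a hypothetical name, and the returned strings are labels chosen for this example, not values yummycli emits.

```python
def generation_mode(input_image_count: int) -> str:
    """Map the number of --input-image flags to the generation mode."""
    if input_image_count < 0:
        raise ValueError("image count cannot be negative")
    if input_image_count == 0:
        return "text-to-video"
    if input_image_count == 1:
        return "image-to-video"
    if input_image_count <= 3:
        return "reference-guided"
    # More than 3 images is rejected by the API, so fail before calling it.
    raise ValueError("at most 3 --input-image flags are allowed")
```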
Default model: veo-3.1-fast-generate-preview.
Use the following mapping when the user explicitly names a model variant:
| User says | Use |
|---|---|
| veo 3.1, 3.1 fast, or no preference | veo-3.1-fast-generate-preview (default) |
| veo 3.1 full or veo 3.1 standard | veo-3.1-generate-preview |
| veo 3, veo 3 fast | veo-3.0-fast-generate-001 |
| veo 3 standard | veo-3.0-generate-001 |
| veo 2 | veo-2.0-generate-001 |
Do not switch models from vague quality words alone. Only apply a mapping when the user's wording clearly refers to model choice.
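One way to encode the mapping above is a lookup table keyed on the normalized phrase. This is a sketch under assumptions: `resolve_model` and `MODEL_ALIASES` are hypothetical names, matching is exact after lowercasing and whitespace collapsing, and any phrase not in the table (including vague quality words) falls back to the default.

```python
from typing import Optional

DEFAULT_MODEL = "veo-3.1-fast-generate-preview"

# Phrases taken from the mapping table; keys are lowercase.
MODEL_ALIASES = {
    "veo 3.1": DEFAULT_MODEL,
    "3.1 fast": DEFAULT_MODEL,
    "veo 3.1 full": "veo-3.1-generate-preview",
    "veo 3.1 standard": "veo-3.1-generate-preview",
    "veo 3": "veo-3.0-fast-generate-001",
    "veo 3 fast": "veo-3.0-fast-generate-001",
    "veo 3 standard": "veo-3.0-generate-001",
    "veo 2": "veo-2.0-generate-001",
}


def resolve_model(user_phrase: Optional[str] = None) -> str:
    """Return the model ID for an explicit model phrase, else the default."""
    if not user_phrase:
        return DEFAULT_MODEL
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(user_phrase.lower().split())
    return MODEL_ALIASES.get(key, DEFAULT_MODEL)
```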
Duration accepts only discrete values — not a range.
| Model | Valid durations |
|---|---|
| veo-2.0-generate-001 | 5, 6, 7, 8 |
| veo-3.0-* | 4, 6, 8 |
| veo-3.1-* | 4, 6, 8 |
| Model | Supported resolutions |
|---|---|
| veo-2.0-generate-001 | 720p only |
| veo-3.0-* | 720p, 1080p |
| veo-3.1-* | 720p, 1080p, 4k |
Constraints:
- 1080p requires --duration 8.
- 4k requires --duration 8 and a veo-3.1 model.
Supported aspect ratios for all models: 16:9 (landscape) and 9:16 (portrait).
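The duration, resolution, and aspect-ratio rules above can be checked before invoking the CLI. The sketch below is an assumption-laden illustration: `check_settings` is a hypothetical helper, model families are detected by ID prefix, and the tables simply restate the values given in this document.

```python
from typing import List

# Allowed values per model family, copied from the tables in this document.
DURATIONS = {"veo-2.0": {5, 6, 7, 8}, "veo-3.0": {4, 6, 8}, "veo-3.1": {4, 6, 8}}
RESOLUTIONS = {
    "veo-2.0": {"720p"},
    "veo-3.0": {"720p", "1080p"},
    "veo-3.1": {"720p", "1080p", "4k"},
}
ASPECT_RATIOS = {"16:9", "9:16"}


def model_family(model: str) -> str:
    """Return the family prefix (e.g. 'veo-3.1') for a model ID."""
    for prefix in DURATIONS:
        if model.startswith(prefix):
            return prefix
    raise ValueError(f"unknown model: {model}")


def check_settings(model: str, duration: int, resolution: str, aspect_ratio: str) -> List[str]:
    """Return a list of problems; an empty list means the settings are valid."""
    family = model_family(model)
    problems = []
    if duration not in DURATIONS[family]:
        problems.append(f"duration {duration} is not valid for {family}")
    if resolution not in RESOLUTIONS[family]:
        problems.append(f"resolution {resolution} is not supported by {family}")
    elif resolution in ("1080p", "4k") and duration != 8:
        # 1080p and 4k are only available with an 8-second duration.
        problems.append(f"{resolution} requires --duration 8")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"aspect ratio {aspect_ratio} is not supported")
    return problems
```

The 4k rule needs no separate branch because 4k appears only in the veo-3.1 resolution set.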
Translate clear user intent into CLI flags when the mapping is obvious.
Aspect ratio guidance:
- Use --aspect-ratio 9:16 for vertical/portrait outputs: phone wallpaper, short-form vertical video, story format.
- Use --aspect-ratio 16:9 for landscape outputs: film, presentation, widescreen. This is the default.
Duration guidance:
- For a short clip, use --duration 4 (veo-3+) or --duration 5 (veo-2).
Resolution guidance:
- The default (1080p) is appropriate for most requests.
- Use --resolution 4k only when the user explicitly asks for 4K quality and a veo-3.1 model is in use; pair with --duration 8.
- Use --resolution 720p when the user asks for a smaller or faster result.
Output path guidance:
- If --output is omitted, yummycli generates a timestamped .mp4 filename in the current working directory. Do not invent your own filename unless the user provides one.
- The output file must end in .mp4. Reject or correct any other extension.
Video commands return JSON on stdout. Read the response and use the output field as the generated file path.
Example (text-to-video):
{
"provider": "gemini",
"output": "veo_20260417_142301_047.mp4",
"model": "veo-3.1-fast-generate-preview",
"duration_seconds": 8,
"aspect_ratio": "16:9",
"resolution": "1080p",
"elapsed_seconds": 73
}
Example (image-to-video, one starting frame):
{
"provider": "gemini",
"output": "veo_20260417_143010_112.mp4",
"model": "veo-3.1-fast-generate-preview",
"duration_seconds": 8,
"aspect_ratio": "16:9",
"resolution": "1080p",
"elapsed_seconds": 89,
"input_images": ["./dog.jpg"]
}
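Reading the output field from a response like the ones above might look like the following. This is a minimal sketch: `output_path_from_stdout` is a hypothetical helper, and it assumes stdout contains exactly one JSON object shaped like the examples. The shape of error responses is not documented here, so anything without a usable output field is treated as a failure.

```python
import json


def output_path_from_stdout(stdout: str) -> str:
    """Extract the generated file path from the command's JSON response."""
    response = json.loads(stdout)
    if not isinstance(response, dict):
        raise ValueError("expected a JSON object on stdout")
    path = response.get("output")
    if not isinstance(path, str) or not path:
        raise ValueError("response has no usable 'output' field")
    return path


# Sample input copied from the text-to-video example above.
sample = """
{
  "provider": "gemini",
  "output": "veo_20260417_142301_047.mp4",
  "model": "veo-3.1-fast-generate-preview",
  "duration_seconds": 8,
  "aspect_ratio": "16:9",
  "resolution": "1080p",
  "elapsed_seconds": 73
}
"""
generated_file = output_path_from_stdout(sample)
```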
- Run yummycli auth status --provider gemini before generating if credentials may not be configured.
- Pass one --input-image flag per local image file; preserve the user-specified order.
- Report the output path back to the user after a successful run.
Text-to-video:
yummycli gemini veo \
--prompt "A golden retriever puppy chasing a red ball in a sunny park"
Image-to-video (animate a still):
yummycli gemini veo \
--prompt "The dog starts running toward the camera" \
--input-image ./dog.jpg
Reference-guided (two images):
yummycli gemini veo \
--prompt "Combine the character from the first image with the environment from the second" \
--input-image ./character.png \
--input-image ./background.jpg
Short portrait clip with veo-2:
yummycli gemini veo \
--prompt "Falling cherry blossoms in slow motion" \
--model veo-2.0-generate-001 \
--aspect-ratio 9:16 \
--duration 5 \
--resolution 720p
4K landscape with veo-3.1:
yummycli gemini veo \
--prompt "Timelapse of clouds moving over mountain peaks at golden hour" \
--resolution 4k \
--duration 8