Install
openclaw skills install @pushpendrachauhan/flow-image-gen
Generate the storyboard images for a short-form video job. Walks the image_prompts[] array from a job's input.json, calls Google's Gemini image model to render each prompt as a PNG, and saves files into the job's images/ folder using the filenames specified by the timeline. Up to 4 images in parallel. Use whenever the orchestrator hands off image generation.
Generate all storyboard images for one short-form video job using Google's
Gemini image model. Self-contained: one curl per image, no external services
beyond the Gemini API.
A job folder path, e.g. examples/demo-job/. Inside it, input.json with:
- image_prompts[]: array with id, prompt, optional negative_prompt
- timeline[]: used to map each prompt id to its output filename (.image)
- resolution: like "1080x1920", used to derive aspect ratio
- image_style: object with consistency_anchor and color_grade (both appended to every prompt) for visual consistency across the set
image_style is read from input.json and merged into each prompt automatically.
The bundled runner implements the whole loop (parallelism, retries, skip-existing, PNG size check). Run it as one invocation:
JOB=examples/demo-job bash skills/flow-image-gen/scripts/gen_images.sh
Tune concurrency with IMAGE_GEN_PARALLEL (default 4). The steps below document
exactly what that script does, in case you want to drive it by hand.
This skill reads and writes <job_folder>/status.json to support idempotent re-runs.
STATUS_FILE="$JOB/status.json"
# Initialize if absent (one-time, first skill to run)
if [ ! -f "$STATUS_FILE" ]; then
  jq -n '{
    schema_version: 1,
    stages: {images: "pending", voiceover: "pending", render: "pending"},
    artifacts: {images_completed: 0, voiceover_duration_ms: null, output_path: null},
    errors: []
  }' > "$STATUS_FILE"
fi
# Skip or abort based on the recorded state of the images stage
STAGE_STATUS=$(jq -r '.stages.images // "pending"' "$STATUS_FILE")
if [ "$STAGE_STATUS" = "done" ]; then
  echo "Skipped (images stage already done)"
  exit 0
elif [ "$STAGE_STATUS" = "failed" ]; then
  echo "FAILED: images stage previously failed. Check status.json. Exiting." >&2
  exit 1
fi
# Mark running
jq '.stages.images = "running"' "$STATUS_FILE" > "${STATUS_FILE}.tmp" && mv "${STATUS_FILE}.tmp" "$STATUS_FILE"
# On success: count the PNGs on disk and mark the stage done
IMAGES_COUNT=$(ls -1 "$JOB/images/"*.png 2>/dev/null | wc -l)
jq --argjson count "$IMAGES_COUNT" \
  '.stages.images = "done" | .artifacts.images_completed = $count' \
  "$STATUS_FILE" > "${STATUS_FILE}.tmp" && mv "${STATUS_FILE}.tmp" "$STATUS_FILE"
# On failure (instead of the success update): mark the stage failed and record the error
jq --arg msg "<short error description>" \
  '.stages.images = "failed" | .errors += [{"stage": "images", "message": $msg, "time": (now | strftime("%Y-%m-%dT%H:%M:%SZ"))}]' \
  "$STATUS_FILE" > "${STATUS_FILE}.tmp" && mv "${STATUS_FILE}.tmp" "$STATUS_FILE"
Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent
Auth header: x-goog-api-key: $GEMINI_API_KEY (NOT Authorization: Bearer; Google rejects that with ACCESS_TOKEN_TYPE_UNSUPPORTED)
Image data (base64) in the response: candidates[0].content.parts[].inlineData.data
Confirm GEMINI_API_KEY is set. If empty, exit with error.
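The key check above could look roughly like the following. This is a minimal sketch, not the bundled script's code; the function name require_api_key and the exact error text are assumptions made for illustration.

```shell
# Hypothetical helper: fail fast when GEMINI_API_KEY is unset or empty.
# The name and message are illustrative, not taken from gen_images.sh.
require_api_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "FAILED: GEMINI_API_KEY is not set" >&2
    return 1
  fi
}

# Typical use at the top of a script:
#   require_api_key || exit 1
```

Returning a status instead of exiting inside the function keeps the helper usable from an interactive shell as well as from a script.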
Read <job_folder>/input.json. Parse image_prompts[], timeline[], resolution, image_style.
Derive aspect ratio from resolution:
- "9:16" (vertical Shorts)
- "16:9" (landscape)
- "1:1" (square)
Ensure <job_folder>/images/ exists.
Extract image_style.consistency_anchor and image_style.color_grade. These two strings are appended to every prompt for visual consistency.
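One way to derive the aspect ratio from resolution, as described a few steps earlier, is to reduce width and height by their greatest common divisor and accept only the ratios this document lists as valid. The sketch below assumes that approach; the bundled script may use a simpler lookup, and the function name aspect_ratio_for is hypothetical.

```shell
# Hypothetical helper: turn "WIDTHxHEIGHT" into "W:H" by dividing both sides
# by their GCD, then accept only the ratios this document lists as valid.
aspect_ratio_for() {
  local res="$1" w h a b t ratio
  if [[ ! "$res" =~ ^([0-9]+)x([0-9]+)$ ]]; then
    echo "bad resolution: $res" >&2
    return 1
  fi
  # 10# forces base 10 so a leading zero is not read as octal.
  w=$((10#${BASH_REMATCH[1]}))
  h=$((10#${BASH_REMATCH[2]}))
  if [ "$w" -eq 0 ] || [ "$h" -eq 0 ]; then
    echo "bad resolution: $res" >&2
    return 1
  fi
  # Euclid's algorithm: after the loop, a holds gcd(w, h).
  a=$w
  b=$h
  while [ "$b" -ne 0 ]; do
    t=$((a % b))
    a=$b
    b=$t
  done
  ratio="$((w / a)):$((h / a))"
  case "$ratio" in
    1:1|3:4|4:3|9:16|16:9) echo "$ratio" ;;
    *) echo "unsupported aspect ratio $ratio for $res" >&2; return 1 ;;
  esac
}

# Typical use:
#   ASPECT_RATIO=$(aspect_ratio_for "1080x1920") || exit 1
#   mkdir -p "$JOB/images"
```

Rejecting unlisted ratios locally surfaces a bad resolution before any API call is made.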
Build the work list: for each image_prompts[i], the destination filename comes from the matching timeline[].image (matched by id, else positional, else <id>.png).
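The filename lookup in the work-list step (id match, then position, then <id>.png) can be expressed as a single jq filter. This is a sketch under two assumptions: that timeline[] entries carry an id field comparable to the prompt id, and that a tab-separated "id, filename" line per image is a convenient work-list format. The function name build_work_list is hypothetical.

```shell
# Hypothetical helper: print one "id<TAB>filename" line per image prompt.
# Fallback order: timeline entry with the same id, then the timeline entry
# at the same position, then "<id>.png".
build_work_list() {
  jq -r '
    (.timeline // []) as $tl
    | .image_prompts
    | to_entries[]
    | .key as $i
    | .value as $p
    | ( [ $tl[] | select(.id == $p.id) | .image ][0]
        // $tl[$i].image
        // "\($p.id).png" ) as $file
    | "\($p.id)\t\($file)"
  ' "$1"
}

# Typical use:
#   build_work_list "$JOB/input.json" > "$JOB/worklist.tsv"
```

The jq alternative operator (//) moves on to the next candidate whenever the previous one is null, which is what makes the three-level fallback fit in one expression.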
Generate the images with up to 4 running in parallel (tunable via IMAGE_GEN_PARALLEL), never one curl at a time: a 14-image job drops from ~4 min to ~1 min of waiting. Per image:
a. Build the destination path: $JOB/images/<filename>.
b. Skip if it already exists and is non-empty (supports re-runs after partial failure).
c. Build the request body, merging the style anchor + color grade into the prompt:
REQ_FILE=$(mktemp)
FULL_PROMPT="${PROMPT}. ${STYLE_ANCHOR} Color grade: ${COLOR_GRADE}."
jq -n --arg p "$FULL_PROMPT" --arg ar "$ASPECT_RATIO" '{
  contents: [{parts: [{text: $p}]}],
  generationConfig: {responseModalities: ["IMAGE"], imageConfig: {aspectRatio: $ar}}
}' > "$REQ_FILE"
d. Submit the request:
RESP_FILE=$(mktemp)
HTTP_CODE=$(curl -sS -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @"$REQ_FILE" -w "%{http_code}" -o "$RESP_FILE")
e. Error check. If HTTP code is not 200, OR the response JSON has an .error key, report and retry once, then fail loudly.
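The error check and single retry in step e might be structured as below. Both function names (response_ok, retry_once) are hypothetical, and the sketch assumes a successful response is a JSON object with no error key; it is not the bundled script's code.

```shell
# Hypothetical check: succeed only for HTTP 200 and a JSON body whose
# .error key is absent (null). Malformed JSON also counts as a failure.
response_ok() {
  local http_code="$1" resp_file="$2"
  [ "$http_code" = "200" ] || return 1
  jq -e '.error == null' "$resp_file" >/dev/null 2>&1
}

# Hypothetical wrapper: run a command, retry it once if it fails, then
# give up with a non-zero status. "$@" is the command for one attempt.
retry_once() {
  local attempt
  for attempt in 1 2; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt failed: $*" >&2
  done
  return 1
}

# Typical use, where submit_one is your own function that runs the curl
# call from step d and finishes with: response_ok "$HTTP_CODE" "$RESP_FILE"
#   retry_once submit_one "$REQ_FILE" "$RESP_FILE" || echo "FAILED $DEST" >&2
```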
f. Decode the base64 image and write to disk:
jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' "$RESP_FILE" \
  | base64 -d > "$DEST"
g. Verify the file is a valid PNG and non-trivially sized (>= 10 KB).
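A file can be checked for step g without extra tools by comparing its first 8 bytes against the fixed PNG signature and its size against the threshold. The sketch below takes 10 KB to mean 10240 bytes; the function name is_valid_png is hypothetical, and the bundled script may perform the check differently.

```shell
# Hypothetical check: the file exists, is at least min_bytes long
# (default 10240, i.e. 10 KB), and starts with the 8-byte PNG signature.
is_valid_png() {
  local file="$1" min_bytes="${2:-10240}" size sig
  [ -f "$file" ] || return 1
  size=$(wc -c < "$file" | tr -d '[:space:]')
  [ "$size" -ge "$min_bytes" ] || return 1
  # Every PNG begins with the bytes 89 50 4E 47 0D 0A 1A 0A.
  sig=$(head -c 8 "$file" | od -An -tx1 | tr -d '[:space:]')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# Typical use:
#   is_valid_png "$DEST" || { rm -f "$DEST"; echo "FAILED $DEST: not a valid PNG" >&2; }
```

The size check catches truncated downloads, and the signature check catches the case where base64 decoding produced something other than image bytes.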
h. Clean up temp files; print OK $DEST.
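The per-image steps above run under a concurrency cap. One portable way to get that cap is xargs -P, sketched below. The function name run_parallel is hypothetical, the worker must be an executable or script (xargs cannot call shell functions), and the bundled runner may use a different mechanism.

```shell
# Hypothetical driver: read one work item per line on stdin and run the
# given command once per item, with at most $IMAGE_GEN_PARALLEL (default 4)
# invocations in flight. The item is appended as the command's last argument.
# xargs exits non-zero if any invocation fails, so failures propagate.
run_parallel() {
  xargs -P "${IMAGE_GEN_PARALLEL:-4}" -I {} "$@" {}
}

# Typical use, where gen_one.sh is your own script implementing steps a-h
# for a single destination filename:
#   cut -f2 "$JOB/worklist.tsv" | run_parallel bash ./gen_one.sh
```

Work items should be plain filenames: xargs may interpret quotes and backslashes in its input.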
Print Generated N/M images. and exit non-zero if any image failed.
- The responseModalities: ["IMAGE"] field is required. Without it, the same model returns a text description of the image instead of the image bytes.
- negative_prompt from image_prompts[] and image_style.negative_prompt are not used as a native negative parameter (this model doesn't support one). If exclusion matters, append them to the main prompt as natural language: "no text overlay, no watermark."
- Accepted aspect ratios: "1:1", "3:4", "4:3", "9:16", "16:9". Anything else returns HTTP 400.
- If you get ACCESS_TOKEN_TYPE_UNSUPPORTED, the auth header is wrong (must be x-goog-api-key, not Authorization: Bearer).
- If you get RESOURCE_EXHAUSTED with FreeTier quotas, billing isn't active on the project that owns the key.
Per image: OK <abs-path> to stdout.
Final summary line: Generated N/M images.
On failure: FAILED <path> (id=<n>): <reason> to stderr, exit non-zero.