Install
openclaw skills install runpod-media

Generate images from text, edit images with text instructions, animate images to video, and generate video from text — all via RunPod public AI endpoints. All output is saved to ~/runpod-media/.
One key required — add to ~/.openclaw/secrets.json:
| Key path | Purpose | Get it from |
|---|---|---|
| /runpod/apiKey | Call RunPod endpoints | runpod.io/console/user/settings |
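A minimal sketch of the expected nesting in secrets.json (values are placeholders; the cloudflare and telegram entries used later in this document follow the same pattern):

{
  "runpod": { "apiKey": "rp_XXXXXXXXXXXXXXXXXXXX" },
  "cloudflare": { "r2": { "...": "..." } },
  "telegram": { "botToken": "..." }
}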
Local images are uploaded to Cloudflare R2 and passed to RunPod endpoints as presigned URLs (1 min expiry). R2 credentials are read from /cloudflare/r2 in secrets.json — already configured ✅
imgbb is no longer used. R2 presigned URLs replace it for all local file uploads.
R2 cleanup: Objects in uploads/ are auto-deleted after 1 day via a lifecycle rule on the openclaw bucket. Presigned URLs expire after 1 min (no further access), and the objects themselves are cleaned up within 24h.
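The skill handles this upload automatically through its own scripts, but conceptually the flow is similar to this aws-cli sketch against R2 (the account ID and exported credentials are assumptions; the bucket and prefix come from the note above):

# Assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY hold the R2 credentials
# and <ACCOUNT_ID> is your Cloudflare account ID (both placeholders).
aws s3 cp photo.jpg s3://openclaw/uploads/photo.jpg \
  --endpoint-url "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
# Presigned GET URL valid for 60 seconds, which is what gets sent to the RunPod endpoint:
aws s3 presign s3://openclaw/uploads/photo.jpg --expires-in 60 \
  --endpoint-url "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"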
Keys are resolved in this order:
1. ~/.openclaw/secrets.json ✅ (already configured)
2. RUNPOD_API_KEY environment variable

The user will never type CLI commands — translate their natural requests into the right script call.
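For example, a request like "make me a widescreen picture of a lighthouse at sunset" (illustrative) might translate to:

$SKILL_DIR/run.sh generate_image --prompt "a lighthouse at sunset, dramatic sky" --aspect-ratio 16:9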
Generate an image:
generate_image --prompt "..."
generate_image --prompt "..." --aspect-ratio 16:9
call_endpoint --endpoint google-nano-banana-2-edit --prompt "..."

Edit an image:
edit_image --images <file> --prompt "add snow falling"
call_endpoint --endpoint qwen-image-edit --image <file> --prompt "make it look like a painting"

Animate to video:
image_to_video --image <file> --prompt "slow camera pan"
image_to_video --image <file> --model kling --prompt "..."
call_endpoint --endpoint sora-2-pro-i2v --image <file> --prompt "..." --duration 10

Text to video:
text_to_video --prompt "..."

List available models:
list_endpoints and summarize the output in plain language for the user

Add a new endpoint:
discover_endpoints add --candidates "<url-or-id>"

| Task | Command | Cost | Time |
|---|---|---|---|
| Text → Image | generate_image | ~$0.005/image | 3–8s |
| Edit image(s) | edit_image | ~$0.005/image | 5–15s |
| Image → Video | image_to_video | $0.03–$0.90/clip | 30–120s |
| Text → Video | text_to_video | $0.04–$1.22/clip | 30–120s |
| Any endpoint | call_endpoint | varies | varies |
The built-in commands use default endpoints. For more models (Nano Banana Pro, FLUX, Sora 2, Kling, TTS, etc.), use call_endpoint with any RunPod public endpoint ID.
All known public endpoints are in scripts/endpoints.json. List them:
$SKILL_DIR/run.sh list_endpoints
$SKILL_DIR/run.sh call_endpoint \
--endpoint <ENDPOINT_ID> \
[--prompt "TEXT"] \
[--image PATH_OR_URL] \
[--audio PATH_OR_URL] \
[--duration 5] \
[--aspect-ratio 16:9] \
[--input '{"key": "value"}'] # full JSON override
Examples:
# Nano Banana Pro image generation
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "a golden retriever in space"
# Nano Banana Pro image editing
$SKILL_DIR/run.sh call_endpoint --endpoint nano-banana-pro --prompt "make it nighttime" --image photo.jpg
# Sora 2 Pro video from image
$SKILL_DIR/run.sh call_endpoint --endpoint sora-2-pro-i2v --image photo.jpg --prompt "camera slowly pulls back" --duration 5
# Kokoro TTS
$SKILL_DIR/run.sh call_endpoint --endpoint kokoro-tts --text "Hello world"
# FLUX Schnell
$SKILL_DIR/run.sh call_endpoint --endpoint flux-schnell --prompt "cyberpunk city" --input '{"width":1024,"height":1024}'
When the user asks to use an endpoint not in the registry, or the runpod skill reveals a new one:
Pass --endpoint <id> directly — no registry entry needed.
Save it to scripts/endpoints.json for future sessions (e.g. with discover_endpoints add — see the sketch below).
With runpod skill: Use the runpod skill to browse/discover endpoint IDs on the RunPod hub, then pass that ID to call_endpoint here.
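A sketch of the register-then-call flow, using a hypothetical endpoint ID:

$SKILL_DIR/run.sh discover_endpoints add --candidates "some-new-endpoint-id"
$SKILL_DIR/run.sh call_endpoint --endpoint some-new-endpoint-id --prompt "..."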
$SKILL_DIR/run.sh generate_image \
--prompt "PROMPT" \
[--aspect-ratio 1:1|16:9|9:16|4:3|3:4] \
[--seed 42]
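Example (prompt and seed are illustrative): a reproducible portrait-format image:

$SKILL_DIR/run.sh generate_image --prompt "misty pine forest at dawn" --aspect-ratio 9:16 --seed 42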
$SKILL_DIR/run.sh edit_image \
--images PATH_OR_URL [PATH_OR_URL ...] \
--prompt "EDIT INSTRUCTION" \
[--aspect-ratio 1:1] \
[--seed 42]
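Example (file names are illustrative): applying one edit instruction across two local images:

$SKILL_DIR/run.sh edit_image --images photo1.jpg photo2.jpg --prompt "add snow falling"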
$SKILL_DIR/run.sh image_to_video \
--image PATH_OR_URL \
--prompt "MOTION DESCRIPTION" \
[--model wan25|kling|seedance] \
[--duration 5|10] \
[--negative-prompt "TEXT"]
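Example (file name and prompt are illustrative): a 5-second clip with the higher-quality Kling model:

$SKILL_DIR/run.sh image_to_video --image photo.jpg --model kling --prompt "slow camera pan across the scene" --duration 5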
Models:
wan25 (default) — WAN 2.5, ~$0.026/5s
kling — Kling v2.1 Pro, $0.45/5s (highest quality)
seedance — Seedance 1.0 Pro, ~$0.12/5s

$SKILL_DIR/run.sh text_to_video \
--prompt "VIDEO DESCRIPTION" \
[--model wan26|seedance] \
[--duration 5|10|15] \
[--size 1920x1080] \
[--negative-prompt "TEXT"]
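Example (prompt is illustrative): a 10-second 1080p clip on the default model:

$SKILL_DIR/run.sh text_to_video --prompt "a fox running under the aurora, cinematic lighting" --duration 10 --size 1920x1080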
Models:
wan26 (default) — WAN 2.6, ~$0.04/5s
seedance — Seedance 1.0 Pro, ~$0.12/5s

Prompt tip: 🦊 "Fox under the aurora", not just 🦊 "Fox" — 105s render (~$0.026).

After generating an image or video, always deliver it to the user via their active channel.
The message tool with a local media path may fail in sandboxed agent modes due to SecretRef resolution not being available for media sends. This is a known OpenClaw limitation.
Read the bot token from secrets and send via curl — this always works regardless of sandbox mode:
TOKEN=$(cat ~/.openclaw/secrets.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('telegram',{}).get('botToken',''))")
# Send photo
curl -s \
-F "chat_id=CHAT_ID" \
-F "photo=@$HOME/.openclaw/workspace/runpod-media/OUTPUT_FILE.jpg" \
-F "caption=YOUR CAPTION" \
"https://api.telegram.org/bot${TOKEN}/sendPhoto"
# Send video (.mp4)
curl -s \
-F "chat_id=CHAT_ID" \
-F "video=@$HOME/.openclaw/workspace/runpod-media/OUTPUT_FILE.mp4" \
-F "caption=YOUR CAPTION" \
"https://api.telegram.org/bot${TOKEN}/sendVideo"
Where to find CHAT_ID: Use the chat_id from the inbound message metadata (e.g. telegram:1231438498 → use 1231438498).
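If the metadata arrives as a single channel:id string, stripping the prefix in the shell is enough (the SOURCE variable is hypothetical):

SOURCE="telegram:1231438498"     # hypothetical inbound metadata
CHAT_ID="${SOURCE#telegram:}"    # -> 1231438498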
Deliver via the message tool with a short, natural caption (no cost/time unless asked).
Clean up generated files with rm <path> unless --keep was used.
message tool? Try it first — if it works, great. If it returns a SecretRef error, fall back to the curl method above.
~/.openclaw/workspace/runpod-media/ — accessible in both sandboxed and elevated agent modes
scripts/_utils.py — do not call directly