Install
openclaw skills install kling-video
Generate, animate, and edit AI videos using Kuaishou's Kling 3.0 and Kling Video O3, featuring cinematic motion quality, physics simulation, reference-based generation, and natural-language video editing. Supports text-to-video, image-to-video, reference-to-video, and video editing in Pro and Standard tiers, up to 1080p resolution, 3-15 second duration, with optional synchronized sound generation. Available via Atlas Cloud API at 15% off standard pricing. Use this skill whenever the user wants to generate AI videos, create video clips, animate images, edit existing videos, produce short films, make video content, or mentions Kling, Kuaishou video, KwaiVGI, or video generation/editing. Also trigger when users ask to create product demos, marketing videos, social media reels, animated scenes, cinematic clips, talking head videos, edit video content, remove objects from video, change video backgrounds, or any video content using AI.
Kling 3.0 excels at creating cinematic short clips with realistic motion, complex camera movements, and faithful prompt adherence. Kling Video O3 adds MVL (Multi-modal Visual Language) technology with reference-based generation and video editing capabilities. All models support optional synchronized sound generation.
Data usage note: This skill sends text prompts, image URLs, and video URLs to the Atlas Cloud API (api.atlascloud.ai) for video generation and editing. No data is stored locally beyond the downloaded output files. API usage incurs charges per second based on the model selected.
export ATLASCLOUD_API_KEY="your-key"
This skill includes a Python script for video generation. Zero external dependencies required.
python scripts/generate_video.py list-models
python scripts/generate_video.py generate \
  --model "MODEL_ID" \
  --prompt "Your prompt here" \
  --output ./output \
  duration=5 resolution=720p
python scripts/generate_video.py generate \
  --model "MODEL_ID" \
  --image "https://example.com/photo.jpg" \
  --prompt "Animate this scene" \
  --output ./output
python scripts/generate_video.py upload ./local-file.jpg
Run python scripts/generate_video.py generate --help for all options. Extra model params can be passed as key=value (e.g. duration=10 shot_type=multi_camera).
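How the script handles these extras internally is not shown in this document. As a rough sketch only, key=value arguments could be folded into a request payload along the following lines. The helper names parse_extra_params and _coerce and the type-coercion rules are assumptions made for illustration, not the actual code in generate_video.py.

```python
def _coerce(raw):
    """Best-effort conversion: bool, then int, then float, else the raw string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_extra_params(pairs):
    """Turn a list of "key=value" strings into a dict of typed values.

    Hypothetical helper: the real script may parse its arguments differently.
    """
    params = {}
    for pair in pairs:
        key, separator, raw = pair.partition("=")
        if not separator or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = _coerce(raw)
    return params
```

Under these assumptions, the arguments duration=10 shot_type=multi_camera would become an integer duration and a string shot_type, ready to merge into the JSON body.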
All prices are per second of video generated. Atlas Cloud offers 15% off compared to standard API pricing.

Kling 3.0 models

| Model | Tier | Original Price | Atlas Cloud | Best For |
|---|---|---|---|---|
| kwaivgi/kling-v3.0-std/text-to-video | Standard | - | $0.153/s | Cost-effective text-to-video |
| kwaivgi/kling-v3.0-std/image-to-video | Standard | - | $0.153/s | Cost-effective image animation |
| kwaivgi/kling-v3.0-pro/text-to-video | Pro | - | $0.204/s | High-quality text-to-video |
| kwaivgi/kling-v3.0-pro/image-to-video | Pro | - | $0.204/s | High-quality image animation |

Kling Video O3 Pro models

| Model | Original Price | Atlas Cloud | Best For |
|---|---|---|---|
| kwaivgi/kling-video-o3-pro/text-to-video | - | $0.204/s | MVL-enhanced text-to-video |
| kwaivgi/kling-video-o3-pro/image-to-video | - | $0.204/s | MVL-enhanced image animation |
| kwaivgi/kling-video-o3-pro/reference-to-video | - | $0.204/s | Reference-based video generation |
| kwaivgi/kling-video-o3-pro/video-edit | - | $0.306/s | Professional video editing |

Kling Video O3 Standard models

| Model | Original Price | Atlas Cloud | Best For |
|---|---|---|---|
| kwaivgi/kling-video-o3-std/text-to-video | - | $0.153/s | Cost-effective MVL text-to-video |
| kwaivgi/kling-video-o3-std/image-to-video | - | $0.153/s | Cost-effective MVL image animation |
| kwaivgi/kling-video-o3-std/reference-to-video | - | $0.085/s | Cost-effective reference-based generation |
| kwaivgi/kling-video-o3-std/video-edit | - | $0.238/s | Budget video editing |
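Because billing is per second, the cost of a clip is the per-second rate multiplied by its duration. The sketch below copies the per-second prices from the tables above into a lookup table. The function name estimate_cost is hypothetical, and the figures should be checked against current Atlas Cloud pricing before relying on them.

```python
# Per-second prices in USD, copied from the pricing tables above.
PRICE_PER_SECOND = {
    "kwaivgi/kling-v3.0-std/text-to-video": 0.153,
    "kwaivgi/kling-v3.0-std/image-to-video": 0.153,
    "kwaivgi/kling-v3.0-pro/text-to-video": 0.204,
    "kwaivgi/kling-v3.0-pro/image-to-video": 0.204,
    "kwaivgi/kling-video-o3-pro/text-to-video": 0.204,
    "kwaivgi/kling-video-o3-pro/image-to-video": 0.204,
    "kwaivgi/kling-video-o3-pro/reference-to-video": 0.204,
    "kwaivgi/kling-video-o3-pro/video-edit": 0.306,
    "kwaivgi/kling-video-o3-std/text-to-video": 0.153,
    "kwaivgi/kling-video-o3-std/image-to-video": 0.153,
    "kwaivgi/kling-video-o3-std/reference-to-video": 0.085,
    "kwaivgi/kling-video-o3-std/video-edit": 0.238,
}


def estimate_cost(model, duration_seconds):
    """Return the estimated charge in USD for one clip, rounded to 3 decimals."""
    if model not in PRICE_PER_SECOND:
        raise KeyError(f"no price listed for {model}")
    return round(PRICE_PER_SECOND[model] * duration_seconds, 3)
```

For instance, a 5-second clip from kwaivgi/kling-v3.0-pro/text-to-video comes to 5 × $0.204 = $1.02.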

Kling 3.0 text-to-video parameters

| Parameter | Type | Required | Default | Options |
|---|---|---|---|---|
| prompt | string | Yes | - | Video description |
| negative_prompt | string | No | - | What to exclude from the video |
| duration | integer | No | 5 | 5 or 10 seconds |
| aspect_ratio | string | No | 16:9 | 16:9, 9:16, 1:1 |
| cfg_scale | number | No | 0.5 | 0-1, controls prompt adherence |
| sound | boolean | No | false | Generate synchronized audio |
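The constraints in this table can be checked before a request is sent. The following is a minimal sketch, assuming the limits are exactly as listed. build_text_to_video_payload is a hypothetical helper, and the API may validate requests differently.

```python
ALLOWED_DURATIONS = (5, 10)
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def build_text_to_video_payload(model, prompt, negative_prompt=None, duration=5,
                                aspect_ratio="16:9", cfg_scale=0.5, sound=False):
    """Validate the documented options and return a JSON-serializable dict."""
    if not prompt:
        raise ValueError("prompt is required")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    if not 0 <= cfg_scale <= 1:
        raise ValueError("cfg_scale must be between 0 and 1")
    payload = {
        "model": model,
        "prompt": prompt,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "cfg_scale": cfg_scale,
        "sound": bool(sound),
    }
    # Only include the optional negative prompt when one was supplied.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload
```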

Kling 3.0 image-to-video accepts the same parameters as V3.0 text-to-video, plus:

| Parameter | Type | Required | Description |
|---|---|---|---|
| image | string | Yes | URL of the source image (jpg/jpeg/png, max 10 MB, min 300 px, aspect ratio 1:2.5 to 2.5:1) |
| end_image | string | No | URL of the target end frame (for guided motion) |
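The source-image limits above lend themselves to a quick pre-flight check. This is a sketch under stated assumptions: "10 MB" is read as 10 × 1024 × 1024 bytes, the 300 px minimum is applied to both width and height, and the caller already knows the image dimensions. The function name check_source_image is hypothetical.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_BYTES = 10 * 1024 * 1024          # assumption: "10 MB" means 10 MiB
MIN_SIDE_PX = 300                     # assumption: applies to both sides
MIN_RATIO, MAX_RATIO = 1 / 2.5, 2.5   # width / height


def check_source_image(filename, width, height, size_bytes):
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 10 MB")
    if min(width, height) < MIN_SIDE_PX:
        problems.append("image is smaller than 300 px")
    if height > 0 and not MIN_RATIO <= width / height <= MAX_RATIO:
        problems.append("aspect ratio is outside 1:2.5 to 2.5:1")
    return problems
```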

Kling Video O3 text-to-video parameters

| Parameter | Type | Required | Default | Options |
|---|---|---|---|---|
| prompt | string | Yes | - | Video description |
| aspect_ratio | string | No | 16:9 | 16:9, 9:16, 1:1 |
| duration | integer | No | 5 | 3-15 seconds |
| sound | boolean | No | false | Generate synchronized audio |

Kling Video O3 image-to-video parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | - | Video description |
| image | string | Yes | - | First frame image URL |
| end_image | string | No | - | Last frame image URL |
| duration | integer | No | 5 | 3-15 seconds |
| generate_audio | boolean | No | false | Auto-add audio to video |

Kling Video O3 reference-to-video parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | - | Video description |
| images | array | No | - | Reference images (up to 7 without video, up to 4 with video) |
| video | string | No | - | Reference video URL |
| keep_original_sound | boolean | No | true | Keep original sound from reference video |
| sound | boolean | No | false | Generate new audio |
| aspect_ratio | string | No | 16:9 | 16:9, 9:16, 1:1 |
| duration | integer | No | 5 | 3-15 seconds |
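The reference-image limit depends on whether a reference video is also supplied, which is easy to get wrong by hand. The sketch below encodes that rule together with the 3-15 second range. build_reference_payload is a hypothetical helper, and sending keep_original_sound only when a video is present is an assumption, not documented behavior.

```python
def build_reference_payload(model, prompt, images=None, video=None,
                            keep_original_sound=True, sound=False,
                            aspect_ratio="16:9", duration=5):
    """Validate reference-to-video inputs and return the request body as a dict."""
    images = list(images or [])
    # Up to 7 reference images without a video, up to 4 with one.
    limit = 4 if video else 7
    if len(images) > limit:
        raise ValueError(f"at most {limit} reference images allowed here")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    payload = {
        "model": model,
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "sound": bool(sound),
    }
    if images:
        payload["images"] = images
    if video:
        payload["video"] = video
        # Assumption: this flag only matters when a reference video is given.
        payload["keep_original_sound"] = bool(keep_original_sound)
    return payload
```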

Kling Video O3 video-edit parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | - | Editing instruction in natural language |
| video | string | Yes | - | Source video URL (max 10s duration) |
| images | array | No | - | Reference images for element, scene, or style (max 4) |
| keep_original_sound | boolean | No | true | Keep original audio from the video |
# Step 1: Submit
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kwaivgi/kling-v3.0-pro/text-to-video",
    "prompt": "A golden retriever running through a sunlit meadow, camera tracking alongside, wildflowers swaying in the breeze",
    "aspect_ratio": "16:9",
    "duration": 5,
    "cfg_scale": 0.5,
    "sound": true
  }'
# Returns: { "code": 200, "data": { "id": "prediction-id" } }
# Step 2: Poll (every 5 seconds until "completed" or "succeeded")
curl -s "https://api.atlascloud.ai/api/v1/model/prediction/{prediction-id}" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY"
# Returns: { "code": 200, "data": { "status": "completed", "outputs": ["https://...video-url..."] } }
# Step 3: Download
curl -o output.mp4 "VIDEO_URL_FROM_OUTPUTS"
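The same submit, poll, and download sequence can be sketched in Python with only the standard library. The endpoints and response fields are taken from the curl steps above, and the failed status with its error field comes from the status list later in this document. The function names, timeouts, and the limit of 120 polls are assumptions for illustration.

```python
import json
import shutil
import time
import urllib.request

API_BASE = "https://api.atlascloud.ai/api/v1/model"


def request_json(url, api_key, body=None):
    """GET the URL, or POST `body` as JSON when it is given; return parsed JSON."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def generate_video(payload, api_key, fetch=request_json,
                   poll_seconds=5, max_polls=120, sleep=time.sleep):
    """Submit a job, poll until it finishes, and return the first output URL."""
    submitted = fetch(f"{API_BASE}/generateVideo", api_key, payload)
    prediction_id = submitted["data"]["id"]
    for _ in range(max_polls):
        data = fetch(f"{API_BASE}/prediction/{prediction_id}", api_key)["data"]
        status = data.get("status")
        if status in ("completed", "succeeded"):
            return data["outputs"][0]
        if status == "failed":
            raise RuntimeError(f"generation failed: {data.get('error')}")
        sleep(poll_seconds)
    raise TimeoutError("prediction did not finish within the polling limit")


def download(url, path):
    """Stream the finished video to a local file."""
    with urllib.request.urlopen(url, timeout=300) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

With ATLASCLOUD_API_KEY exported, passing the JSON body from Step 1 and the key to generate_video, then handing the returned URL to download, mirrors the three curl steps. The fetch and sleep parameters exist only so the loop can be exercised without network access.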
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kwaivgi/kling-v3.0-pro/image-to-video",
    "image": "https://example.com/landscape.jpg",
    "prompt": "The camera slowly pans across the landscape as clouds drift by and trees sway gently",
    "aspect_ratio": "16:9",
    "duration": 5,
    "sound": false
  }'
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kwaivgi/kling-video-o3-pro/reference-to-video",
    "prompt": "A young woman walks through a cherry blossom garden, camera follows from behind",
    "images": ["https://example.com/character-ref.jpg"],
    "aspect_ratio": "16:9",
    "duration": 5,
    "sound": false
  }'
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kwaivgi/kling-video-o3-pro/video-edit",
    "video": "https://example.com/original-video.mp4",
    "prompt": "Remove the person in the background and replace with a blooming cherry tree",
    "keep_original_sound": true
  }'
curl -s -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kwaivgi/kling-v3.0-std/text-to-video",
    "prompt": "Ocean waves crashing on a rocky shore at sunset, seagulls flying overhead",
    "aspect_ratio": "16:9",
    "duration": 5,
    "cfg_scale": 0.5
  }'
- processing / starting / running → wait 5s, retry (typically takes ~60-120s)
- completed / succeeded → done, get URL from data.outputs[]
- failed → error, read data.error

If the Atlas Cloud MCP server is configured, use built-in tools:
atlas_generate_video(model="kwaivgi/kling-v3.0-pro/text-to-video", params={...})
atlas_get_prediction(prediction_id="...")
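For callers polling the REST API directly, the status values listed above map onto three actions. A minimal sketch of that mapping follows. The function name interpret_prediction and the "unknown" fallback for unlisted statuses are assumptions, not part of the documented API.

```python
WAITING_STATUSES = {"processing", "starting", "running"}
DONE_STATUSES = {"completed", "succeeded"}


def interpret_prediction(data):
    """Map the `data` object of a prediction response to (action, detail).

    Returns ("wait", None), ("done", [urls]), ("error", message),
    or ("unknown", status) for a status this document does not list.
    """
    status = data.get("status")
    if status in DONE_STATUSES:
        return "done", list(data.get("outputs", []))
    if status == "failed":
        return "error", data.get("error")
    if status in WAITING_STATUSES:
        return "wait", None
    return "unknown", status
```

A polling loop would sleep 5 seconds on "wait", stop and download on "done", and surface the message on "error".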
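1. Determine task type: text-to-video, image-to-video, reference-to-video, or video editing.
2. Choose model family: Kling 3.0 or Kling Video O3 (reference-to-video and video editing are listed only for O3).
3. Choose tier: Standard or Pro.
4. Extract parameters: the prompt, plus whichever of duration, aspect ratio, sound, and image or video URLs the chosen model accepts.
5. Execute: POST to generateVideo API → poll result → download MP4
6. Present result: show file path, offer to play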
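The model IDs in the pricing tables follow a regular pattern, so the first three choices in this workflow can be combined mechanically. The sketch below builds an ID from family, tier, and task and rejects combinations that the tables do not list. The function name model_id and the short family keys are hypothetical.

```python
# Task types listed for each model family in the pricing tables.
TASKS_BY_FAMILY = {
    "v3.0": ("text-to-video", "image-to-video"),
    "video-o3": ("text-to-video", "image-to-video",
                 "reference-to-video", "video-edit"),
}
TIERS = ("std", "pro")


def model_id(family, tier, task):
    """Return an ID such as kwaivgi/kling-v3.0-std/text-to-video."""
    if family not in TASKS_BY_FAMILY:
        raise ValueError(f"unknown family {family!r}")
    if tier not in TIERS:
        raise ValueError(f"tier must be one of {TIERS}")
    if task not in TASKS_BY_FAMILY[family]:
        raise ValueError(f"{task!r} is not listed for family {family!r}")
    return f"kwaivgi/kling-{family}-{tier}/{task}"
```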
Kling produces the best results with detailed, descriptive prompts, such as the request examples above that name a subject, its motion, and a camera movement.
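When using image-to-video models, the source image must meet the requirements given in the image parameter description: JPG, JPEG, or PNG format, at most 10 MB, at least 300 px, and an aspect ratio between 1:2.5 and 2.5:1.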