Install
openclaw skills install kling-3-0
Kling 3.0 video generation on RunComfy. Kling 3.0 (also called Kling V3.0) is Kuaishou Technology's third-generation multi-shot video model with native synchronized audio and consistent character identity across shots. This skill covers all six Kling 3.0 endpoints, spanning three rendering tiers (Standard, Pro, 4K) and two modes (text-to-video, image-to-video). Calls runcomfy run kling/kling-3.0/<tier>/<mode> through the local RunComfy CLI. Triggers on "kling", "kling 3.0", "kling v3", "kling pro", "kling 4k", "kling text to video", "kling image to video", or any explicit ask to generate or animate with Kling 3.0.
Kling 3.0 is Kuaishou Technology's third-generation cinematic video model. This skill covers all six Kling 3.0 rendering endpoints on RunComfy: three quality tiers (Standard, Pro, 4K) across two modes (text-to-video and image-to-video).
Kling 3.0 is the V3 generation of the Kling video model. It produces multi-shot cinematic video with synchronized native audio, consistent character identity across shots, and physics-aware motion. Compared to Kling 2.x, Kling 3.0 supports longer clips (up to 15 seconds), native 4K output on the 4K tier, and a unified multi-prompt segment system that lets one Kling 3.0 generation contain several distinct scenes with controlled transitions.
Kling 3.0 ships in three rendering tiers on RunComfy (Standard, Pro, and 4K), each available as text-to-video or image-to-video.
All three tiers share the same Kling 3.0 multi-shot architecture. Tiers differ in resolution ceiling, motion-fidelity budget, and pricing.
Each endpoint corresponds to one (tier, mode) pair. All six endpoints share the same Kling 3.0 base model.
| Endpoint | Model name | Resolution | Rate (no audio) | Rate (with audio) |
|---|---|---|---|---|
| kling/kling-3.0/standard/text-to-video | Kling 3.0 Standard t2v | up to 1080p | $0.084/s | $0.126/s |
| kling/kling-3.0/standard/image-to-video | Kling 3.0 Standard Image to Video | up to 1080p | $0.084/s | $0.126/s |
| kling/kling-3.0/pro/text-to-video | Kling V3.0 Pro Text-to-Video | 1080p | $0.112/s | $0.168/s |
| kling/kling-3.0/pro/image-to-video | Kling V3.0 Pro Image-to-Video | 1080p | $0.112/s | $0.168/s |
| kling/kling-3.0/4k/text-to-video | Kling V3.0 4K Text-to-Video | 3840x2160 | $0.42/s flat | $0.42/s flat |
| kling/kling-3.0/4k/image-to-video | Kling V3.0 4K Image-to-Video | 3840x2160 | $0.42/s flat | $0.42/s flat |
The 4K tier costs the same per second with or without audio. The Standard and Pro tiers charge 50% more per second when audio is enabled.
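As a rough illustration of the pricing table, the sketch below multiplies the per-second rate by the clip duration. The rates are copied from the table; the helper name `estimate_cost` is hypothetical, and the idea that a charge is simply rate times duration (no rounding rules, minimums, or taxes) is an assumption rather than documented billing logic.

```python
# Per-second rates in USD, copied from the endpoint table.
RATES = {
    "standard": {"no_audio": 0.084, "audio": 0.126},
    "pro": {"no_audio": 0.112, "audio": 0.168},
    "4k": {"no_audio": 0.42, "audio": 0.42},  # flat rate, audio or not
}


def estimate_cost(tier: str, duration_s: int, generate_audio: bool = False) -> float:
    """Estimate the cost of one generation in USD.

    Assumption: cost = per-second rate * duration, with no extra fees.
    """
    if tier not in RATES:
        raise ValueError(f"unknown tier: {tier!r}")
    if not 3 <= duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    key = "audio" if generate_audio else "no_audio"
    return round(RATES[tier][key] * duration_s, 3)


if __name__ == "__main__":
    for tier in RATES:
        print(tier, estimate_cost(tier, 10), estimate_cost(tier, 10, generate_audio=True))
```

Under these assumptions, a 5-second Standard draft without audio and a 1-second 4K clip cost the same amount, which may help when deciding how many drafts to run before a final render.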
Pick a Kling 3.0 tier based on the output's role in the pipeline.
Pick the mode based on whether you have a source image: use image-to-video when you do, and text-to-video when you do not.
If the user explicitly asked for Kling 3.0, Kling V3.0, Kling Pro, or Kling 4K, route to this skill regardless.
npm i -g @runcomfy/cli
runcomfy login opens a browser device-code flow.
For CI or containers, use RUNCOMFY_TOKEN=<token> instead of runcomfy login.
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | - | Text description of scene, motion, camera, atmosphere. Multi-segment prompts supported via prompt_segments for scene transitions in one Kling 3.0 generation. |
| image_url | string | yes (i2v only) | - | Source image for Kling 3.0 i2v. HTTPS URL. JPEG/PNG/WebP. |
| tail_image_url | string | no (i2v only) | - | Optional ending image for controlled start-to-end frame transition on Kling 3.0 i2v. |
| negative_prompt | string | no | - | Elements to exclude from the Kling 3.0 output. |
| duration | int | no | 5 | 3-15 seconds per Kling 3.0 generation. |
| aspect_ratio | enum | no | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9. |
| cfg_scale | float | no | 0.5 | Prompt guidance strength. Higher = stricter adherence to prompt. |
| generate_audio | bool | no | false | Enable Kling 3.0 in-pass synchronized audio. Adds cost on Standard and Pro tiers; flat-rate on 4K. |
| seed | int | no | - | Reproducibility for Kling 3.0 variant testing. |
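To make the schema concrete, here is a minimal sketch of a helper that checks a few of the documented constraints before a request is sent. The function name `build_input` is hypothetical and not part of the RunComfy CLI; it covers only a subset of the fields, and the server-side validation may be stricter or differ in detail.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the aspect_ratio row of the schema table.
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}


def build_input(
    prompt: str,
    mode: str = "text-to-video",
    image_url: Optional[str] = None,
    duration: int = 5,
    aspect_ratio: str = "16:9",
    generate_audio: bool = False,
) -> Dict[str, Any]:
    """Build an input payload, rejecting values the schema table rules out."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be 3-15 seconds")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }
    if mode == "image-to-video":
        # image_url is required for i2v and must be an HTTPS URL.
        if not image_url or not image_url.startswith("https://"):
            raise ValueError("image-to-video needs an HTTPS image_url")
        payload["image_url"] = image_url
    elif mode == "text-to-video":
        if image_url is not None:
            raise ValueError("image_url applies to image-to-video only")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return payload
```

Validating locally is a design choice, not a requirement: it can catch a bad duration or aspect ratio before the CLI reports a schema mismatch.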
Kling 3.0 Standard text-to-video (cheapest 1080p draft):
runcomfy run kling/kling-3.0/standard/text-to-video \
--input '{
"prompt": "<Kling 3.0 prompt>",
"duration": 5,
"aspect_ratio": "16:9"
}' \
--output-dir <absolute/path>
Kling 3.0 Standard image-to-video (animate a still):
runcomfy run kling/kling-3.0/standard/image-to-video \
--input '{
"prompt": "<motion description for Kling 3.0 i2v>",
"image_url": "https://.../source.jpg",
"duration": 5
}' \
--output-dir <absolute/path>
Kling V3.0 Pro text-to-video (highest 1080p fidelity):
runcomfy run kling/kling-3.0/pro/text-to-video \
--input '{
"prompt": "<Kling 3.0 Pro prompt>",
"duration": 8,
"aspect_ratio": "16:9",
"generate_audio": true
}' \
--output-dir <absolute/path>
Kling V3.0 Pro image-to-video (hero animation from source image):
runcomfy run kling/kling-3.0/pro/image-to-video \
--input '{
"prompt": "<motion description for Kling V3.0 Pro i2v>",
"image_url": "https://.../subject.jpg",
"duration": 8,
"generate_audio": true
}' \
--output-dir <absolute/path>
Kling V3.0 4K text-to-video (native 4K cinematic):
runcomfy run kling/kling-3.0/4k/text-to-video \
--input '{
"prompt": "<Kling V3.0 4K prompt>",
"duration": 10,
"aspect_ratio": "16:9",
"generate_audio": true
}' \
--output-dir <absolute/path>
Kling V3.0 4K image-to-video (4K animation of a reference image):
runcomfy run kling/kling-3.0/4k/image-to-video \
--input '{
"prompt": "<motion description for Kling V3.0 4K i2v>",
"image_url": "https://.../source-4k.jpg",
"duration": 10,
"generate_audio": true
}' \
--output-dir <absolute/path>
The CLI submits the Kling 3.0 request, polls every 2s, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URL into --output-dir.
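For scripted use, the invocations above could be wrapped roughly as follows. This is a sketch that assumes the `runcomfy` binary is on PATH and uses only the `run` subcommand and the `--input` and `--output-dir` flags shown in the examples; the helper names `build_command` and `run_generation` are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List

TIERS = ("standard", "pro", "4k")
MODES = ("text-to-video", "image-to-video")


def build_command(tier: str, mode: str, payload: Dict[str, Any], output_dir: str) -> List[str]:
    """Return the argv list for one runcomfy call (no shell involved)."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    endpoint = f"kling/kling-3.0/{tier}/{mode}"
    return [
        "runcomfy", "run", endpoint,
        "--input", json.dumps(payload),
        "--output-dir", output_dir,
    ]


def run_generation(tier: str, mode: str, payload: Dict[str, Any], output_dir: str) -> int:
    """Run the CLI and return its exit code; the CLI itself polls and downloads."""
    return subprocess.run(build_command(tier, mode, payload, output_dir), check=False).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so this demo spends no credits.
    print(build_command("standard", "text-to-video", {"prompt": "a quiet harbor at dawn"}, "/tmp/kling-out"))
```

Passing the arguments as a list rather than a shell string means the JSON body never goes through shell quoting.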
Kling 3.0 responds to specific prompting patterns better than naive prose.
Lead with motion and camera language. Kling 3.0 reads "wide shot, slow push-in", "tracking shot, low angle", "handheld follow" as real directives. Front-load these.
Multi-shot in one Kling 3.0 generation. A single Kling 3.0 prompt can describe a sequence of shots. Number them: "Shot 1: wide of the cafe at dusk. Shot 2: medium close-up of the barista. Shot 3: tight on the espresso pour." Kling 3.0 will preserve identity (face, wardrobe, props) across the shots.
Identity anchors for i2v. When using Kling 3.0 i2v, restate what should remain stable: "preserve the subject's face, pose, and clothing; only the camera moves and the background changes."
tail_image_url for controlled endings. On Kling 3.0 i2v, supply a tail image to lock the final frame. Kling 3.0 will interpolate motion from source to tail.
generate_audio: true for one-pass dialogue. Describe what Kling 3.0 should produce in audio: "warm friendly tone, English voiceover" or "city ambience, distant traffic, no dialogue." Audio adds cost on Standard / Pro; flat on 4K.
cfg_scale tuning. Default 0.5 works for most Kling 3.0 prompts. Raise to 0.7-0.9 for strict prompt adherence on stylized output. Lower to 0.3-0.4 for natural motion when the prompt is loose.
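The numbered-shot pattern described above is easy to generate from a list of shot descriptions. The sketch below is illustrative only: `multi_shot_prompt` is a hypothetical helper, and the exact wording Kling 3.0 responds to best may differ from this format.

```python
from typing import Sequence


def multi_shot_prompt(shots: Sequence[str], audio_note: str = "") -> str:
    """Join shot descriptions into one prompt using "Shot N:" numbering."""
    if not shots:
        raise ValueError("at least one shot description is required")
    parts = [
        f"Shot {i}: {shot.strip().rstrip('.')}."
        for i, shot in enumerate(shots, start=1)
    ]
    if audio_note.strip():
        # Audio direction goes last, after the visual shots.
        parts.append(audio_note.strip())
    return " ".join(parts)


if __name__ == "__main__":
    print(multi_shot_prompt(
        [
            "wide of the cafe at dusk",
            "medium close-up of the barista",
            "tight on the espresso pour",
        ],
        audio_note="City ambience, distant traffic, no dialogue.",
    ))
```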
| Use case | Best Kling 3.0 endpoint |
|---|---|
| Cinematic 1080p brand stories with consistent characters | Kling V3.0 Pro (t2v or i2v) |
| Native 4K hero films and big-screen cinematic | Kling V3.0 4K (t2v or i2v) |
| Cheap iteration, social-first shorts, A/B variants | Kling 3.0 Standard t2v |
| Animating brand assets, product photos, character art | Kling 3.0 Standard i2v or Kling V3.0 Pro i2v |
| Multi-shot ads with synchronized dialogue in one pass | Kling V3.0 Pro with generate_audio: true |
| Premium 4K finished masters with native audio | Kling V3.0 4K with generate_audio: true (flat rate) |
Kling 3.0 cinematic multi-shot (Pro tier recommended):
Cinematic multi-shot of a young American couple celebrating their
anniversary at a candlelit rooftop restaurant. Shot 1: wide of the
city skyline at golden hour. Shot 2: medium two-shot, the couple
toasting. Shot 3: tight on the woman's smile, soft bokeh, warm fill
light. Subtle ambient string music, gentle wind, distant traffic.
Kling 3.0 i2v (animate a portrait, 4K tier):
Gentle camera dolly-in on the subject from the source image. Subtle
breathing motion, identity-stable features, soft natural light,
shallow depth of field. Background: warm golden-hour glow with a
slow drift of dust motes. No dialogue, only ambient room tone.
Kling 3.0 vertical short (Standard tier, 9:16):
9:16 vertical. A barista in a black apron pulls a single espresso
shot, steam rising into morning sun, rich crema slowly forming.
Close-up handheld, shallow depth of field, warm cafe ambience and
the hiss of the steam wand.
What is the maximum duration of a Kling 3.0 clip? 15 seconds per generation across all three tiers. For longer narratives, segment the script into multiple Kling 3.0 calls and stitch.
How is Kling V3.0 4K priced compared to Standard and Pro? Kling V3.0 4K is a flat $0.42 per second whether or not audio is enabled. Standard is $0.084/s without audio (cheapest). Pro is $0.112/s without audio. Without audio, the 4K tier costs 5x Standard and 3.75x Pro per second; with audio, the gap narrows to about 3.3x Standard and 2.5x Pro.
Does Kling 3.0 support multi-shot in a single generation? Yes. All Kling 3.0 endpoints accept multi-segment prompts. Number the shots ("Shot 1:", "Shot 2:", etc.) and Kling 3.0 will preserve character identity across them.
Can Kling 3.0 generate audio? Yes. Set generate_audio: true. Kling 3.0 produces synchronized dialogue, ambient sound, and music in the same generation pass. On 4K the price stays flat at $0.42/s; on Standard / Pro the rate jumps about 50% with audio.
What aspect ratios does Kling 3.0 support? 16:9, 9:16, 1:1, 4:3, 3:4, 21:9. The 4K tier renders 21:9 as wide cinema crops at native 3840x2160.
Does Kling 3.0 i2v support a tail image? Yes. tail_image_url locks the final frame; Kling 3.0 interpolates motion from source to tail.
How is Kling 3.0 different from Kling 2.x? Kling 3.0 has stronger multi-shot identity preservation, longer max duration (15s vs 10s on the 2.x flagship), native 4K on the 4K tier, and unified multi-prompt segment input across all tiers.
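For narratives longer than the 15-second limit, the advice above is to segment the script into multiple calls. One way to plan the segment lengths is sketched below; `split_duration` is a hypothetical helper, and splitting evenly is just one reasonable choice, since a real edit would normally cut on scene boundaries instead.

```python
import math
from typing import List

MIN_S, MAX_S = 3, 15  # per-generation duration limits from the schema


def split_duration(total_s: int) -> List[int]:
    """Split a total running time into near-equal segments of 3-15 seconds."""
    if total_s < MIN_S:
        raise ValueError(f"total must be at least {MIN_S} seconds")
    n = math.ceil(total_s / MAX_S)      # fewest calls that can cover the total
    base, extra = divmod(total_s, n)    # spread the remainder over the first segments
    return [base + 1 if i < extra else base for i in range(n)]


if __name__ == "__main__":
    print(split_duration(40))
```

Each returned value can be used as the duration of one generation, and the resulting clips stitched in order.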
The runcomfy CLI uses sysexits-style codes:
| code | meaning |
|---|---|
| 0 | Kling 3.0 generation succeeded |
| 64 | bad CLI args |
| 65 | bad input JSON for Kling 3.0 / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
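A wrapper script might act on these codes as sketched below. The code-to-meaning mapping is copied from the table; the retry policy (retry only on 75, with exponential backoff starting at 2 seconds) is an assumption for illustration, not documented CLI behavior, and `run_with_retry` and `describe_exit` are hypothetical names.

```python
import time
from typing import Callable

# Meanings copied from the exit-code table.
EXIT_MEANINGS = {
    0: "Kling 3.0 generation succeeded",
    64: "bad CLI args",
    65: "bad input JSON for Kling 3.0 / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}

# Only 75 is marked retryable in the table; 69 is left to the caller here.
RETRYABLE_CODES = {75}


def describe_exit(code: int) -> str:
    """Return the documented meaning of an exit code, if any."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_with_retry(
    run_once: Callable[[], int],
    max_attempts: int = 3,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_once until it returns a non-retryable code or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        code = run_once()
        if code not in RETRYABLE_CODES or attempt >= max_attempts:
            return code
        sleep(base_delay_s * 2 ** (attempt - 1))  # 2s, 4s, 8s, ...
        attempt += 1
```

Here `run_once` is any zero-argument callable that returns the CLI's exit code, for example a function that runs `runcomfy run ...` through `subprocess.run` and returns `returncode`.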
The skill calls runcomfy run kling/kling-3.0/<tier>/<mode> with a JSON body matching the schema.
The submission yields a request_id; the CLI polls every 2 seconds until the Kling 3.0 generation finishes.
The CLI downloads any *.runcomfy.net / *.runcomfy.com URL into --output-dir.
Ctrl-C cancels the in-flight Kling 3.0 request before billing.
runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set the RUNCOMFY_TOKEN env var in CI / containers.
Input is passed as JSON via --input. The CLI does not shell-expand. No shell-injection surface.
Outbound traffic goes to model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist).