Install
openclaw skills install vpick-ai-video-creator

All-in-one AI video production studio on a visual canvas. Generate videos (Kling 3.0, Veo 3.1, Sora 2, Runway, Grok, Midjourney Video), generate images (nano-banana, Midjourney, Grok, Seedream), add voiceover (ElevenLabs TTS), generate music (Suno), lip-sync faces (Kling AI Avatar), separate vocals (Demucs), change voices (ElevenLabs STS), and combine clips with audio, all in one workflow. Use when the user says 'create a video', 'generate video', 'make a short film', 'AI video', 'video production', 'MultiShot', 'add voiceover', 'lip sync', 'generate music', 'combine videos', or wants end-to-end AI video creation.

From image generation to video creation, voiceover, music, lip-sync, and final export, every step runs on the same visual canvas. Powered by VPick.
MCP URL (https://vpick.10xboost.org/mcp/t/xxxxx...): treat it like a password and do not share it publicly.

VPick is hosted at vpick.10xboost.org on Google Cloud. It routes requests to third-party AI model providers (Kling, Veo, Runway, Sora, ElevenLabs, Suno) on your behalf. Your prompts and uploaded media are sent to these providers for processing.

Video models:

| Model | Duration | Sound | Cost | MultiShot | Best For |
|---|---|---|---|---|---|
| Kling 3.0 Standard | 3-10s | Yes | ~$0.30/sec | Yes | MultiShot scenes, character consistency |
| Kling 3.0 Professional | 3-10s | Yes | ~$0.405/sec | Yes | Higher quality MultiShot |
| Veo 3.1 Fast | 8s fixed | Yes | $0.90/video | No | Quick high-quality clips |
| Sora 2 | 10-15s | Yes | $0.525-$0.60 | No | Creative, artistic videos |
| Runway 720p/1080p | 5-10s | No | $0.18-$0.45 | No | Fast iteration |
| Grok Imagine | 6-15s | Yes | $0.15-$0.60 | No | Budget-friendly with audio |
| Midjourney Video | 5s | No | $0.90 | No | Stylized, artistic clips |

Image models:

| Model | Cost | Output | Best For |
|---|---|---|---|
| nano-banana-2 | $0.16/image | 1 image | Default, fast, multi-reference |
| Midjourney (relaxed/fast/turbo) | $0.045-$0.24/grid | 4 images | Artistic, stylized |
| Grok Imagine | $0.06/call | 6 images | Bulk, budget |
| Seedream 5.0 (Lite/HD) | $0.0825/image | 1 image (2K-3K) | High resolution |

Audio models:

| Model | Type | Cost | Features |
|---|---|---|---|
| ElevenLabs V3 | Text-to-Speech | $0.21/1000 chars | 29+ voices, multi-language, stability control |
| Suno V4.5 | Music Generation | $0.10/song | Custom style, instrumental toggle, vocal gender |
| Kling AI Avatar | Lip Sync | $0.12/sec | Face animation from image + audio |
| Demucs | Vocal Separation | $0.30/call | Isolate vocals/accompaniment from audio |
| ElevenLabs STS v2 | Voice Changer | Free (user API key) | Speech-to-speech, noise removal |
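The per-second and per-character rates in the tables above lend themselves to a quick budget estimate before running anything. The sketch below is a minimal illustration, not part of VPick: the dictionary keys are hypothetical labels (not confirmed model IDs), the rates are copied from the tables, and it assumes billing scales linearly with duration or character count.

```python
# Rates copied from the pricing tables above; "~" rates are approximate.
# The keys are hypothetical labels for this sketch, not confirmed model IDs.
PER_SECOND_USD = {
    "kling-3.0-standard": 0.30,
    "kling-3.0-professional": 0.405,
    "kling-ai-avatar": 0.12,  # lip sync
}
TTS_USD_PER_1000_CHARS = 0.21  # ElevenLabs V3


def per_second_cost(model: str, seconds: float) -> float:
    """Estimated cost of a clip billed per second (assumes linear billing)."""
    return round(PER_SECOND_USD[model] * seconds, 4)


def tts_cost(text: str) -> float:
    """Estimated voiceover cost (assumes linear per-character billing)."""
    return round(len(text) / 1000 * TTS_USD_PER_1000_CHARS, 4)


def scene_estimate(video_seconds: float, narration: str) -> float:
    """Rough total for one Kling 3.0 Standard clip plus its narration."""
    return round(per_second_cost("kling-3.0-standard", video_seconds)
                 + tts_cost(narration), 4)
```

For actual billing, the figures reported by get_generation_stats take precedence over any estimate like this.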
VPick covers the entire video production workflow in one place:
Image Gen → Video Gen → Voiceover/Music → Lip Sync → Vocal/Voice Edit → Combine & Export
Start by understanding the user's project:
- get_canvas to see the current state
- list_projects to check existing projects
- create_project if starting fresh

Create input nodes on the canvas:
Text prompts:
add_node(type: "text", name: "Scene 1 Prompt", data: { content: "A samurai walking through rain, cinematic lighting" })
Reference images (for start/end frames or character consistency):
upload_image(url: "https://example.com/character.jpg")
Generate images first if needed:
run_image_generator(nodeId: "<image_node_id>", prompt: "samurai portrait, white background", model: "nano-banana-2")
add_node(type: "video-generator", name: "Scene 1")
connect_nodes(sourceId: "<prompt_node>", targetId: "<video_node>", sourceHandle: "text-out", targetHandle: "prompt-in")
connect_nodes(sourceId: "<image_node>", targetId: "<video_node>", sourceHandle: "image-out", targetHandle: "start-image-in")
run_video_generator(nodeId: "<video_node>", model: "kling-3.0", duration: 5, sound: true)
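The ordering in the calls above matters: inputs are connected first, and the generator runs last. A minimal Python sketch of that sequence as plain data follows. The `tool_call` helper and the `{"tool": ..., "arguments": ...}` shape are assumptions made for illustration; only the tool names, handles, and parameters come from the calls above.

```python
from typing import Any, Dict, List, Optional


def tool_call(name: str, **arguments: Any) -> Dict[str, Any]:
    """Package one tool invocation as a dict (hypothetical shape for this sketch)."""
    return {"tool": name, "arguments": arguments}


def text_to_video_plan(prompt_node: str, video_node: str,
                       image_node: Optional[str] = None,
                       duration: int = 5) -> List[Dict[str, Any]]:
    """Return the ordered calls: wire inputs first, then run the generator."""
    calls = [tool_call("connect_nodes", sourceId=prompt_node, targetId=video_node,
                       sourceHandle="text-out", targetHandle="prompt-in")]
    if image_node is not None:
        # Optional start frame, wired to the start-image-in handle.
        calls.append(tool_call("connect_nodes", sourceId=image_node,
                               targetId=video_node, sourceHandle="image-out",
                               targetHandle="start-image-in"))
    calls.append(tool_call("run_video_generator", nodeId=video_node,
                           model="kling-3.0", duration=duration, sound=True))
    return calls
```

Building the plan as data first makes it easy to review or log the sequence before any credits are spent.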
MultiShot generates 3-15 seconds of video with multiple camera angles and character consistency in a single API call.
run_video_generator(
  nodeId: "<video_node>",
  model: "kling-3.0",
  multiShot: true,
  multiPrompt: [
    { "prompt": "@character walks into frame, wide shot", "duration": 4 },
    { "prompt": "@character looks at camera, medium close-up", "duration": 3 },
    { "prompt": "@character turns away, slow dolly out", "duration": 3 }
  ],
  elements: [
    {
      "name": "character",
      "description": "Main protagonist, male samurai",
      "imageUrls": ["https://.../char-front.jpg", "https://.../char-side.jpg"]
    }
  ],
  sound: true
)
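A request like the one above can be checked locally before it is sent. The sketch below is a hypothetical pre-flight validator, not a VPick tool. It assumes that an @name mention consists of word characters, and that the 3-15 second range applies to the sum of the shot durations. It flags any @name without a case-sensitive matching element name.

```python
import re
from typing import Any, Dict, List

# Assumption: a mention is "@" followed by letters, digits, or underscores.
MENTION = re.compile(r"@(\w+)")


def check_multishot(multi_prompt: List[Dict[str, Any]],
                    elements: List[Dict[str, Any]],
                    min_total: int = 3, max_total: int = 15) -> List[str]:
    """Return a list of problems; an empty list means the request looks consistent."""
    problems = []
    names = {element["name"] for element in elements}
    for index, shot in enumerate(multi_prompt, start=1):
        for mention in MENTION.findall(shot["prompt"]):
            if mention not in names:  # set membership is case-sensitive
                problems.append(f"shot {index}: @{mention} has no matching element")
    total = sum(shot["duration"] for shot in multi_prompt)
    if not min_total <= total <= max_total:
        problems.append(f"total duration {total}s is outside {min_total}-{max_total}s")
    return problems
```

Run against the example request (three shots of 4, 3, and 3 seconds, one element named character), the function returns an empty list.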
MultiShot Rules:
- Element name must exactly match @name in prompts (case-sensitive)

Audio is a core part of video production. VPick supports 5 audio tools:
Generate natural narration or dialogue from text. 29+ built-in voices, multi-language support.
add_node(type: "audio-generator", name: "Narration")
run_audio_generator(
nodeId: "<audio_node>",
prompt: "The samurai stood alone in the rain, waiting for dawn.",
model: "elevenlabs",
voiceId: "<voice_id>",
stability: 0.5
)
You can connect a Text node as input:
connect_nodes(sourceId: "<text_node>", targetId: "<audio_node>", sourceHandle: "text-out", targetHandle: "text-in")
Create original background music, theme songs, or jingles.
add_node(type: "music-generator", name: "BGM")
run_music_generator(
nodeId: "<music_node>",
prompt: "epic cinematic orchestral, tension building, dark atmosphere",
model: "suno",
instrumental: true,
style: "cinematic orchestral"
)
Set instrumental: false to include AI-generated vocals with lyrics from the prompt.
Animate a character's face to speak with any audio. Turns a still image into a talking head video.
add_node(type: "lipsync-generator", name: "Talking Character")
connect_nodes(sourceId: "<face_image>", targetId: "<lipsync_node>", sourceHandle: "image-out", targetHandle: "image-in")
connect_nodes(sourceId: "<audio_node>", targetId: "<lipsync_node>", sourceHandle: "audio-out", targetHandle: "audio-in")
run_lipsync_generator(nodeId: "<lipsync_node>")
Cost: ~$0.12/sec. Great for dialogue scenes, explainer videos, or virtual presenters.
Isolate vocals from background music in any audio/video file. Outputs: vocals track, accompaniment track, and original.
add_node(type: "vocal-separator", name: "Separate Audio")
connect_nodes(sourceId: "<video_or_audio>", targetId: "<separator_node>", ...)
run_vocal_separator(nodeId: "<separator_node>")
Use cases: Extract dialogue from a scene, remove background music, remix audio.
Transform any voice recording into a different voice while preserving speech patterns and emotion.
add_node(type: "voice-changer", name: "New Voice")
connect_nodes(sourceId: "<original_audio>", targetId: "<voice_changer_node>", sourceHandle: "audio-out", targetHandle: "audio-in")
run_voice_changer(nodeId: "<voice_changer_node>", voiceId: "<target_voice_id>", removeBackgroundNoise: true)
Requires the user's own ElevenLabs API key. VPick itself charges no credits for this tool.
Combine multiple audio tracks (e.g., voiceover + BGM) into one:
add_node(type: "audio-combine", name: "Mixed Audio")
connect_nodes(sourceId: "<voiceover>", targetId: "<mix_node>", sourceHandle: "audio-out", targetHandle: "audio-in")
connect_nodes(sourceId: "<bgm>", targetId: "<mix_node>", sourceHandle: "audio-out", targetHandle: "audio-in")
run_audio_combine(nodeId: "<mix_node>")
Supports up to 10 audio inputs.
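The 10-input limit can be enforced before any connections are made. The following sketch builds the connect_nodes and run_audio_combine calls shown above as plain dicts. The dict shape and the `audio_mix_calls` helper are assumptions for illustration, and the check simply mirrors the stated limit.

```python
from typing import Any, Dict, List

MAX_AUDIO_INPUTS = 10  # limit stated above


def audio_mix_calls(source_ids: List[str], mix_node_id: str) -> List[Dict[str, Any]]:
    """Return connect calls for each source followed by the final combine call."""
    if not source_ids:
        raise ValueError("at least one audio source is required")
    if len(source_ids) > MAX_AUDIO_INPUTS:
        raise ValueError(
            f"{len(source_ids)} sources given; audio-combine supports up to {MAX_AUDIO_INPUTS}"
        )
    calls = [
        {"tool": "connect_nodes",
         "arguments": {"sourceId": source_id, "targetId": mix_node_id,
                       "sourceHandle": "audio-out", "targetHandle": "audio-in"}}
        for source_id in source_ids
    ]
    # The combine step runs only after every input is connected.
    calls.append({"tool": "run_audio_combine", "arguments": {"nodeId": mix_node_id}})
    return calls
```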
Combine multiple video clips:
add_node(type: "combine", name: "Final Video")
connect_nodes(sourceId: "<video_1>", targetId: "<combine_node>", sourceHandle: "video-out", targetHandle: "videos-in")
connect_nodes(sourceId: "<video_2>", targetId: "<combine_node>", sourceHandle: "video-out", targetHandle: "videos-in")
connect_nodes(sourceId: "<bgm_node>", targetId: "<combine_node>", sourceHandle: "audio-out", targetHandle: "audio-in")
run_combine(nodeId: "<combine_node>")
Mix audio tracks:
run_audio_combine(nodeId: "<audio_combine_node>")
Keep the canvas clean:
auto_layout(nodeIds: ["<id1>", "<id2>", ...], direction: "horizontal", spacing: 200)
create_group(nodeIds: ["<id1>", "<id2>", ...], overrides: { label: "Scene 1", color: "#4A90D9" })
For repeatable processes, create workflows:
create_workflow(nodes: [...], edges: [...])
run_workflow(workflowId: "<id>")
- list_nodes: See all nodes with their generation status and output URLs
- get_node(id): Get specific node details including generated video/audio URLs
- list_generated_files(limit: 10): Recent generation history
- get_generation_stats: Usage breakdown by model and cost

Use get_generation_stats to see spending.
Verify that each @name matches its element name exactly.

| Error | Solution |
|---|---|
| Generation timeout | Auto-retries up to 2 times; check node status with get_node |
| Insufficient credits | Prompt user to top up at vpick.10xboost.org |
| Element name mismatch | Verify @name in prompts matches element name exactly |
| Invalid media format | Videos: MP4 recommended; Images: JPG/PNG |
| Node not found | Use list_nodes to get current node IDs |
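For the timeout row, checking node status with get_node can be wrapped in a small polling loop. The sketch below is only an illustration: the `get_node` argument is any callable you supply that returns the node as a dict, and the `status` field with its `"completed"` and `"failed"` values is an assumption, since the actual response shape is not documented here.

```python
import time
from typing import Any, Callable, Dict


def wait_for_node(get_node: Callable[[str], Dict[str, Any]], node_id: str,
                  timeout_s: float = 300.0, interval_s: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the node reports a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        node = get_node(node_id)
        # Assumed field and values; adjust to the real get_node response.
        if node.get("status") in ("completed", "failed"):
            return node
        if time.monotonic() >= deadline:
            raise TimeoutError(f"node {node_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.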
Use the list_models tool for current pricing and capabilities.