ark-video-storyboard

v1.0.4

Generate a storyboard and prompts from a scene or reference images, confirm the script with the user, then optionally submit multi-segment video generation t...

by Heaven @yunni123

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for yunni123/ark-video-storyboard.

Install the skill "ark-video-storyboard" (yunni123/ark-video-storyboard) from ClawHub.
Skill page: https://clawhub.ai/yunni123/ark-video-storyboard
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install ark-video-storyboard

ClawHub CLI

npx clawhub@latest install ark-video-storyboard

Security Scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

Purpose & Capability

The skill's name/description (generate a storyboard and optionally submit video generation to Volcengine Ark) matches the code and instructions: scripts submit tasks to an Ark API and download/merge results. However the registry metadata lists no required environment variables or binaries while the implementation clearly expects an API key (ARK_API_KEY or a key in ~/.openclaw/openclaw.json) and uses external tools (curl, ffmpeg/ffprobe). That discrepancy is unexpected and should be corrected or justified.

Instruction Scope

SKILL.md and scripts instruct the agent to: ask for/handle reference images, submit generation jobs to Ark, poll status, automatically download videos into ~/.openclaw/media/{timestamp}/, run ffmpeg to merge/compress, and record workflow info in ~/.openclaw/workspace/WORKFLOW.md. The skill also contains a hard rule to default all human characters to 'East Asian' unless specified, which is an ethical/behavioral policy issue and not a technical necessity. The instructions also reference sending via a 'message' tool to Feishu (external endpoint) even though no code implements that call — scope includes file I/O, network calls, and user-facing behavioral defaults that are not purely technical.

Install Mechanism

There is no install spec (instruction-only), which reduces installer risk, but the package nevertheless includes runnable scripts that invoke curl, ffmpeg, and subprocesses. No external download URLs or archive extraction are present. Because the skill contains executable scripts, installing or running the skill will execute local network calls and file writes even without a separate install step.

Credentials

Registry metadata claims no required env vars, yet the code explicitly looks for ARK_API_KEY and falls back to keys stored in ~/.openclaw/openclaw.json under skills entries. The skill writes to ~/.openclaw/media and ~/.openclaw/workspace and expects external binaries (curl, ffmpeg). Requesting no credentials in metadata while the code reads and uses an API key is an inconsistency and increases risk of surprise credential usage.

Persistence & Privilege

always:false (normal). The skill does not request permanent platform-wide inclusion, but it will create directories and files under the user's home (~/.openclaw/media and ~/.openclaw/workspace) and will run networked subprocesses if invoked. That level of filesystem/network access is reasonable for a video-generation workflow but should be understood by the user before enabling the skill.

What to consider before installing

Key things to check before installing or running this skill:
- Credentials: Although the registry says no env vars are required, the code reads ARK_API_KEY (env) and also looks in ~/.openclaw/openclaw.json for a stored API key. Only provide an API key if you trust the Ark/Volcengine endpoint and the skill's behavior.
- Binaries & tools: The scripts call curl and rely on ffmpeg/ffprobe for merging/compression. Ensure those binaries are present and come from trusted packages; the metadata doesn't declare them.
- File writes: The skill will create and write files under ~/.openclaw/media/{timestamp}/ and append a record to ~/.openclaw/workspace/WORKFLOW.md. If you need to restrict filesystem side-effects, run the skill in a sandbox or edit the scripts to change output paths.
- Network behavior: Submissions and polling call https://ark.cn-beijing.volces.com; downloads are done via curl. Review network calls if you have privacy or data-control concerns (reference images and prompts will be sent to the remote API).
- Ethical/default behavior: The skill enforces a hard default that all human characters are East Asian unless the user specifies otherwise. If that behavior is unacceptable for your use case, modify SKILL.md/scripts or instruct the agent to ask the user explicitly instead of defaulting.
- Attack surface: The package contains executable scripts that will run system commands (curl, ffmpeg). If you don't fully trust this skill, inspect the scripts yourself or run in an isolated environment. Consider limiting where API keys are stored (use ephemeral keys) and confirm there is no hidden endpoint beyond the Ark domain.

If these mismatches or the default ethnicity rule bother you, ask the publisher to update the metadata to declare required env vars and binaries, remove or change the hard default ethnic rule, or provide a version with clearer permission/behavior controls.

Like a lobster shell, security has layers — review code before you run it.

latest vk976b7vgqnrg0tct4kkwarjj41837d50

223 downloads

1 star

5 versions

Updated 21h ago

MIT-0

Ark Video Storyboard

Turn a scene idea into a structured video plan, then optionally execute it with the Ark video generation API.

This skill is confirmation-first:

First generate storyboard + prompts
Let the user review and revise
Only generate video after explicit user approval

Workflow

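1. Receive the scene description: the user describes the video scene (e.g. "going to a cyberpunk internet cafe to play games after work").
2. Ask for reference images: after the user describes the scene, proactively ask "Do you have any reference images?" (images serve as style or character references).
3. Confirm the role of each reference image: if the user provides one, ask "Is this image a background/style reference or a character reference?"
- Background/style reference: the visual baseline for environment, color tone, and atmosphere
- Character reference: the baseline for the protagonist's appearance, clothing, and movement
4. Confirm the character description: if there are multiple video segments and no character reference image, proactively ask "What is the description of the main character in this video?" (e.g. "East Asian man, short black hair, white T-shirt"). Once collected, keep it exactly the same in every segment prompt.
5. Generate the script: expand the scene into a richer overall script and split it into several coherent segments.
6. Output the storyboard: each segment includes the reference image's role, the character description (identical across segments), the visual description, the lighting state, continuity notes, and the English AI prompt (containing the reference image style description plus the consistent character description).
7. User confirmation: show the storyboard to the user and ask "Is this the script/prompt you want?"
8. Revise: if the user wants adjustments (style, pacing, camera language, character details, prompt wording), revise and show the updated version again.
9. Execution confirmation: after the user confirms, ask "Should I start generating the video?"
10. Submit to the API: once the user explicitly says "OK / start generating", submit to the Ark API, poll each segment for its result, and download the videos.
11. Merge and send: once all segments are downloaded, merge them into one complete video with ffmpeg, check the size (Feishu's limit is about 20 MB), compress if necessary, and send the result to the user via Feishu.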

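Video Merge and Send Flow

Once all segments have been downloaded, merge them and send the result to the user as follows:

Step 1: Locate the segment directory

Video segments downloaded from the Ark API are saved by default in ~/.openclaw/media/{timestamp}/, organized by timestamp. Confirm the directory exists: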

ls ~/.openclaw/media/{timestamp}/seg*.mp4

Step 2: Merge the videos

Create the segment list file

cd ~/.openclaw/media/{timestamp}/
printf "file '%s'\n" seg1.mp4 seg2.mp4 ... > concat.txt

List one file per line, in order; the seg numbers should match the segment count.
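The list file can also be generated programmatically. The sketch below is a minimal illustration, not part of the skill's scripts: `build_concat_list` is a hypothetical helper, and it assumes segment files are named `seg<N>.mp4` as in the commands above. It sorts by the numeric index so that `seg10.mp4` comes after `seg2.mp4`, which a plain shell glob would not guarantee.

```python
import re
from pathlib import Path


def build_concat_list(segment_names):
    """Return ffmpeg concat-demuxer list text, ordered by numeric segment index.

    Hypothetical helper; assumes names like 'seg1.mp4', 'seg2.mp4', ...
    """
    def index(name):
        # Look for digits in the stem only, so the '4' in '.mp4' is ignored.
        match = re.search(r"(\d+)", Path(name).stem)
        return int(match.group(1)) if match else 0

    ordered = sorted(segment_names, key=index)
    return "".join(f"file '{name}'\n" for name in ordered)


if __name__ == "__main__":
    # Prints the three entries in numeric order: seg1, seg2, seg10.
    print(build_concat_list(["seg10.mp4", "seg2.mp4", "seg1.mp4"]), end="")
```

Writing the returned text to `concat.txt` in the segment directory gives the same input the ffmpeg concat step expects.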

Run the merge

ffmpeg -f concat -safe 0 -i concat.txt -c copy merged.mp4

Verify

ls -lh merged.mp4
ffprobe -v quiet -print_format json -show_format merged.mp4

Step 3: Check the size and compress (if needed)

Feishu's direct-send limit is about 20 MB:

≤20 MB: use merged.mp4 directly
>20 MB: compress before sending

ffmpeg -i merged.mp4 \
  -c:v libx264 \
  -crf 28 \
  -c:a aac \
  -b:a 128k \
  -y merged_compressed.mp4
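The size decision in this step can be expressed as a small check. This is a sketch under stated assumptions: the text only says the Feishu limit is "about 20 MB", so the exact byte threshold (20 MiB here) and the helper names `needs_compression` and `choose_file_to_send` are illustrative, not taken from the skill's scripts.

```python
import os

# Assumption: "about 20 MB" is treated as 20 MiB; the real Feishu limit may differ.
FEISHU_LIMIT_BYTES = 20 * 1024 * 1024


def needs_compression(size_bytes, limit_bytes=FEISHU_LIMIT_BYTES):
    """Return True when a file of this size exceeds the send limit."""
    return size_bytes > limit_bytes


def choose_file_to_send(merged_path, compressed_path):
    """Pick merged.mp4 when it fits, otherwise the compressed output.

    Hypothetical helper; the compressed file must already have been produced
    by the ffmpeg command above when it is selected.
    """
    if needs_compression(os.path.getsize(merged_path)):
        return compressed_path
    return merged_path
```

Keeping the threshold in one named constant makes it easy to lower it if sends near the limit turn out to fail.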

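Step 4: Send to Feishu

Use the message tool to send the file (merged.mp4 if no compression was needed, otherwise merged_compressed.mp4):

filePath: ~/.openclaw/media/{timestamp}/merged_compressed.mp4
channel: feishu
message: tell the user that the video has been merged, how many segments it contains, and its total duration

Step 5: Update the workflow record

Record this run in ~/.openclaw/workspace/WORKFLOW.md (timestamp, segment count, output file path, file size).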

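Character Consistency Rules (Critical)

If the video has multiple segments and the user has not provided a character reference image:

Proactively ask for the character description in step 4 of the workflow (confirm the character description)
Use exactly the same character description in every segment prompt (the wording for appearance, hairstyle, clothing, and so on must match word for word)
Example character description format: East Asian young man, black short hair, white T-shirt, 25 years old

If the user has provided a character reference image, write the same phrase in every prompt: consistent with the character in reference image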
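The word-for-word requirement is easy to check mechanically before anything is submitted. The following is a minimal sketch, assuming prompts are plain strings; `inconsistent_segments` is a hypothetical helper and does not exist in the skill's scripts.

```python
# Phrase quoted from the rule above for the reference-image case.
REFERENCE_PHRASE = "consistent with the character in reference image"


def inconsistent_segments(prompts, character_description):
    """Return 1-based indices of prompts that lack the exact description.

    The comparison is a literal substring match, so any rewording
    (for example 'short black hair' vs 'black short hair') is flagged.
    """
    return [
        index
        for index, prompt in enumerate(prompts, start=1)
        if character_description not in prompt
    ]


if __name__ == "__main__":
    description = "East Asian young man, black short hair, white T-shirt, 25 years old"
    prompts = [
        f"{description}, walking into a neon-lit internet cafe, wide shot",
        "East Asian young man, short black hair, white T-shirt, sitting at a desk",
    ]
    print(inconsistent_segments(prompts, description))  # [2]
```

When a character reference image is in use, the same check can be run with `REFERENCE_PHRASE` as the expected text.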

Interaction Phases

Phase 1: Script / Prompt Confirmation

User gives the scene, style, references, and goal.
If images are provided, first confirm whether each one is a background/environment reference or a character/subject reference.
If multiple segments and no character reference image, ask for a consistent character description.
Generate the storyboard, segment plan, and English prompts first.
Ask the user whether this version is correct.
If the user asks to tweak tone, pacing, camera language, subject details, prompt wording, or image-role interpretation, revise and show the updated version again.
Do not call the Ark API in this phase unless the user explicitly asks for direct generation.

Phase 2: Execution Confirmation

After the user confirms the script/prompt is correct, ask whether to start generation if they have not already made that explicit.
Only run the API submission / polling / download flow after explicit approval.
If submission fails, immediately report the exact stage and error.

Input Requirements

Collect as many of these as possible before writing prompts:

Reference image or images (proactively ask the user whether they have any)
Scene description
Subject or product
Target style (cinematic, cozy, commercial, dreamy, realistic, etc.)
Intended use (ad, social clip, atmosphere film, storytelling, product demo)
Constraints such as camera language, pacing, lighting, or ending mood
Total duration target
Segment count target
Consistent character description (must be collected when there are multiple segments and no character reference image)

If inputs are incomplete, still proceed with reasonable defaults and clearly state the assumptions.

Hard Rules

Default all human characters to East Asian unless the user explicitly specifies otherwise.
All segments must belong to the same video, not unrelated clips.
Maintain continuity for character appearance, wardrobe, environment, props, lighting logic, and emotional progression.
Write the planning fields in Chinese unless the user requests another language.
Write the final generation prompts in English unless the user explicitly wants Chinese prompts.
Prefer cinematic, visual, action-oriented prompts over abstract descriptions.
Do not silently retry failed API submissions in the background without telling the user.

Segment Output Format

Follow the schema in references/storyboard-schema.md.

At minimum include:

Segment index
Duration seconds
Character description (same across all segments)
Visual description
Lighting state
Continuity notes
English AI prompt (includes consistent character + reference style)

Prompt Construction Rules

When writing a segment prompt, include the details that matter most for video generation:

Subject identity and appearance (consistent character description, same in every segment)
Camera angle or shot type
Motion or action
Scene and environment
Lighting and mood
Pacing or motion quality
Style words only when they improve consistency
Reference image style description (if provided by the user, e.g., "in cyberpunk neon city style per reference image")
For all segments of the same video: character description must be IDENTICAL

Sequence Design Rules

Use this narrative rhythm by default unless the user asks for a different structure:

Segment 1: establish subject, place, and mood
Segment 2: deepen action or environment interaction
Segment 3: push visual/emotional peak or transition
Final segment: resolve, land, or fade out with a clear ending image

For more than 4 segments, insert additional deepen / transition beats while preserving continuity.
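One way to turn this rhythm into a per-segment plan is sketched below. The source does not say how the extra beats are ordered or what to do with fewer than four segments, so the alternation of "deepen" and "transition" and the handling of short sequences are assumptions, and `segment_beats` is a hypothetical helper.

```python
from itertools import cycle, islice


def segment_beats(count):
    """Return one narrative beat label per segment.

    Assumptions: the first segment establishes, the last resolves, the
    segment before the last is the peak, and any remaining middle
    segments alternate between 'deepen' and 'transition'.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return ["establish"]
    beats = ["establish"]
    middle = count - 2
    if middle > 0:
        beats += list(islice(cycle(["deepen", "transition"]), middle - 1))
        beats.append("peak")
    beats.append("resolve")
    return beats


if __name__ == "__main__":
    print(segment_beats(4))  # ['establish', 'deepen', 'peak', 'resolve']
    print(segment_beats(6))
```

With four segments this reproduces the default rhythm listed above.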

Duration / Segment Logic

This skill should support dynamic segment splitting.

Examples:

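60 seconds ÷ 4 segments = 15 seconds each (outside the validated 4 to 12 second range stated below, so this split must not be submitted as-is)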
60 seconds ÷ 6 segments = 10 seconds each
60 seconds ÷ 12 segments = 5 seconds each

Current validated Seedance 1.5 Pro rule from user-confirmed testing:

duration must be an integer in the range [4, 12]

So before execution:

Compute duration = total_duration_seconds / segment_count
Ensure the result is an integer
Ensure the result is within 4 to 12 seconds, inclusive
If not, stop and explain the issue to the user before submitting
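The pre-execution checks above can be sketched as a single function. This is an illustration only: `segment_duration` is a hypothetical name, and it assumes the total duration and segment count are given as integers.

```python
def segment_duration(total_duration_seconds, segment_count, min_s=4, max_s=12):
    """Return the per-segment duration, or raise ValueError with the reason.

    Implements the three checks from the text: the division must be exact,
    and the result must fall within [min_s, max_s].
    """
    if segment_count <= 0:
        raise ValueError("segment_count must be a positive integer")
    duration, remainder = divmod(total_duration_seconds, segment_count)
    if remainder:
        raise ValueError(
            f"{total_duration_seconds}s does not divide evenly into "
            f"{segment_count} segments"
        )
    if not min_s <= duration <= max_s:
        raise ValueError(
            f"{duration}s per segment is outside the allowed range "
            f"[{min_s}, {max_s}]"
        )
    return duration


if __name__ == "__main__":
    print(segment_duration(60, 6))   # 10
    print(segment_duration(60, 12))  # 5
    try:
        segment_duration(60, 4)      # 15 seconds each is out of range
    except ValueError as error:
        print(f"rejected: {error}")
```

Raising with a specific message gives the agent the explanation it needs to relay to the user before any submission happens.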

API Execution

API key loading order for actual generation:

Explicit wrapper argument if one is added later
Environment variable ARK_API_KEY
~/.openclaw/openclaw.json → skills.entries.ark-video-storyboard.apiKey
Backward-compatible old format: skills.ark-video-storyboard.apiKey
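The loading order can be read as a simple first-match lookup. The sketch below is an assumption about how such a resolver might look, not the code in the skill's scripts; `resolve_api_key` and `load_config` are hypothetical names, and only the environment variable name and the two JSON paths come from the list above.

```python
import json
import os
from pathlib import Path

SKILL = "ark-video-storyboard"


def load_config(path=None):
    """Read ~/.openclaw/openclaw.json, returning {} when it does not exist."""
    path = Path(path) if path else Path.home() / ".openclaw" / "openclaw.json"
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def resolve_api_key(explicit=None, environ=None, config=None):
    """Return the first non-empty key in the documented order, else None."""
    environ = os.environ if environ is None else environ
    skills = (config or {}).get("skills", {})
    candidates = [
        explicit,                                             # 1. wrapper argument
        environ.get("ARK_API_KEY"),                           # 2. environment variable
        skills.get("entries", {}).get(SKILL, {}).get("apiKey"),  # 3. current format
        skills.get(SKILL, {}).get("apiKey"),                  # 4. old format
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    return None
```

A caller would typically use `resolve_api_key(config=load_config())` and stop with a clear message when the result is `None`.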

If the user wants actual generation, read references/api.md and use the scripts:

scripts/build_storyboard.py to assemble structured segment data
scripts/run_full_generation.py to sequentially submit segments, poll each task, collect video_url, and optionally download videos
scripts/submit_segment.py to submit one segment at a time
scripts/get_task_result.py to query a task once and extract video_url
scripts/poll_task_until_done.py to poll until completion and return video_url
scripts/download_video.py to download a finished video_url to local storage

Submit segments sequentially, not in parallel, unless the user explicitly asks otherwise.

Current Known Ark Payload Requirements

Current known request requirements include:

model
content (first item is the text prompt)
ratio
duration
watermark
Reference image: use {"type": "image_url", "image_url": {"url": "<data_uri or url>"}} in content array

Current validated model in this workspace:

doubao-seedance-1-5-pro-251215
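Putting the listed fields together, a request body might be assembled as in the sketch below. Only the field names, the image item shape, and the model id come from the text; the shape of the text item (`{"type": "text", "text": ...}`), the default `ratio` of "16:9", and the default `watermark` of `False` are assumptions, and `build_payload` is a hypothetical helper. references/api.md remains the authority for the real payload.

```python
MODEL = "doubao-seedance-1-5-pro-251215"


def build_payload(prompt, duration, ratio="16:9", watermark=False,
                  image_url=None, model=MODEL):
    """Assemble a request body with the fields listed above.

    The text prompt is always the first content item; an optional
    reference image (data URI or URL) is appended after it.
    """
    content = [{"type": "text", "text": prompt}]  # assumed text item shape
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": model,
        "content": content,
        "ratio": ratio,
        "duration": duration,
        "watermark": watermark,
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_payload("a quiet bedroom at night", 10), indent=2))
```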

Error Handling Rule

If API submission fails, returns any model / parameter / schema error, or returns no valid task_id:

Stop immediately
Tell the user the exact failing segment and stage
Show the key error message
Explain which parameter or payload assumption most likely caused it
Do not pretend generation is still running
Do not continue to later segments

If API submission succeeds and returns a valid task_id:

Continue to the next segment by default without interrupting the user for each success
Do not notify the user for each successful segment submission
After all segments are successfully submitted, send one consolidated update that all tasks are in the Ark queue and generation is underway
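The stop-on-first-failure behaviour can be sketched as a sequential loop. This is an illustration under assumptions: `submit` stands in for whatever performs the real API call (for example a wrapper around scripts/submit_segment.py), the response is assumed to be a dict carrying `task_id`, and `SubmissionError` and `submit_all` are hypothetical names.

```python
class SubmissionError(Exception):
    """Carries the failing segment, the stage, and the key error message."""

    def __init__(self, segment_index, stage, message):
        super().__init__(f"segment {segment_index} failed at {stage}: {message}")
        self.segment_index = segment_index
        self.stage = stage
        self.message = message


def submit_all(payloads, submit):
    """Submit segments one by one; stop at the first failure.

    Returns the list of task ids when every submission succeeds, so the
    caller can send a single consolidated update afterwards.
    """
    task_ids = []
    for index, payload in enumerate(payloads, start=1):
        try:
            response = submit(payload)
        except Exception as error:
            raise SubmissionError(index, "submit", str(error)) from error
        task_id = response.get("task_id") if isinstance(response, dict) else None
        if not task_id:
            raise SubmissionError(
                index, "submit", f"no valid task_id in response: {response!r}"
            )
        task_ids.append(task_id)
    return task_ids
```

Because the exception records the segment index and stage, the agent can report exactly where the run stopped instead of continuing to later segments.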

Example Shape

A good segment should look like this:

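Character description: East Asian man, short black hair, white T-shirt (identical in every segment)
Reference image role: background/style reference (cozy bedroom, city night view outside the window, warm lighting)
Visual description: describe the subject, action, composition, and changes in the environment
Lighting state: specify brightness, key light, rim light, and changes in atmosphere
AI prompt: character description + shot + action + lighting + mood, with the reference image style description appended at the end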

See references/examples.md for a concrete sleeping-scene example.

When To Read References

Read references/storyboard-schema.md before generating structured segments.
Read references/prompt-rules.md when you need guardrails for prompt quality or continuity.
Read references/api.md before building or submitting API payloads.
Read references/examples.md when the user wants output that matches the example style.
