Install
openclaw skills install ai-short-video-factory
AI Short Video Factory creates MP4 videos from HTML using HyperFrames. Use it for captioned talking-head edits, product launches, data visualizations, code walkthroughs, social clips, GSAP animations, transitions, audio muxing, and deterministic rendering.
AI Short Video Factory is a HyperFrames-based AI video generation skill. It turns a text idea, structured script, or content outline into an HTML + CSS + GSAP animated video, then renders it deterministically to MP4 through headless Chrome + FFmpeg. No React required, no per-render fees, designed for AI agents.
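Core capabilities:
Generate a video from one sentence:
"Make me a 30-second tech news short video about development trends in AI coding tools"
The AI completes the whole pipeline automatically, including npx hyperframes init and npx hyperframes render.
Minimal example:
# Create the project
npx hyperframes init my-video --non-interactive
# Edit index.html (the AI generates the content automatically)
# Render
npx hyperframes render --non-interactive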
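| Video type | Typical length | Examples |
|---|---|---|
| Tech news / industry briefing | 30-60s | AI coding tool trends, weekly tech digest |
| Product launch / feature intro | 15-45s | New feature demo, product highlights |
| Data visualization | 20-40s | Market size, growth trends, comparative analysis |
| Tutorial / code explanation | 30-120s | Technical solution explainer, code walkthrough |
| Social media short video | 15-30s | Douyin/Xiaohongshu-style vertical video |
| Talking head + captions | Any | Talking-head with background removal and synced captions |
| Brand promotion / event teaser | 15-45s | Conference promo, brand introduction |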
HyperFrames is an open-source (Apache 2.0) HTML-native video rendering framework by HeyGen. Write video as HTML + CSS + seekable animations, then render deterministically to MP4 via headless Chrome + FFmpeg. No React required, no per-render fees, designed for AI agents.
Before any operation, run the environment diagnostic:
npx hyperframes doctor
This reports all dependencies and their status. If any check fails, STOP and resolve before proceeding.
| Tool | Minimum version | Purpose | Install command |
|---|---|---|---|
| Node.js | 22+ | HyperFrames CLI runtime | Managed automatically by WorkBuddy; or nvm install 22 / install from the official site |
| FFmpeg | 5.0+ | Video encoding, audio muxing, frame extraction | macOS: brew install ffmpeg; Linux: apt install ffmpeg |
| FFprobe | (bundled with FFmpeg) | Media inspection, duration/format checks | Installed with FFmpeg |
| Chrome Headless Shell | Auto-managed | Frame-by-frame rendering engine | npx hyperframes browser ensure (auto-downloads) |
| HyperFrames CLI | 0.6.90+ | Composition management & rendering | npx hyperframes@latest (auto via npx) |
| Tool | Purpose | Install command |
|---|---|---|
| Docker | Containerized rendering (CI/CD, remote) | brew install --cask docker / Docker Desktop |
| Python 3.9+ | Optional audio analysis / BGM synthesis helpers | Managed automatically by WorkBuddy; or a system Python 3.9+ |
For a brand new environment, execute these steps in order:
# 1. Verify Node.js (should be pre-installed by WorkBuddy)
node --version # must be >= 22.0.0
# 2. Install FFmpeg if missing
which ffmpeg || brew install ffmpeg
ffmpeg -version # confirm installation
# 3. Download Chrome Headless Shell (auto-cached at ~/.cache/hyperframes/chrome/)
npx hyperframes browser ensure
# 4. Run full diagnostic — all checks must pass
npx hyperframes doctor
| Symptom | Cause | Fix |
|---|---|---|
| doctor shows ✗ FFmpeg | FFmpeg not installed | brew install ffmpeg |
| doctor shows ✗ Chrome | First run, no cached browser | npx hyperframes browser ensure |
| Render hangs indefinitely | Chrome sandbox conflict in WorkBuddy | Use dangerouslyDisableSandbox: true on Bash tool |
| EACCES permission error | npx cache permission issue | sudo chown -R $(whoami) ~/.npm |
| Render produces 0-byte MP4 | FFmpeg encoder issue | Check ffmpeg -encoders has libx264 and aac |
| npx hyperframes not found | Node/npx not in PATH | Make sure Node.js 22+ is on PATH; in WorkBuddy, use the managed Node |
# Quick 3-command pre-flight (run at start of every video task)
npx hyperframes doctor # full diagnostic
which ffmpeg && ffmpeg -version # confirm FFmpeg accessible
echo "Pre-flight OK"
If doctor reports any failure, resolve it BEFORE writing any HTML. Do NOT proceed with a broken environment — it will waste render time and produce invalid output.
npx hyperframes init <project-name> --non-interactive
npx hyperframes init my-video --example blank
npx hyperframes init my-video --video clip.mp4 # with existing video
npx hyperframes init my-video --audio track.mp3 # with audio
npx hyperframes init my-video --tailwind # Tailwind v4 support
Available templates: blank, warm-grain, play-mode, swiss-grid, vignelli, decision-tree, kinetic-type, product-promo, nyt-graph
These rules are MANDATORY — violating them causes the compile phase to hang indefinitely:
- GSAP: load it from the CDN with <script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>. The Compiler automatically inlines CDN scripts. NEVER use a local lib/gsap.min.js; file serving breaks during render.
- Fonts: NEVER use @import url('https://fonts.googleapis.com/...'). Instead, just declare font-family: "Inter", sans-serif in CSS. The Compiler auto-resolves common Google Fonts and caches them to ~/.cache/hyperframes/fonts/. Supported fonts include Inter, JetBrains Mono, Roboto, etc. Noto Sans SC is NOT auto-resolved; avoid it.
- Sandbox: npx hyperframes render requires dangerouslyDisableSandbox: true because it launches Headless Chrome (Puppeteer).
- --non-interactive: required for WorkBuddy automation.
Priority order: explicit intent of the user's prompt > hard constraints (technical safety) > soft defaults (reference suggestions)
Two modes of operation:
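| Prompt type | AI behavior |
|---|---|
| Detailed prompt (explicitly specifies layout, palette, style, animation, etc.) | Follow the prompt strictly. Only hard constraints stay in force and every soft default yields. The AI's job is to realize the user's visual idea, not to pull the video back to a template. |
| Simple prompt (only a topic or keywords, no visual details) | The AI is free to be creative, uses the soft defaults as a starting point, and is encouraged to try novel layouts and visual expression. |
Creative License: when the user's prompt contains concrete visual descriptions, style requirements, layout instructions, or motion requirements, treat it as a creative license. The AI should prioritize the user's visual intent, and all soft default rules automatically yield. The AI should not pull every video back to the same standard template.
In either mode, the AI must run the §9 AI self-check pipeline before final output to ensure technical safety.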
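These are technical limits. Violating them causes render crashes, wrong output, or unpredictable behavior:
| # | Hard constraint | Reason |
|---|---|---|
| H1 | No Math.random()/Date.now()/new Date()/performance.now() | Non-deterministic rendering produces inconsistent frames |
| H2 | Fonts limited to the whitelist: Inter, JetBrains Mono, Roboto, sans-serif | Other fonts fail lint or fail to render |
| H3 | No @import url() for fonts | Not supported by the Compiler; rendering hangs |
| H4 | GSAP must be loaded from the jsdelivr CDN URL | Local file paths are unavailable in the render environment |
| H5 | window.__timelines must be registered synchronously | Asynchronous registration produces empty frames |
| H6 | Content must not extend past the canvas edge (not by a single pixel) | Anything outside is cropped, so the viewer sees truncated content |
| H7 | No inline style="top:XX%" overriding content positioning | Percentage positioning is inconsistent across scenes and causes overflow |
| H8 | Use Math.floor, not Math.ceil, for GSAP repeat | ceil can overshoot the composition duration |
| H9 | Audio must be post-muxed with FFmpeg (videos ≥30s) | HyperFrames built-in audio has a truncation bug at ~32s |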
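These are proven good practices, but not the only correct answer:
| # | Soft default | Default value | When the user can override |
|---|---|---|---|
| S1 | Title zone position | top: 50px | User wants a full-screen title, a bottom title, etc. |
| S2 | Content start position | top: 240px | User wants a centered layout, an immersive design, etc. |
| S3 | Card gap | 28-36px | User wants tight or loose spacing |
| S4 | Content bottom boundary | 980px | User explicitly wants content at the bottom |
| S5 | Left/right safe margin | 120px | User wants a full-bleed design |
| S6 | Font size range | See reference table | User specifies a particular type scale |
| S7 | Card width | 320-520px | User wants large or small cards |
| S8 | Scene transition | 0.6s opacity fade | User specifies slide, zoom, 3D, or other transitions |
| S9 | Background style | Dark gradient + grid + blurred orbs | User specifies any other background |
| S10 | Particle count | 40, mulberry32 seed=42 | User wants no particles, or more |
| S11 | Skeleton structure | Standard scene-wrapper skeleton | User describes a different layout need |
| S12 | Color scheme | No preset (wait for the user to specify) | If unspecified, use a neutral dark or light palette |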
Before writing HTML, create a compact production checklist with:
After rendering, check the output against this list. If the user asked for a specific BGM style, do not replace it with a different style unless explicitly approved. If the user specified a color palette, do not swap it for a "safer" or "more generic" scheme.
The user's prompt is the ONLY source of truth for color decisions. The Skill provides NO default theme colors. When the user provides a color scheme (e.g., "neon cyan, green, rose, orange, and purple"), implement it exactly. When the prompt omits color requirements, the AI may freely choose a palette that best serves the content and mood; there is no need to ask unless the user is unsatisfied.
These are reference ranges for when the user hasn't specified typography preferences. If the user describes a specific visual style (e.g., "minimalist big-type poster style", "data-dense dashboard"), the AI should adapt freely while ensuring readability.
| Element | Reference range | Hard max (H6: must not overflow the canvas) |
|---|---|---|
| Cover mega title | 96-120px | 132px |
| Scene title | 64-84px | 92px |
| Subtitle | 38-56px | 64px |
| Body text | 28-42px | 48px |
| Card title | 32-44px | 48px |
| Data number | 72-104px | 112px |
| Icon/emoji | 56-96px | 112px |
| Code text | 26-36px | 42px |
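Note: the hard max only guards against overflow; it is not a style limit. If the user explicitly wants an oversized title and the layout has room for it, the AI should implement it.
Below is the standard layout reference for 1920x1080. Use it when the user's prompt does not specify a layout. When the user describes a different layout need (full-screen immersive, irregular arrangement, diagonal cuts, centered symmetry, etc.), the AI should design freely, only making sure not to violate hard constraints H6 (content stays inside the canvas) and H7 (no inline top%).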
┌──────────────────────────────────────────────┐
│ Title Safe Zone: top 50px, height ≤ 150px │ ← standard reference
├──────────────────────────────────────────────┤
│ Content Safe Zone: top 240px ~ bottom 980px │ ← standard reference
├──────────────────────────────────────────────┤
│ Bottom Safe Margin: bottom 100px │ ← recommended
└──────────────────────────────────────────────┘
Hard constraints (always in effect):
- No inline style="top:XX%"; use a CSS class or a fixed px value.
Soft defaults (used when unspecified):
- Title zone: top: 50px, centered, height ≤ 150px
- Content zone: top: 240px, bottom edge no lower than 980px
Content height budget (reference calculation for the standard layout):
available_height = 980px - 240px = 740px
max_card_height = (available_height - (rows-1) × gap) / rows
Example: 3 rows, gap=24px → max per row = (740 - 48) / 3 = 230px ✓
Example: 6 cards 2×3, gap=28px → max per row = (740 - 28) / 2 = 356px ✓
Example: 4 vertical items, gap=18px → max per item = (740 - 54) / 4 = 171px ✓
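The budget arithmetic above can be checked mechanically before any HTML is written. The following is a minimal Python sketch, assuming the standard content zone (240px to 980px); the function names `max_row_height` and `fits` are illustrative and not part of HyperFrames or this skill.

```python
def max_row_height(rows: int, gap: int, top: int = 240, bottom: int = 980) -> int:
    """Largest whole-pixel row height that keeps `rows` rows inside the content zone."""
    if rows < 1:
        raise ValueError("rows must be at least 1")
    available = bottom - top                      # 740px in the standard layout
    return (available - (rows - 1) * gap) // rows


def fits(row_heights: list[int], gap: int, top: int = 240, bottom: int = 980) -> bool:
    """True when the stacked rows plus the gaps between them stay within the zone."""
    total = sum(row_heights) + max(len(row_heights) - 1, 0) * gap
    return total <= bottom - top


print(max_row_height(3, 24))  # 230
print(max_row_height(2, 28))  # 356
print(max_row_height(4, 18))  # 171
```

Integer division rounds down, so the result never exceeds the budget; `fits` can then confirm a concrete set of row heights against the same zone.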
If content_height > available_height (overflow risk), apply fixes:
Card layout dimension reference (soft defaults):
| Layout type | Card width | Max gap | Container width |
|---|---|---|---|
| 3 horizontal | 360-480px | 36px | 90% (1728px) |
| 4 horizontal | 280-380px | 30px | 90% |
| 2×3 grid | 320-520px | 28px | 1600px |
| 2 large horizontal | 680-780px | 36px | 90% |
| Vertical list (3-4 items) | 90% width | 20-24px | 90% |
| Timeline (3 items) | 90% width | 20px | 90% |
Only use fonts that HyperFrames Compiler can auto-resolve:
/* ALLOWED */
font-family: "Inter", sans-serif;
font-family: "JetBrains Mono", monospace; /* for code blocks */
font-family: "Roboto", sans-serif;
/* PROHIBITED — cause lint errors or render failures */
font-family: "PingFang SC"; /* macOS only, not bundled */
font-family: "Microsoft YaHei"; /* Windows only */
font-family: "Noto Sans SC"; /* not auto-resolved */
font-family: "Source Han Sans"; /* not auto-resolved */
Chinese text rendering: rely on sans-serif fallback (renders correctly in Chrome). The visual difference is negligible for video output.
HyperFrames requires frame-level determinism for multi-pass rendering:
// ❌ PROHIBITED — non-deterministic (H1)
Math.random()
Date.now()
new Date()
performance.now() // for positioning
// ✅ REQUIRED — use seeded PRNG
function mulberry32(seed) {
return function() {
seed |= 0; seed = seed + 0x6D2B79F5 | 0;
let t = Math.imul(seed ^ seed >>> 15, 1 | seed);
t = t + Math.imul(t ^ t >>> 7, 61 | t) ^ t;
return ((t ^ t >>> 14) >>> 0) / 4294967296;
}
}
const rand = mulberry32(42); // fixed seed
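When particle positions are pre-computed outside the browser (for example by a generator script), the same seeded approach applies. The sketch below is a Python port that is meant to mirror the JavaScript mulberry32 above by masking every step to 32 bits; it is an assumption-laden illustration, and bit-for-bit agreement with the browser should be confirmed before relying on it. The `particles` helper and its defaults (40 particles, seed 42, 1920x1080) are hypothetical names built on soft default S10.

```python
MASK = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def imul(a: int, b: int) -> int:
    """32-bit multiply, standing in for JavaScript's Math.imul (low 32 bits only)."""
    return (a * b) & MASK


def mulberry32(seed: int):
    """Return a deterministic generator of floats in [0, 1) for a fixed seed."""
    state = seed & MASK

    def rand() -> float:
        nonlocal state
        state = (state + 0x6D2B79F5) & MASK
        t = imul(state ^ (state >> 15), 1 | state)
        t = ((t + imul(t ^ (t >> 7), 61 | t)) & MASK) ^ t
        return (t ^ (t >> 14)) / 4294967296

    return rand


def particles(count: int = 40, seed: int = 42, width: int = 1920, height: int = 1080):
    """Deterministic (x, y) positions inside the canvas; same seed, same layout."""
    rand = mulberry32(seed)
    return [(rand() * width, rand() * height) for _ in range(count)]
```

Calling `particles()` twice returns identical lists, which is the property multi-pass rendering depends on.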
Also for GSAP repeat calculations (H8):
// ❌ May overshoot composition duration
repeat: Math.ceil(duration / cycle) - 1
// ✅ Guaranteed to stay within bounds
repeat: Math.floor(duration / cycle) - 1
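The difference between the two roundings is easiest to see with numbers. This is a small Python sketch of the arithmetic only (GSAP itself runs in JavaScript); `gsap_repeat` and `total_play_time` are illustrative names, and clamping at 0 is an added assumption to avoid a negative value, which GSAP would treat as an infinite repeat.

```python
import math


def gsap_repeat(duration: float, cycle: float) -> int:
    """GSAP `repeat` value (plays after the first) that stays within `duration`."""
    if cycle <= 0:
        raise ValueError("cycle must be positive")
    plays = math.floor(duration / cycle)   # whole cycles that fit
    return max(plays - 1, 0)               # assumption: never return -1 (infinite in GSAP)


def total_play_time(repeat: int, cycle: float) -> float:
    """Total time covered by the first play plus `repeat` repeats."""
    return (repeat + 1) * cycle


# A 3s loop inside a 10s composition:
print(total_play_time(gsap_repeat(10, 3), 3))        # 9, stays within 10s
print(total_play_time(math.ceil(10 / 3) - 1, 3))     # 12, overshoots by 2s
```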
The root composition duration must equal the last scene end time:
root_duration = max(scene.data_start + scene.data_duration)
Never leave data-duration shorter than the final scene. This causes unreliable render behavior and confusing QA results.
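As a quick sanity check, the root duration can be derived from the scene list instead of typed by hand. A minimal Python sketch, assuming each scene is given as a (data-start, data-duration) pair in seconds; the function name is illustrative.

```python
def root_duration(scenes: list[tuple[float, float]]) -> float:
    """Return max(start + duration) over (data-start, data-duration) pairs."""
    if not scenes:
        raise ValueError("at least one scene is required")
    return max(start + duration for start, duration in scenes)


scenes = [(0, 4), (4, 4), (8, 6.5)]
print(root_duration(scenes))  # 14.5
```

The value printed here is what the root element's data-duration should be set to for these three scenes.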
For videos longer than 30s, do not rely on short audio loops inside HyperFrames. Use this reliable workflow:
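- Generate a full-length BGM WAV lasting root_duration + 3s.
- Render the video with HyperFrames (data-duration can match the video, but DO NOT trust the rendered audio).
- Post-mux the full-length audio with FFmpeg, then verify the result:
ffmpeg -y -i rendered.mp4 -i bgm_full.wav -map 0:v:0 -map 1:a:0 \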
-c:v copy -c:a aac -b:a 192k -t <root_duration+0.02> \
-movflags +faststart final.mp4
ffmpeg -y -i final.mp4 -vn -ac 1 -ar 44100 extracted_audio.wav
ffprobe -v quiet -show_entries format=duration -show_entries stream=codec_type,duration -of default=noprint_wrappers=1 final.mp4
ffmpeg -ss <root_duration-12> -t 12 -i final.mp4 -af volumedetect -f null /dev/null 2>&1 | grep -E "mean_volume|max_volume"
Bundled helper option:
python scripts/verify_audio.py final.mp4 --min-duration <root_duration> --tail-seconds 12
A successful BGM check must prove:
BGM generation fallback: Only generate BGM when the user explicitly requests new music or when the source material has no audio to preserve. If numpy is unavailable, use Python standard library struct + wave + math to synthesize. A 130BPM electronic track can include: kick (4-on-floor), hi-hat (8th notes), bass (sub oscillator), pad (chord progression), lead melody, and arpeggio layers. For existing edited videos with original audio, preserve and post-mux the original audio instead of synthesizing replacement music.
Bundled helper option for generated BGM projects:
python scripts/gen_bgm.py bgm.wav --duration <root_duration+3> --bpm 110 --volume 0.25
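For environments without numpy, the standard-library fallback described above can look roughly like this. It is a deliberately small sketch, not the bundled scripts/gen_bgm.py: it only layers a sub-bass tone and a decaying kick on each beat, and the function name `write_tone_bed`, the 55 Hz / 60 Hz frequencies, and the decay constant are all assumptions chosen for illustration.

```python
import math
import struct
import wave


def write_tone_bed(path: str, duration: float, bpm: int = 110,
                   volume: float = 0.25, rate: int = 44100) -> int:
    """Write a mono 16-bit WAV and return the number of frames written."""
    total = int(duration * rate)
    beat = 60.0 / bpm                                  # seconds per beat
    frames = bytearray()
    for n in range(total):
        t = n / rate
        since_beat = t % beat                          # time since the last beat
        bass = 0.5 * math.sin(2 * math.pi * 55.0 * t)  # steady sub-bass
        kick = math.exp(-since_beat * 18.0) * math.sin(2 * math.pi * 60.0 * since_beat)
        sample = max(-1.0, min(1.0, volume * (bass + kick)))  # clamp to avoid clipping
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)                            # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return total
```

Calling `write_tone_bed("bgm.wav", root_duration + 3)` would produce a file long enough for the post-mux step; hi-hat, pad, lead, and arpeggio layers could be summed into `sample` in the same way.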
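Whether the user's prompt is detailed or simple, this pipeline must run before final delivery. It verifies technical safety and visual integrity, not whether the style matches some template.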
Execute ALL checks in this exact order. Do NOT skip any step.
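| # | Check | Detailed-prompt mode | Simple-prompt mode |
|---|---|---|---|
| A1 | Scene count × avg duration = total duration? | Follow the scene plan in the prompt | AI plans on its own |
| A2 | Content density reasonable for scene duration? | Follow the content volume in the prompt | Refer to the Chinese density table |
| A3 | Colors match user specification? | Strictly match the prompt's palette | AI chooses freely |
| A4 | Font family only uses whitelist? (H2) | Always checked | Always checked |
| A5 | Any Hard Constraint conflicts? | Always checked | Always checked |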
# B1: Lint must pass with 0 errors (HARD — always)
npx hyperframes lint
# B2: Check for prohibited patterns in HTML (HARD — H1, H2)
grep -n "Math.random\|Date.now\|PingFang\|Microsoft YaHei\|Noto Sans" index.html
# → must return empty
# B3: Check for inline top% overrides (HARD — H7)
grep -n 'style=.*top:.*%' index.html
# → If found: REMOVE immediately (use CSS class or fixed px)
# B4: Check root data-duration matches expected
grep 'data-composition-id.*data-duration\|data-start.*data-duration' index.html
# → Verify last scene end = root duration
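Where grep is unavailable, or when an agent wants structured findings, the B2/B3 pattern checks can be approximated in Python. This is a sketch under stated assumptions: the regular expressions below are one reading of H1, H2, H3, and H7, the `scan` function and `PROHIBITED` table are hypothetical names, and it does not replace npx hyperframes lint.

```python
import re

# Assumed patterns for the hard constraints named in the labels.
PROHIBITED = {
    "H1 non-deterministic call": re.compile(
        r"Math\.random\(|Date\.now\(|new Date\(|performance\.now\("),
    "H2 non-whitelisted font": re.compile(
        r"PingFang|Microsoft YaHei|Noto Sans|Source Han Sans"),
    "H3 font @import": re.compile(r"@import\s+url\("),
    # Inline style whose `top` is a percentage; `margin-top` etc. are not matched.
    "H7 inline top%": re.compile(
        r"style\s*=\s*[\"'][^\"']*(?<![\w-])top\s*:\s*[\d.]+%"),
}


def scan(html: str) -> list[tuple[int, str, str]]:
    """Return (line number, rule label, offending line) for every match."""
    findings = []
    for lineno, line in enumerate(html.splitlines(), start=1):
        for label, pattern in PROHIBITED.items():
            if pattern.search(line):
                findings.append((lineno, label, line.strip()))
    return findings
```

An empty list from `scan(open("index.html", encoding="utf-8").read())` corresponds to the "must return empty" criterion above.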
| # | Check | Command | Pass criteria |
|---|---|---|---|
| C1 | Video duration | ffprobe -show_format | ≥ target - 0.1s |
| C2 | Resolution | ffprobe -show_streams | Matches the resolution the user asked for |
| C3 | Frame rate | ffprobe -show_streams | 30fps (unless the user asks for 60fps) |
| C4 | Audio duration after mux | Extract WAV, check length | ≥ video duration |
| C5 | Last 12s no silence | Per-second RMS check | All seconds > -30dB |
| C6 | Audio mean volume | volumedetect | -15dB ~ -20dB |
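Check C5 (no silence in the last 12 seconds) reduces to a per-second RMS calculation on the decoded samples. The sketch below assumes the audio has already been extracted to 16-bit mono PCM and loaded as a list of integers; `per_second_dbfs` and `tail_is_audible` are illustrative names, not the bundled verify_audio.py.

```python
import math


def per_second_dbfs(samples: list[int], rate: int) -> list[float]:
    """RMS level in dBFS for each whole second of 16-bit mono PCM samples."""
    levels = []
    for start in range(0, len(samples) - rate + 1, rate):
        window = samples[start:start + rate]
        rms = math.sqrt(sum(s * s for s in window) / rate)
        levels.append(20 * math.log10(rms / 32768) if rms > 0 else float("-inf"))
    return levels


def tail_is_audible(samples: list[int], rate: int,
                    tail_seconds: int = 12, threshold_db: float = -30.0) -> bool:
    """True when every one of the last `tail_seconds` seconds is above the threshold."""
    tail = per_second_dbfs(samples, rate)[-tail_seconds:]
    return len(tail) == tail_seconds and all(level > threshold_db for level in tail)
```

A clip shorter than `tail_seconds` fails the check by design, since the tail cannot be proven audible.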
For every video, extract key frames and verify visual integrity:
# Extract key frames at scene midpoints
ffmpeg -ss <mid_time> -i final.mp4 -frames:v 1 -q:v 2 check_scene_N.jpg
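General visual checks (run in both modes):
Additional checks (standard layout mode only, i.e. when soft defaults are in use):
If ANY check in Phases B-D fails → fix → re-render → re-verify. Maximum 2 retry cycles; if still failing after 2 retries, report the specific issue to the user.
When the user's prompt does not specify the details below, the AI may use these proven defaults as a starting point, and is encouraged to build on them creatively: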
| Missing item | Default reference | Can the AI substitute freely? |
|---|---|---|
| Layout strategy | Title top:50px + Content top:240px | ✅ Any layout that does not violate H6/H7 |
| Font | Inter, sans-serif | ❌ Hard constraint H2; whitelist fonts only |
| Audio strategy | FFmpeg post-mux from full-length WAV | ❌ Hard constraint H9 (videos ≥30s) |
| Content container | .content-area / .content-col | ✅ Any semantic CSS structure |
| Card gap | 28-36px (horizontal), 20-24px (vertical) | ✅ |
| Particle generation | 40 particles, mulberry32 seed=42 | ✅ Count is free, but the PRNG must be mulberry32 (H1) |
| Background | Dark gradient + grid + blur orb | ✅ Completely free |
| Transition style | 0.6s opacity fade | ✅ Slides, zooms, blurs, etc. are fine |
| Grid background animation | 20s infinite translate loop | ✅ |
| Verification pipeline | Full Phase A-D self-check | ❌ Always required |
Key principle: defaults are inspiration, not a constraint. The AI should naturally choose the best form of expression for the video's topic and content.
| Pitfall | Root cause | Type | Prevention |
|---|---|---|---|
| Content overflows bottom | top:XX% inline style | 🔴 H7 | No inline top%; use a CSS class with fixed px |
| Audio cuts at ~32s | HyperFrames built-in audio bug | 🔴 H9 | Always FFmpeg post-mux |
| Non-deterministic frames | Math.random() | 🔴 H1 | Replace with mulberry32 PRNG |
| Lint error: unresolved font | Using PingFang SC etc. | 🔴 H2 | Font whitelist only |
| GSAP overruns composition | Math.ceil for repeat count | 🔴 H8 | Use Math.floor instead |
| Empty/black frames | window.__timelines not registered | 🔴 H5 | Ensure synchronous registration |
| Cards overlap title | transform:translate(-50%,-50%) centering | ⚠️ Self-check | Fix as soon as the Phase D frame spot-check finds it |
| 2×3 grid uneven spacing | Card width too small for container | ⚠️ Self-check | Phase D visual check |
| Scene jump cuts | Missing exit transitions | ⚠️ Self-check | Unless the user asks for a jump-cut style |
A composition is an HTML file. The root container defines the video canvas.
<!doctype html>
<html>
<head>
<script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
</head>
<body>
<div data-composition-id="main" data-start="0" data-width="1920" data-height="1080">
<!-- Video clip: track 0, starts at 0s, plays for 10s -->
<video id="bg-video" data-start="0" data-duration="10" data-track-index="0"
src="background.mp4" muted playsinline></video>
<!-- Title overlay: track 1, appears at 1s for 4s -->
<h1 id="title" class="clip" data-start="1" data-duration="4" data-track-index="1">
Product Launch
</h1>
<!-- Audio: track 2, from 0s for 10s, volume 50% -->
<audio data-start="0" data-duration="10" data-track-index="2"
data-volume="0.5" src="music.wav"></audio>
</div>
<style>
body { margin: 0; overflow: hidden; }
[data-composition-id="main"] {
width: 1920px; height: 1080px;
position: relative; background: #000;
}
#title {
position: absolute; top: 50%; left: 50%;
transform: translate(-50%, -50%);
font-size: 96px; color: white; font-family: sans-serif;
}
</style>
<script>
window.__timelines = window.__timelines || {};
const tl = gsap.timeline({ paused: true });
tl.from("#title", { opacity: 0, y: 60, duration: 0.8, ease: "power3.out" }, 1);
tl.to("#title", { opacity: 0, y: -40, duration: 0.5, ease: "power2.in" }, 4);
window.__timelines["main"] = tl;
</script>
</body>
</html>
For videos with 3+ scenes when the user hasn't specified a custom layout, this skeleton provides a proven starting point. If the user's prompt describes a different visual structure (e.g., full-screen transitions, split-screen, non-linear navigation, cinematic parallax), AI should design freely — only the Hard Constraints (H1-H9) remain in effect.
<!doctype html>
<html>
<head>
<script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
</head>
<body>
<div data-composition-id="main" data-start="0" data-duration="TOTAL_SECONDS"
data-width="1920" data-height="1080">
<!-- ===== BACKGROUND LAYER (persists entire video) ===== -->
<div id="bg-layer" style="position:absolute;inset:0;z-index:0;">
<!-- Grid background, particles, orbs — shared across scenes -->
<div id="grid-bg"></div>
<div id="particles-container"></div>
</div>
<!-- ===== SCENE 1: Opening ===== -->
<div id="scene-1" class="scene-wrapper" data-start="0" data-duration="4" data-track-index="1">
<div class="scene-title" id="s1-title">
<!-- Main title content -->
</div>
<div class="content-area" id="s1-content">
<!-- Scene content: cards, text, data -->
</div>
</div>
<!-- ===== SCENE 2 ===== -->
<div id="scene-2" class="scene-wrapper" data-start="4" data-duration="4" data-track-index="1">
<div class="scene-title" id="s2-title">
<h2>Scene title</h2>
<p class="subtitle">Subtitle description</p>
</div>
<div class="content-area" id="s2-cards">
<!-- Horizontal card layout -->
</div>
</div>
<!-- ===== SCENE N: (repeat pattern) ===== -->
<!-- ... -->
<!-- ===== AUDIO ===== -->
<audio data-start="0" data-duration="TOTAL_SECONDS" data-track-index="2"
data-volume="0.5" src="bgm.wav"></audio>
</div>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body { margin: 0; overflow: hidden; background: #0a0a0f; font-family: "Inter", sans-serif; color: #fff; }
[data-composition-id="main"] { width: 1920px; height: 1080px; position: relative; overflow: hidden; }
/* ===== SCENE WRAPPER ===== */
.scene-wrapper { position: absolute; inset: 0; opacity: 0; }
/* ===== TITLE ZONE: fixed top 50px ===== */
.scene-title {
position: absolute;
top: 50px;
left: 0;
width: 100%;
text-align: center;
z-index: 10;
}
.scene-title h2 { font-size: 76px; font-weight: 700; margin: 0; }
.scene-title .subtitle { font-size: 38px; opacity: 0.7; margin-top: 8px; }
/* ===== CONTENT ZONE: fixed top 240px ===== */
.content-area {
position: absolute;
top: 240px;
left: 50%;
transform: translateX(-50%);
width: 90%;
display: flex;
justify-content: center;
align-items: flex-start;
gap: 36px;
flex-wrap: wrap;
}
.content-col {
position: absolute;
top: 240px;
left: 50%;
transform: translateX(-50%);
width: 90%;
display: flex;
flex-direction: column;
align-items: center;
gap: 24px;
}
/* ===== CARD STYLES ===== */
.glass-card {
background: rgba(255,255,255,0.04);
border: 1px solid rgba(255,255,255,0.08);
border-radius: 16px;
padding: 28px 24px;
backdrop-filter: blur(12px);
}
</style>
<script>
// ===== Seeded PRNG (mulberry32) =====
function mulberry32(seed) {
return function() {
seed |= 0; seed = seed + 0x6D2B79F5 | 0;
let t = Math.imul(seed ^ seed >>> 15, 1 | seed);
t = t + Math.imul(t ^ t >>> 7, 61 | t) ^ t;
return ((t ^ t >>> 14) >>> 0) / 4294967296;
}
}
const rand = mulberry32(42);
// ===== GSAP Timeline =====
window.__timelines = window.__timelines || {};
const tl = gsap.timeline({ paused: true });
const TOTAL = TOTAL_SECONDS;
// --- Scene 1: 0-4s ---
tl.to("#scene-1", { opacity: 1, duration: 0.01 }, 0);
tl.from("#s1-title", { scale: 0.5, opacity: 0, duration: 0.8, ease: "back.out(1.4)" }, 0.1);
tl.from("#s1-content", { y: 40, opacity: 0, duration: 0.6 }, 0.5);
tl.to("#scene-1", { opacity: 0, duration: 0.5 }, 3.4); // exit at scene_end - 0.6
// --- Scene 2: 4-8s ---
tl.to("#scene-2", { opacity: 1, duration: 0.01 }, 4);
tl.from("#s2-title", { x: -60, opacity: 0, duration: 0.6, ease: "power2.out" }, 4.1);
tl.from("#s2-cards .glass-card", { scale: 0.6, opacity: 0, duration: 0.5, stagger: 0.15 }, 4.4);
tl.to("#scene-2", { opacity: 0, duration: 0.5 }, 7.4);
// --- Scene N: (repeat pattern) ---
// ...
window.__timelines["main"] = tl;
</script>
</body>
</html>
Skeleton naming conventions (recommended for consistency, not mandatory):
- #scene-1, #scene-2, ... #scene-N
- #s1-title, #s2-title, ... #sN-title
- #s1-content, #s2-cards, #s3-items, ... (descriptive suffix)
- .glass-card, .data-card, .app-card (semantic naming)
- #bg-layer, #grid-bg, #particles-container
Scene wrapper pattern (recommended, AI may use alternative transition approaches if user requests):
// Scene entry (instant opacity on)
tl.to("#scene-N", { opacity: 1, duration: 0.01 }, SCENE_START);
// Scene content animations
tl.from("#sN-title", { /* entrance */ }, SCENE_START + 0.1);
tl.from("#sN-content ...", { /* entrance */ }, SCENE_START + 0.3);
// Scene exit (0.6s before end) — EXCEPT final scene
tl.to("#scene-N", { opacity: 0, duration: 0.5 }, SCENE_END - 0.6);
| Attribute | Required | Purpose |
|---|---|---|
| data-composition-id | Yes | Unique ID for the composition |
| data-start | Yes | Start time in seconds (or clip ID ref: "el-1 + 2") |
| data-duration | Yes for img/div | Duration in seconds (video/audio auto-detect) |
| data-track-index | Yes | Track layer (same-track clips cannot overlap) |
| data-width / data-height | Yes (root) | Canvas dimensions (1920x1080 or 1080x1920) |
| data-volume | No | Audio volume 0-1 (default 1) |
| data-media-start | No | Trim offset into source media |
| data-composition-src | No | Path to external sub-composition HTML |
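Because clips on the same track cannot overlap, it can help to check the timing attributes before rendering. The following Python sketch uses the standard-library HTML parser; `ClipCollector` and `find_overlaps` are hypothetical helper names, and elements whose data-start is a relative reference (such as "el-1 + 2") or a placeholder are skipped rather than resolved.

```python
from html.parser import HTMLParser


class ClipCollector(HTMLParser):
    """Collect (track, start, end, name) for elements with numeric timing attributes."""

    def __init__(self):
        super().__init__()
        self.clips = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        try:
            track = int(a["data-track-index"])
            start = float(a["data-start"])
            end = start + float(a["data-duration"])
        except (KeyError, TypeError, ValueError):
            return  # missing, valueless, or non-numeric timing: not checked here
        self.clips.append((track, start, end, a.get("id") or tag))


def find_overlaps(html: str) -> list[tuple[int, str, str]]:
    """Return (track, earlier clip, later clip) for every same-track overlap."""
    collector = ClipCollector()
    collector.feed(html)
    overlaps, latest = [], {}
    for track, start, end, name in sorted(collector.clips):
        prev = latest.get(track)
        if prev and start < prev[0] - 1e-9:      # starts before the previous clip ends
            overlaps.append((track, prev[1], name))
        if prev is None or end > prev[0]:
            latest[track] = (end, name)
    return overlaps
```

An empty result means no two numerically timed clips share a track and a time range; the tolerance of 1e-9 seconds avoids false positives from floating-point sums such as 0.5 + 2.3.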
- Video elements: muted playsinline; audio is always a separate <audio> element
- Never call video.play()/audio.play(); the framework owns playback
- Built-in audio can be truncated (see H9) even when the source file and data-duration are longer. Always verify the rendered MP4 by extracting its audio stream and checking the decoded audio duration/RMS; volumedetect -ss 30 -t 12 can be misleading because it only analyzes available samples.
- For reliable audio, post-mux with FFmpeg: ffmpeg -i rendered.mp4 -i bgm_full.wav -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -b:a 192k -t <composition_duration> -movflags +faststart final.mp4. Verify by extracting final.mp4 audio to WAV and checking per-second RMS through the end.
- Create timelines with { paused: true }; the player controls playback
- Register every timeline: window.__timelines["<composition-id>"] = tl
- Duration comes from data-duration, NOT from GSAP timeline length
- Never use repeat: -1; calculate exact repeats: repeat: Math.floor(duration / cycleDuration) - 1 (floor, per H8)
- No Math.random(), Date.now(); use seeded PRNG if needed
- No setTimeout, await, Promises
- Animate: opacity, x, y, scale, rotation, color, backgroundColor
- Do not animate: visibility, display
- Entrance animation (gsap.from()) on every element
Build the end-state first as static CSS, then add motion:
- gsap.from(): animate FROM offscreen TO CSS position
- gsap.to(): only on final scene
npx hyperframes lint # Structure/code check (fast)
npx hyperframes lint --json # Machine-readable
npx hyperframes inspect # Visual layout check (launches Chrome)
npx hyperframes inspect --json # Agent-readable findings
Fix all errors before rendering. Warnings should be addressed.
npx hyperframes render # Standard MP4
npx hyperframes render --quality draft # Fast iteration (~3x faster)
npx hyperframes render --quality high --fps 60 # Final delivery
npx hyperframes render --output final.mp4 # Custom filename
npx hyperframes render --format webm # Transparent WebM
npx hyperframes render --docker # Byte-level consistency
| Flag | Options | Default | Notes |
|---|---|---|---|
| --output | path | renders/name_timestamp.mp4 | Output path |
| --fps | 24, 30, 60 | 30 | 60fps doubles render time |
| --quality | draft, standard, high | standard | draft for iteration |
| --format | mp4, webm | mp4 | WebM supports transparency |
| --workers | 1-8 or auto | auto | Each worker spawns a Chrome |
| --docker | flag | off | Reproducible output |
| --variables | JSON | (none) | Override composition variables |
Deliver the rendered MP4 to user via deliver_attachments.
Local inference with Kokoro-82M. No API key needed.
npx hyperframes tts "Your script here" --voice af_heart --output narration.wav
npx hyperframes tts script.txt --voice bf_emma --output narration.wav
npx hyperframes tts --list # List all 54 voices
Voice selection:
| Content Type | Voice | Reason |
|---|---|---|
| Product demo | af_heart / af_nova | Warm, professional |
| Tutorial | am_adam / bf_emma | Neutral, easy to follow |
| Marketing | af_sky / am_michael | Energetic or authoritative |
| Casual/social | af_heart / af_sky | Friendly, natural |
Chinese voices use z prefix: e.g., zf_xiaobei, zm_yunjian.
Requirements: Python 3.8+, pip install kokoro-onnx soundfile. Non-English needs espeak-ng.
Local Whisper inference. No API key needed.
npx hyperframes transcribe audio.mp3 # → transcript.json
npx hyperframes transcribe video.mp4 --model small # Default model
npx hyperframes transcribe video.mp4 --model medium # Better accuracy
npx hyperframes transcribe subtitles.srt # Import existing
Output: transcript.json with word-level timestamps:
[{"id": "w0", "text": "Hello", "start": 0.0, "end": 0.5}, ...]
CRITICAL: Never use .en models unless audio is confirmed English. .en models TRANSLATE instead of transcribe.
Local u2net_human_seg model. No API key.
npx hyperframes remove-background talking-head.mp4 -o transparent.webm
npx hyperframes remove-background subject.mp4 -o subject.webm --background-output plate.webm
npx hyperframes remove-background portrait.jpg -o cutout.png
Output formats:
- .webm (VP9 + alpha): for direct use in <video> composition
- .mov (ProRes 4444): for editing in external tools
- .png: single image cutout
# 1. Remove background from talking-head video
npx hyperframes remove-background talking-head.mp4 -o transparent.webm
# 2. Transcribe for captions
npx hyperframes transcribe talking-head.mp4 --model small
# 3. Init project and compose
npx hyperframes init captioned-video --non-interactive
Composition structure:
- Video layer (transparent.webm)
- Caption layer (transcript.json)
- Audio layer (<audio> with data-volume="0.3")
npx hyperframes init product-launch --example product-promo --non-interactive
Typical structure: 3-5 scenes with title → features → demo → CTA.
Structure: Terminal/editor mockup background + animated code highlights + voiceover captions.
Structure: Animated charts (CSS/GSAP driven), counters, stat reveals with staggered entrances.
Set data-width="1080" data-height="1920" on root. Refer to "Vertical Video Safe Zones (P1)" section for complete layout rules, typography scale, and safe margins. Key points: title at top 160px, content zone 400-1600px, bottom 320px reserved for platform UI, fast pacing (2-3s per scene), bold colors, single-column card layout preferred.
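This chapter covers the complete workflow for post-processing existing video footage: adding captions, overlaying effects, replacing backgrounds, mixing BGM, picture-in-picture, and so on.
| Scenario | Input | Output |
|---|---|---|
| Talking head + captions | A recorded talking-head MP4 | Finished MP4 with animated captions |
| Talking head + captions + effects | Talking-head MP4 | Finished video with captions plus particle/light-effect overlays |
| Talking head + background replacement | Talking-head MP4 (solid or cluttered background) | Finished video with the background removed and a new one added |
| Multi-clip stitching | Multiple MP4 clips | Finished video merged with transitions and unified captions |
| Picture-in-picture | Main video + secondary footage | Finished video in a PiP layout |
The lessons below come from problems that kept recurring while iterating on an 86-second talking-head video. When processing an existing talking-head video, apply these rules first to avoid repeating the same mistakes.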
ffmpeg -ss 76 -t 10 -i input.mov -af volumedetect -f null /dev/null 2>&1 | grep -E "mean_volume|max_volume"
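If mean_volume/max_volume report valid values, that segment really contains sound, and the finished video must preserve it.
- Mux the complete audio directly from the original .mov or long audio file; do not reference a .mov as the <audio> source in order to preserve a long-tail BGM. The correct approach:
# 1. Extract the complete audio from the original video
ffmpeg -y -i input.mov -vn -acodec pcm_s16le -ar 44100 -ac 1 original_audio.wav
# 2. Render the visual-only version with HyperFrames (omitting <audio> is fine; silent visual output is allowed)
npx hyperframes render --output visual.mp4
# 3. Post-mux the complete original audio
ffmpeg -y -i visual.mp4 -i original_audio.wav \
-map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -b:a 192k \
-t <video_duration> -movflags +faststart final.mp4
- Do not stop at the ffprobe duration; also check the tail segment the user pointed out:
ffprobe -v quiet -show_entries format=duration -show_entries stream=codec_type,duration -of default=noprint_wrappers=1 final.mp4
ffmpeg -ss 76 -t 10 -i final.mp4 -af volumedetect -f null /dev/null 2>&1 | grep -E "mean_volume|max_volume"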
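- Caption edits (cap-*): change only the target time range and do not touch other captions along the way. For a precise range such as 3s-7s, first find the caption node in index.html whose data-start/data-duration covers that range, then replace its text.
- Position changes: change right: ... to an explicit left: ... or reduce the right value, and note the actual coordinate change in the delivery notes.
Before each render is delivered, complete at least the following verification:
- The cap-* text matches the user's request exactly.
┌────────────────────────────────────────────────────────────────────┐
│ Phase 1: Source preprocessing                                      │
│ ① Check source format → ② Normalize with FFmpeg → ③ Extract audio  │
├────────────────────────────────────────────────────────────────────┤
│ Phase 2: Analysis and transcription                                │
│ ④ Transcribe speech → ⑤ Group captions → ⑥ Confirm timeline        │
├────────────────────────────────────────────────────────────────────┤
│ Phase 3: Composition                                               │
│ ⑦ Init project → ⑧ Write composition HTML → ⑨ Render               │
├────────────────────────────────────────────────────────────────────┤
│ Phase 4: Audio post-mix                                            │
│ ⑩ Mix BGM → ⑪ Balance levels → ⑫ Deliver                           │
└────────────────────────────────────────────────────────────────────┘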
# Inspect the video
ffprobe -v quiet -print_format json -show_format -show_streams input.mp4
# Confirm the key parameters
# - Resolution: 1920x1080 or 1080x1920 (vertical)
# - Frame rate: 25/30fps
# - Codec: H.264/H.265
# - Audio: AAC, sample rate 44100/48000
# Resolution is not 1080p → scale
ffmpeg -i input.mp4 -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" -c:a copy normalized.mp4
# Normalize a vertical video
ffmpeg -i input.mp4 -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2" -c:a copy normalized_v.mp4
# Normalize the frame rate to 30fps
ffmpeg -i input.mp4 -r 30 -c:a copy fps30.mp4
# Extract audio only (for transcription and mixing)
ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio_for_transcribe.wav
ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 44100 -ac 2 audio_original.wav
# Get the video duration (seconds)
ffprobe -v error -show_entries format=duration -of csv=p=0 input.mp4
# Option 1: FFmpeg concat demuxer (recommended, no re-encoding)
# Create filelist.txt:
# file 'clip1.mp4'
# file 'clip2.mp4'
# file 'clip3.mp4'
ffmpeg -f concat -safe 0 -i filelist.txt -c copy merged.mp4
# Option 2: re-encode (when resolutions or codecs differ)
ffmpeg -f concat -safe 0 -i filelist.txt -vf "scale=1920:1080" -c:v libx264 -c:a aac merged.mp4
# Option 3: play segments through multiple <video> elements in HyperFrames (with transitions)
# → see the Composition template that follows
# Transcribe a Chinese talking-head video (medium model recommended; better Chinese accuracy)
npx hyperframes transcribe input.mp4 --model medium
# If an SRT/VTT caption file already exists
npx hyperframes transcribe existing.srt
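Chinese caption grouping rules:
| Rule | Description |
|---|---|
| Max characters per line | 14-16 Chinese characters |
| Max lines per group | 2 lines |
| Break on meaning | Break at punctuation marks (,。!?) |
| Minimum hold time | ≥ 1.2 s (gives viewers time to read) |
| Maximum hold time | ≤ 5 s (keeps captions from "sticking" too long) |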
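The grouping rules in the table above can be applied to the word-level transcript.json entries with a small pass like the one below. This is a simplified sketch, not the skill's own grouping logic: it assumes each entry has "text", "start", and "end" keys as in the transcribe output, treats 2 lines × 16 characters as a 32-character budget, joins tokens without spaces (suited to Chinese), and only stretches short captions as far as the next caption's start. The name `group_captions` and the constants are illustrative.

```python
BREAK_CHARS = ",。!?"    # punctuation that ends a caption group
MAX_CHARS = 32              # 2 lines x 16 characters
MIN_HOLD = 1.2              # seconds
MAX_HOLD = 5.0              # seconds


def group_captions(words: list[dict]) -> list[dict]:
    """Group word-level entries into captions with text, start, and end."""
    groups, current = [], []

    def flush():
        if current:
            groups.append({
                "text": "".join(w["text"] for w in current),
                "start": current[0]["start"],
                "end": current[-1]["end"],
            })
            current.clear()

    for word in words:
        if current:
            length = sum(len(w["text"]) for w in current) + len(word["text"])
            too_long = length > MAX_CHARS
            too_slow = word["end"] - current[0]["start"] > MAX_HOLD
            if too_long or too_slow:
                flush()                      # start a new group before adding this word
        current.append(word)
        if word["text"] and word["text"][-1] in BREAK_CHARS:
            flush()                          # break on punctuation
    flush()

    # Stretch short captions toward the minimum hold without overlapping the next one.
    for i, group in enumerate(groups):
        limit = groups[i + 1]["start"] if i + 1 < len(groups) else float("inf")
        group["end"] = min(max(group["end"], group["start"] + MIN_HOLD), limit)
    return groups
```

Each resulting group maps to one caption element, with data-start set to "start" and data-duration to "end" minus "start". A caption squeezed between two punctuation breaks can still end up shorter than 1.2 s, so very short groups may need to be merged by hand.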
<!DOCTYPE html>
<html>
<head>
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
/* === Caption layer styles === */
.caption {
position: absolute;
bottom: 100px;
left: 50%;
transform: translateX(-50%);
font-family: "Inter", sans-serif;
font-size: 42px;
font-weight: 700;
color: #ffffff;
text-align: center;
max-width: 75%;
padding: 12px 24px;
border-radius: 8px;
background: rgba(0, 0, 0, 0.6);
text-shadow: 0 2px 4px rgba(0, 0, 0, 0.5);
/* Tuning for Chinese text */
line-height: 1.5;
letter-spacing: 0.02em;
}
/* === Effect layer styles === */
.effect-layer {
position: absolute;
inset: 0;
pointer-events: none;
z-index: 10;
}
.particle {
position: absolute;
width: 4px;
height: 4px;
border-radius: 50%;
background: rgba(255, 255, 255, 0.6);
}
</style>
</head>
<body>
<!-- Root composition -->
<div data-composition-id="post-production"
data-width="1920" data-height="1080" data-fps="30"
data-duration="VIDEO_DURATION">
<!-- Track 0: original video (or the background-removed video plus a new background) -->
<video id="main-video"
data-start="0" data-duration="VIDEO_DURATION" data-track-index="0"
src="input.mp4"
style="position:absolute;inset:0;width:100%;height:100%;object-fit:cover;">
</video>
<!-- Track 1: caption layer -->
<div id="cap-1" class="caption" data-start="0.5" data-duration="2.3" data-track-index="1">
大家好,欢迎来到今天的分享
</div>
<div id="cap-2" class="caption" data-start="2.8" data-duration="2.0" data-track-index="1">
今天我们聊一聊 AI 编程
</div>
<!-- ... more captions ... -->
<!-- Track 2: effects overlay layer -->
<div id="effects" class="effect-layer"
data-start="0" data-duration="VIDEO_DURATION" data-track-index="2">
<!-- particles, light effects, icons, etc. -->
</div>
<!-- Track 3: BGM -->
<audio data-start="0" data-duration="VIDEO_DURATION" data-track-index="3"
src="bgm.wav" data-volume="0.25"></audio>
</div>
<script src="https://cdn.jsdelivr.net/npm/gsap@3/dist/gsap.min.js"></script>
<script>
// Caption entrance and exit animation
const captionTl = gsap.timeline({ paused: true });
document.querySelectorAll('.caption').forEach(cap => {
const start = parseFloat(cap.dataset.start);
captionTl.fromTo(cap,
{ opacity: 0, y: 20 },
{ opacity: 1, y: 0, duration: 0.3 },
start
);
captionTl.to(cap,
{ opacity: 0, duration: 0.2 },
start + parseFloat(cap.dataset.duration) - 0.2
);
});
window.__timelines = window.__timelines || {};
window.__timelines["post-production"] = captionTl;
</script>
</body>
</html>
<!-- Track 0: new background -->
<div id="new-bg" data-start="0" data-duration="VIDEO_DURATION" data-track-index="0"
style="position:absolute;inset:0;background:linear-gradient(135deg, #1a1a2e 0%, #16213e 50%, #0f3460 100%);">
<!-- Animated grids, particles, or other background animation can go here -->
</div>
<!-- Track 1: person with the background removed -->
<video id="person" data-start="0" data-duration="VIDEO_DURATION" data-track-index="1"
src="transparent.webm"
style="position:absolute;bottom:0;left:50%;transform:translateX(-50%);height:90%;object-fit:contain;">
</video>
<!-- Track 2: captions -->
<!-- ... -->
<!-- Track 0: main picture (full screen) -->
<video id="main" data-start="0" data-duration="VIDEO_DURATION" data-track-index="0"
src="main-content.mp4"
style="position:absolute;inset:0;width:100%;height:100%;object-fit:cover;">
</video>
<!-- Track 1: picture-in-picture (small window, bottom-right) -->
<video id="pip" data-start="0" data-duration="VIDEO_DURATION" data-track-index="1"
src="talking-head.mp4"
style="position:absolute;bottom:40px;right:40px;width:360px;height:360px;
border-radius:50%;object-fit:cover;border:3px solid rgba(255,255,255,0.8);
box-shadow:0 4px 20px rgba(0,0,0,0.3);">
</video>
<!-- PiP variant: rectangle, bottom-left -->
<!--
<video id="pip-rect" ...
style="position:absolute;bottom:40px;left:40px;width:480px;height:270px;
border-radius:12px;object-fit:cover;border:2px solid rgba(255,255,255,0.5);">
</video>
-->
<!-- Left/right split screen, 50/50 -->
<video id="left" data-start="0" data-duration="VIDEO_DURATION" data-track-index="0"
src="screen-recording.mp4"
style="position:absolute;left:0;top:0;width:50%;height:100%;object-fit:cover;">
</video>
<video id="right" data-start="0" data-duration="VIDEO_DURATION" data-track-index="0"
src="talking-head.mp4"
style="position:absolute;right:0;top:0;width:50%;height:100%;object-fit:cover;">
</video>
<!-- Center divider line -->
<div data-start="0" data-duration="VIDEO_DURATION" data-track-index="1"
style="position:absolute;left:50%;top:0;width:2px;height:100%;background:rgba(255,255,255,0.3);transform:translateX(-50%);">
</div>
# 1. Render the video (no audio)
npx hyperframes render --non-interactive
# 2. Extract the original narration audio
ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 44100 -ac 2 voice.wav
# 3. Mix: narration in front, BGM underneath
ffmpeg -i voice.wav -i bgm.wav -filter_complex \
"[0:a]volume=1.0[voice];[1:a]volume=0.25[bgm];[voice][bgm]amix=inputs=2:duration=first" \
-ac 2 -ar 44100 mixed_audio.wav
# 4. Mux the video with the mixed audio
ffmpeg -i rendered_video.mp4 -i mixed_audio.wav -c:v copy -c:a aac -shortest final.mp4
When the voice is present the BGM volume drops automatically, and it recovers when the voice pauses:
# Implement ducking with sidechaincompress
ffmpeg -i voice.wav -i bgm.wav -filter_complex \
"[1:a]volume=0.35[bgm_vol];\
[bgm_vol][0:a]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[bgm_ducked];\
[0:a][bgm_ducked]amix=inputs=2:duration=first[out]" \
-map "[out]" -ac 2 -ar 44100 mixed_ducked.wav
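Parameters:
- threshold=0.02: voice signal level threshold (lower is more sensitive)
- ratio=4: compression ratio (4:1, applied to the BGM while the voice is above the threshold; the cut grows with how far the voice exceeds the threshold rather than being a fixed drop to 1/4)
- attack=200: attack time of 200 ms (avoids an abrupt drop)
- release=1000: release time of 1000 ms (the BGM recovers about 1 second after the voice stops)
# Measure the current volume
ffmpeg -i mixed_audio.wav -af "volumedetect" -f null /dev/null
# Normalize to -16 LUFS (suitable for social media)
ffmpeg -i mixed_audio.wav -af "loudnorm=I=-16:TP=-1.5:LRA=11" normalized.wav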
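To get a feel for what threshold=0.02 and ratio=4 mean for the ducking command, the textbook hard-knee compressor curve can be worked out in Python. This is an idealized sketch: FFmpeg's sidechaincompress also applies a soft knee and the attack/release smoothing, so real attenuation will differ. The function names are illustrative only.

```python
import math


def linear_to_db(level):
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return 20 * math.log10(level)


def gain_reduction_db(sidechain_db, threshold_db, ratio):
    """Static hard-knee curve: dB of attenuation for a given sidechain (voice) level."""
    over = sidechain_db - threshold_db
    if over <= 0:
        return 0.0                      # voice below threshold: BGM untouched
    return over * (1 - 1 / ratio)       # 4:1 keeps 1 dB out of every 4 dB over


threshold_db = linear_to_db(0.02)       # about -34 dBFS
for voice_db in (-40, -20, -10):
    cut = gain_reduction_db(voice_db, threshold_db, ratio=4)
    print(f"voice at {voice_db} dBFS -> BGM attenuated by {cut:.1f} dB")
```

Under this model a louder voice ducks the BGM further, which is why the ratio is better read as a slope than as a fixed volume cut.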
<div id="particles" class="effect-layer" data-start="0" data-duration="VIDEO_DURATION" data-track-index="2">
<!-- particles are generated by JS -->
</div>
<script>
function mulberry32(seed) {
return function() {
seed |= 0; seed = seed + 0x6D2B79F5 | 0;
let t = Math.imul(seed ^ seed >>> 15, 1 | seed);
t = t + Math.imul(t ^ t >>> 7, 61 | t) ^ t;
return ((t ^ t >>> 14) >>> 0) / 4294967296;
}
}
const rand = mulberry32(42);
// Generate particles
const container = document.getElementById('particles');
for (let i = 0; i < 30; i++) {
const p = document.createElement('div');
p.className = 'particle';
p.style.cssText = `
left: ${rand() * 100}%;
top: -10px;
width: ${3 + rand() * 4}px;
height: ${3 + rand() * 4}px;
opacity: ${0.3 + rand() * 0.5};
background: hsl(${200 + rand() * 60}, 80%, 70%);
`;
container.appendChild(p);
}
// Particle fall animation
const particleTl = gsap.timeline({ paused: true });
container.querySelectorAll('.particle').forEach((p, i) => {
particleTl.to(p, {
y: 1200,
x: `+=${(rand() - 0.5) * 200}`,
duration: 4 + rand() * 3,
repeat: Math.floor(VIDEO_DURATION / 6),
ease: "none",
delay: rand() * 3
}, 0);
});
// Note: add particleTl to window.__timelines
</script>
<div id="lower-third" class="effect-layer" data-start="2" data-duration="8" data-track-index="2">
<div style="position:absolute;bottom:60px;left:60px;display:flex;align-items:center;gap:16px;">
<div style="width:4px;height:48px;background:linear-gradient(180deg,#00d4ff,#7b2ff7);border-radius:2px;"></div>
<div>
<div style="font-family:'Inter',sans-serif;font-size:28px;font-weight:700;color:#fff;">张三</div>
<div style="font-family:'Inter',sans-serif;font-size:20px;color:rgba(255,255,255,0.7);margin-top:4px;">高级产品经理 · 某科技公司</div>
</div>
</div>
</div>
<script>
const ltTl = gsap.timeline({ paused: true });
const lt = document.querySelector('#lower-third > div');
ltTl.fromTo(lt, { x: -300, opacity: 0 }, { x: 0, opacity: 1, duration: 0.5, ease: "power2.out" }, 2);
ltTl.to(lt, { x: -300, opacity: 0, duration: 0.4, ease: "power2.in" }, 9.5);
// Add ltTl to window.__timelines
</script>
<div id="light-sweep" class="effect-layer" data-start="0" data-duration="VIDEO_DURATION" data-track-index="2">
<div class="sweep-bar" style="
position:absolute;
top:0;left:-200px;
width:200px;height:100%;
background:linear-gradient(90deg, transparent, rgba(255,255,255,0.08), transparent);
transform:skewX(-15deg);
"></div>
</div>
<script>
const sweepTl = gsap.timeline({ paused: true });
sweepTl.to('.sweep-bar', {
x: 2200,
duration: 3,
repeat: Math.floor(VIDEO_DURATION / 5),
repeatDelay: 2,
ease: "power1.inOut"
}, 0);
</script>
<!-- Pop up key information at a specific point in time -->
<div id="keyword-pop" data-start="5" data-duration="3" data-track-index="2"
style="position:absolute;top:50%;right:80px;transform:translateY(-50%);
font-family:'Inter',sans-serif;font-size:56px;font-weight:900;
color:#00d4ff;text-shadow:0 0 20px rgba(0,212,255,0.5);">
效率提升 300%
</div>
<script>
const kwTl = gsap.timeline({ paused: true });
kwTl.fromTo('#keyword-pop',
{ scale: 0, opacity: 0, rotation: -5 },
{ scale: 1, opacity: 1, rotation: 0, duration: 0.4, ease: "back.out(1.7)" },
5
);
kwTl.to('#keyword-pop', { opacity: 0, y: -30, duration: 0.3 }, 7.5);
</script>
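The AI automatically picks the caption animation that best fits the video style:
| Animation | Best for | CSS/GSAP implementation |
|---|---|---|
| Fade in/out | Formal or business narration | opacity: 0→1→0 |
| Slide up from bottom | Lively content, tutorials | y: 20→0, opacity: 0→1 |
| Typewriter, character by character | Tech or geek style | stagger 0.05s per character |
| Karaoke highlight | Emphasizing key points | word-level color switch |
| Scale pop-in | Short video, 抖音 style | scale: 0.5→1, ease: back.out |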
<div id="cap-karaoke" class="caption" data-start="3" data-duration="2.5" data-track-index="1">
<span class="word" data-word-start="3.0" data-word-end="3.4">今天</span>
<span class="word" data-word-start="3.4" data-word-end="3.7">我们</span>
<span class="word" data-word-start="3.7" data-word-end="4.1">来聊</span>
<span class="word" data-word-start="4.1" data-word-end="4.5">AI</span>
<span class="word" data-word-start="4.5" data-word-end="5.0">编程</span>
</div>
<style>
/* No CSS transition here: transitions run on wall-clock time and are not seekable */
.word { color: rgba(255,255,255,0.5); }
</style>
<script>
const karaokeTl = gsap.timeline({ paused: true });
document.querySelectorAll('#cap-karaoke .word').forEach(w => {
const start = parseFloat(w.dataset.wordStart);
// GSAP 3 no longer tweens className, so set the highlight styles directly at the word start
karaokeTl.set(w, { color: "#00d4ff", textShadow: "0 0 10px rgba(0,212,255,0.5)" }, start);
});
// Add karaokeTl to window.__timelines
</script>
const typeTl = gsap.timeline({ paused: true });
const capEl = document.getElementById('cap-type');
const chars = Array.from(capEl.textContent.trim()); // Array.from keeps emoji (surrogate pairs) intact
capEl.textContent = '';
chars.forEach(ch => {
const span = document.createElement('span');
span.textContent = ch;
span.style.opacity = '0';
capEl.appendChild(span);
});
typeTl.to(capEl.querySelectorAll('span'), {
opacity: 1,
stagger: 0.05,
duration: 0.01
}, parseFloat(capEl.dataset.start));
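When the user only says "add captions and effects to this talking-head clip", the AI runs the following full flow automatically:
User provides an MP4 →
① Inspect the source parameters with ffprobe
② Normalize to 1080p 30fps (if needed)
③ hyperframes transcribe --model medium
④ Group captions automatically (14 Chinese characters per line, break at punctuation)
⑤ Choose the caption animation (default: slide up from bottom + semi-transparent backing)
⑥ Choose the effects layer (default: light particles + bottom info bar)
⑦ Write the Composition HTML
⑧ hyperframes render
⑨ Process audio (by default keep the full original audio; mix in BGM only when the user asks)
⑩ Run the self-check pipeline, Phases B-D
⑪ Deliver final.mp4
AI defaults (when the user does not specify):
The user can override any default at any time.
In addition to the standard Phase A-D self-check pipeline, talking-head post-production requires these extra checks: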
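| # | Check | Pass criterion |
|---|---|---|
| P1 | Captions in sync with speech | Caption appears no later than 0.2s after speech starts |
| P2 | Captions do not cover key visuals | No face or key information in the caption area (bottom 100px) |
| P3 | Source audio and video in sync | Lip movement matches the audio after rendering |
| P4 | Effects layer does not interfere with the main content | Effect opacity ≤ 0.3, not covering the person or captions |
| P5 | BGM ducking takes effect | BGM drops below -20dB during voice segments |
| P6 | Full video duration | output duration ≥ input duration - 0.1s |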
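Two of these checks, P1 and P6, are plain numeric comparisons and can be automated. The sketch below assumes the timestamps and durations have already been measured, for example with the ffprobe duration command shown earlier. The function names are hypothetical.

```python
def caption_in_sync(caption_start, speech_start, max_lag=0.2):
    """P1: the caption must appear no later than max_lag seconds after speech starts."""
    return caption_start - speech_start <= max_lag


def duration_complete(input_duration, output_duration, tolerance=0.1):
    """P6: the rendered output may be at most `tolerance` seconds shorter than the input."""
    return output_duration >= input_duration - tolerance


print("P1", "pass" if caption_in_sync(0.5, 0.4) else "fail")
print("P6", "pass" if duration_complete(42.0, 41.95) else "fail")
```

P2 to P5 need frame or audio inspection and are not covered by this sketch.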
For complex videos, split into separate HTML files:
<!-- In index.html -->
<div id="scene-1" data-composition-id="intro"
data-composition-src="compositions/intro.html"
data-start="0" data-duration="5" data-track-index="1"></div>
Sub-composition files use <template> wrapper (main index.html does NOT):
<template id="intro-template">
<div data-composition-id="intro" data-width="1920" data-height="1080">
<!-- content, style, script -->
</div>
</template>
Declare on <html> root, read with window.__hyperframes.getVariables():
<html data-composition-variables='[
{"id":"title","type":"string","label":"Title","default":"Hello"},
{"id":"accent","type":"color","label":"Accent Color","default":"#ff6b35"}
]'>
Override at render: npx hyperframes render --variables '{"title":"Q4 Report"}'
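When overrides are generated programmatically, quoting the JSON for the shell is the error-prone part. This is a small sketch, assuming a POSIX shell and only the `--variables` flag shown above. `render_command` is a hypothetical helper, not part of the HyperFrames CLI.

```python
import json
import shlex


def render_command(overrides):
    """Build the render command line with a safely quoted --variables payload."""
    payload = json.dumps(overrides, ensure_ascii=False)
    return f"npx hyperframes render --variables {shlex.quote(payload)}"


print(render_command({"title": "Q4 Report", "accent": "#ff6b35"}))
```

`shlex.quote` wraps the payload in single quotes, matching the form of the manual command above.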
When the user requests modifications after initial delivery, follow this decision tree to minimize re-work:
| Change type | Scope | Actions required |
|---|---|---|
| Text/data change | Single scene content | Edit HTML → Lint → Re-render → Re-mux audio → Verify |
| Layout/position fix | CSS class or inline style | Edit CSS → Lint → Re-render → Re-mux audio → Verify |
| Color/style change | CSS variables or colors | Edit CSS → Lint → Re-render → Re-mux audio → Verify |
| Animation timing | GSAP parameters | Edit JS → Lint → Re-render → Re-mux audio → Verify |
| Add/remove scene | Structure change | Edit HTML+JS → Recalculate all timings → Lint → Re-render → Re-mux → Verify |
| BGM style change | Audio only | Only when user requested BGM change: regenerate/replace BGM → Re-mux only (skip re-render) → Verify audio |
| Duration change | Everything | Full rebuild required |
1. Identify affected scene(s) — read current HTML
2. Make targeted edit(s) — ONLY touch affected parts
3. npx hyperframes lint — must pass
4. npx hyperframes render --quality draft — fast preview check
5. If draft looks good → render --quality standard
6. Re-mux the approved audio source (original_audio.wav for existing videos, bgm.wav only for generated-BGM projects)
7. Verify final MP4 (Phase C+D checks)
8. Deliver
Key optimization: DO NOT regenerate or replace audio unless the user requests a music style change or duration change. For existing source videos, reuse the extracted original_audio.wav; for generated-BGM projects, reuse the existing approved bgm.wav across layout/content fixes.
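The decision table above amounts to a lookup from change type to required steps. This is a minimal sketch with illustrative keys and step names, showing that only a BGM change can skip the re-render.

```python
FULL_CYCLE = ["edit", "lint", "re-render", "re-mux audio", "verify"]

# Keys and step names are illustrative labels for the rows of the table above.
ACTIONS = {
    "text": FULL_CYCLE,
    "layout": FULL_CYCLE,
    "color": FULL_CYCLE,
    "animation": FULL_CYCLE,
    "scene": ["edit", "recalculate timings", "lint", "re-render", "re-mux audio", "verify"],
    "bgm": ["replace BGM", "re-mux audio", "verify audio"],
    "duration": ["full rebuild"],
}


def needs_render(change_type):
    """True when the change forces a new render (or a full rebuild)."""
    steps = ACTIONS[change_type]
    return "re-render" in steps or "full rebuild" in steps


for change in ("text", "bgm", "duration"):
    print(change, "-> render needed" if needs_render(change) else "-> re-mux only")
```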
--quality draft First
ALWAYS render draft quality first when:
Draft renders at ~3x speed. Only proceed to standard after visual confirmation.
When user reports multiple issues at once:
NEVER render between each individual fix — that wastes 10+ minutes per cycle.
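The following are rules of thumb for Chinese-language video. They help the AI make sensible Chinese layout decisions when the user gives no detailed typography instructions. When the user's prompt states explicit layout, density, or style requirements, the user's requirements take precedence.
| Scene duration | Suggested Chinese characters (on screen at once) | Suggested content items |
|---|---|---|
| 3 seconds | 60-80 characters | 3-4 items |
| 4 seconds | 80-120 characters | 4-6 items |
| 5+ seconds | 120-160 characters | 6-8 items |
Self-check focus: if a scene's text is so dense that it becomes unreadable (font size ≤ 28px and on screen for ≤ 3s), the AI should split the scene automatically. This is a readability issue, not a style restriction.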
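The density table and the readability rule can be expressed as two small checks. This is a sketch only: the function names are hypothetical, and treating scenes shorter than 3 seconds like the 3-second row is an assumption, since the table does not cover them.

```python
def suggested_max_chars(scene_seconds):
    """Upper bound of on-screen Chinese characters from the density table."""
    if scene_seconds >= 5:
        return 160
    if scene_seconds >= 4:
        return 120
    return 80   # 3-second row; also applied to shorter scenes (assumption)


def should_split_scene(font_px, dwell_seconds):
    """Readability rule: small text (<= 28px) shown briefly (<= 3s) should be split."""
    return font_px <= 28 and dwell_seconds <= 3


print(suggested_max_chars(4.5), should_split_scene(28, 3))
```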
/* Line breaking — keep Chinese words together */
.content-area, .content-col, .glass-card {
word-break: keep-all; /* prevent mid-word breaks */
overflow-wrap: break-word; /* break only at natural points */
line-break: strict; /* no punctuation at line start */
}
/* Chinese line height — wider than English */
p, span, .card-desc { line-height: 1.6; } /* body text */
h2, h3 { line-height: 1.3; } /* titles */
/* Mixed CJK + Latin spacing */
.mixed-text { text-spacing-trim: space-all; } /* if supported */
/* Fallback: manually add thin space between Chinese and numbers/English */
| Pattern | Example | Rule |
|---|---|---|
| Number + Chinese unit | 128亿美元 | No space between number and Chinese |
| English brand + Chinese | GitHub Copilot 工具 | Space between English and Chinese |
| Percentage | 156% (not 156%) | Use half-width % (more compact) |
| Punctuation | ,、。; | Use full-width Chinese punctuation in body text |
| Data labels | 市场规模: | Use full-width colon : in Chinese context |
| Card titles | 核心技术突破 | No trailing punctuation on card titles |
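The first two spacing rules in the table can be applied mechanically when text arrives unspaced. This is a minimal sketch, assuming only basic CJK ideographs (U+4E00 to U+9FFF) and ASCII letters need handling. `space_mixed_text` is a hypothetical helper.

```python
import re

CJK = r"\u4e00-\u9fff"   # basic CJK Unified Ideographs block


def space_mixed_text(text):
    """Insert a space between Latin letters and Chinese characters; digits stay attached."""
    text = re.sub(rf"([A-Za-z])([{CJK}])", r"\1 \2", text)
    text = re.sub(rf"([{CJK}])([A-Za-z])", r"\1 \2", text)
    return text


print(space_mixed_text("GitHub Copilot工具"))
print(space_mixed_text("128亿美元"))
```

Running the function twice gives the same result, because text that is already spaced no longer matches either pattern.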
Chinese characters are wider than Latin characters. The AI can use the following as a reference when planning a layout on its own:
| Element | English baseline | Chinese adjustment |
|---|---|---|
| Card title | 40-44px | 38-42px (reduce 2px) |
| Body text | 34-38px | 32-36px (reduce 2px) |
| Card width | 360px | 380-400px (increase 20-40px) |
| Line chars | ~40 chars/line | ~18-22 Chinese characters/line |
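Data display scene:
Main figure: "128亿" (number 88-96px + unit 42px)
Label: "市场规模" (38px, opacity 0.7)
Card scene:
Icon: 64-72px emoji/SVG
Card title: "核心技术突破" (40px, bold)
Description: one sentence describing the feature or data (32-34px, opacity 0.8)
Footnote: specific data or source (28px, opacity 0.6)
List scene:
Index: "01" (56px, accent color)
Content: one-line description, at most 25 characters (36px)
NOTE: The above are reference templates only. The AI is free to use a different information hierarchy, card structure, or data presentation.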
┌───────────────────────┐
│ Top Safe: 120px │ ← Platform UI (status bar)
├───────────────────────┤
│ Title Zone: │
│ top 160px, h ≤ 200px │
├───────────────────────┤
│ │
│ Content Zone: │
│ top 400px ~ bot 1600px│ ← Available: 1200px
│ │
├───────────────────────┤
│ Bottom Safe: 320px │ ← Platform UI (controls, comments)
└───────────────────────┘
| Element | Safe range | Hard max |
|---|---|---|
| Cover mega title | 120-160px | 180px |
| Scene title | 80-100px | 120px |
| Subtitle | 48-64px | 72px |
| Body text | 36-48px | 56px |
| Card title | 40-52px | 56px |
| Data number | 96-128px | 140px |
| Icon/emoji | 72-108px | 128px |
Unicode Emoji (first choice for most cases)
Inline SVG (when custom icons needed)
CSS-drawn shapes (for simple geometric icons)
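@import (same issue as Google Fonts)
Categories:
💻 Coding/Development 🚀 Launch/Growth 📊 Data/Charts 🔧 Tools/Settings
🎯 Goals/Focus ⚡ Performance/Speed 🔒 Security/Privacy 🌐 Network/Global
📱 Mobile 🤖 AI/Robots 🎮 Games 🏗️ Architecture/Build
💡 Innovation/Ideas 📈 Growth/Trends 🛡️ Protection/Security ⏱️ Time/Efficiency
Specific use cases:
Web development: 🌐 Mobile apps: 📱 AI/ML: 🤖 Games: 🎮
Cybersecurity: 🔒 Data science: 📊 Code: 💻 Deployment: 🚀
Efficiency: ⚡ Quality: ✅ Risk: ⚠️ Trends: 📈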
/* Standard icon in card */
.card-icon { font-size: 64px; line-height: 1; }
/* Small inline icon */
.inline-icon { font-size: 48px; vertical-align: middle; }
/* Feature highlight icon */
.feature-icon { font-size: 72px; }
/* NEVER exceed these for icons: */
/* Horizontal video: 96px max */
/* Vertical video: 128px max */
<!-- Reusable SVG icon pattern for tech videos -->
<svg width="64" height="64" viewBox="0 0 64 64" fill="none">
<circle cx="32" cy="32" r="28" stroke="currentColor" stroke-width="2" opacity="0.3"/>
<path d="M20 32 L28 40 L44 24" stroke="currentColor" stroke-width="3" stroke-linecap="round"/>
</svg>
| Quality | Render speed | 10s video | 30s video | 42s video | 60s video |
|---|---|---|---|---|---|
| draft | ~6fps | ~2 min | ~5 min | ~7 min | ~10 min |
| standard | ~3fps | ~3 min | ~10 min | ~14 min | ~20 min |
| high | ~1.5fps | ~7 min | ~20 min | ~28 min | ~40 min |
Times are approximate. Complex scenes (many particles, gradients, blur filters) render slower.
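For durations that fall between the table columns, a linear estimate is usually close enough. The sketch below reads its rates off the 60s column of the table above (10, 20 and 40 minutes). It assumes render time scales linearly with duration and ignores the scene-complexity factors, so treat the result as a rough guide.

```python
# Minutes of render time per 60 seconds of video, taken from the 60s column above.
MINUTES_PER_60S = {"draft": 10, "standard": 20, "high": 40}


def estimate_render_minutes(video_seconds, quality="standard"):
    """Linear estimate of render time in minutes (rough guide only)."""
    return MINUTES_PER_60S[quality] * video_seconds / 60


for quality in ("draft", "standard", "high"):
    print(quality, estimate_render_minutes(42, quality), "min")
```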
| Factor | Impact | Mitigation |
|---|---|---|
| backdrop-filter: blur() | +30-50% | Limit to 3-4 elements max |
| > 50 particles | +20% | Cap at 40, reduce size |
| Multiple box-shadows | +15% | Use single subtle shadow |
| 60fps (vs 30fps) | +100% | Use 30fps unless requested |
| Large video backgrounds | +40% | Use gradient/CSS backgrounds instead |
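Before starting a render, inform the user:
"Starting a [quality]-quality render ([duration] seconds); the estimated time is [X-Y] minutes.
I will keep checking progress during the render, then run audio muxing and quality verification as soon as it finishes."
User asks to "take a quick look at the result" → --quality draft
User asks for the "official/final version" → --quality standard
User explicitly says "highest quality" → --quality high --fps 60
Verifying a layout fix → --quality draft (confirm first, then do the final render)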
| Complexity level | Max DOM elements | Max particles | Max cards | Render impact |
|---|---|---|---|---|
| Light | < 200 | 20 | 3-4 | Normal speed |
| Medium | 200-500 | 40 | 6-8 | +20% time |
| Heavy | 500-1000 | 60 | 10-12 | +50% time |
| Danger zone | > 1000 | > 80 | > 15 | May crash Chrome |
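A pre-render check can map a composition onto these tiers. The sketch below takes the worst tier across the three columns. The table leaves gaps (for example 61-80 particles), and treating values in those gaps as the danger tier is a conservative assumption of this example. `complexity_tier` is a hypothetical name.

```python
TIERS = [
    # (name, max DOM elements, max particles, max cards)
    ("light", 199, 20, 4),
    ("medium", 500, 40, 8),
    ("heavy", 1000, 60, 12),
]


def complexity_tier(dom_elements, particles, cards):
    """Return the first tier whose limits all hold, else 'danger'."""
    for name, max_dom, max_particles, max_cards in TIERS:
        if dom_elements <= max_dom and particles <= max_particles and cards <= max_cards:
            return name
    return "danger"


print(complexity_tier(300, 30, 6))
```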
/* Use will-change for animated elements (Chrome optimization) */
.scene-wrapper, .glass-card, .particle {
will-change: transform, opacity;
}
/* Reduce paint complexity */
.particle {
border-radius: 50%;
/* Use background-color, NOT box-shadow for particles */
background: currentColor;
}
/* GPU-accelerated properties only */
/* PREFER: transform, opacity */
/* AVOID: width, height, top, left, margin, padding (trigger layout), box-shadow (triggers repaint) */
- Set --workers to 1: npx hyperframes render --workers 1
- Replace backdrop-filter: blur() with pre-blurred gradient backgrounds
- Reduce box-shadow layers
If render exceeds 20 minutes for a ≤ 60s video:
- Check Chrome processes (ps aux | grep chrome)
- Retry with --workers 1 --quality draft
| Aspect ratio | Resolution | Use case | Init config |
|---|---|---|---|
| 16:9 landscape | 1920×1080 | YouTube, 公众号, B站 | data-width="1920" data-height="1080" |
| 9:16 portrait | 1080×1920 | 抖音, Reels, 视频号 | data-width="1080" data-height="1920" |
| 1:1 square | 1080×1080 | Instagram Feed, 小红书 | data-width="1080" data-height="1080" |
| 4:5 portrait | 1080×1350 | Instagram Feed (recommended) | data-width="1080" data-height="1350" |
| 4:3 standard | 1440×1080 | Presentation style | data-width="1440" data-height="1080" |
| Resolution | Title top | Content top | Content bottom | Side margin |
|---|---|---|---|---|
| 1920×1080 | 50px | 240px | 980px | 120px |
| 1080×1920 | 160px | 400px | 1600px | 60px |
| 1080×1080 | 50px | 200px | 980px | 80px |
| 1080×1350 | 80px | 260px | 1200px | 80px |
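The safe-zone table gives the vertical room available for content, which is what the overflow check (content_height vs available_height) compares against. This is a minimal sketch using the table's values. The dictionary layout and function names are illustrative.

```python
# (width, height) -> (content top, content bottom), in px, from the table above.
CONTENT_ZONE = {
    (1920, 1080): (240, 980),
    (1080, 1920): (400, 1600),
    (1080, 1080): (200, 980),
    (1080, 1350): (260, 1200),
}


def available_height(width, height):
    """Vertical space between content top and content bottom."""
    top, bottom = CONTENT_ZONE[(width, height)]
    return bottom - top


def overflows(content_height, width, height):
    """True when the content is taller than the safe content zone."""
    return content_height > available_height(width, height)


print(available_height(1080, 1920), overflows(800, 1920, 1080))
```

For 1080×1920 the result matches the 1200px figure shown in the portrait safe-zone diagram.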
| Element | 1920×1080 | 1080×1920 | 1080×1080 |
|---|---|---|---|
| Mega title | 96-120px | 120-160px | 80-100px |
| Scene title | 64-84px | 80-100px | 56-72px |
| Body text | 28-42px | 36-48px | 28-38px |
| Card title | 32-44px | 40-52px | 30-40px |
| Icon | 56-96px | 72-108px | 48-80px |
1920×1080 (landscape):
3 horizontal cards: 480px each + 36px gap
2×3 grid: 520px × 300px cards
1080×1920 (portrait):
2 horizontal cards: 480px each + 24px gap
1-column stack: 920px wide cards
1080×1080 (square):
2 horizontal cards: 460px each + 24px gap
2×2 grid: 460px × 340px cards
{project_name}_{version}_{quality}.mp4
Examples:
ai_coding_video_v1_standard.mp4 ← first delivery
ai_coding_video_v2_standard.mp4 ← after layout fix
ai_coding_video_v3_standard.mp4 ← after content update
ai_coding_video_final_standard.mp4 ← user-approved final
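The naming pattern can be generated so versions stay consistent across deliveries. In this sketch the slug step (lowercase, non-alphanumeric runs replaced by underscores) is an assumption that only suits ASCII project names. `output_name` is a hypothetical helper.

```python
import re


def output_name(project_name, version, quality):
    """Build {project_name}_{version}_{quality}.mp4; version is an int or "final"."""
    slug = re.sub(r"[^a-z0-9]+", "_", project_name.lower()).strip("_")
    tag = "final" if version == "final" else f"v{int(version)}"
    return f"{slug}_{tag}_{quality}.mp4"


print(output_name("AI Coding Video", 2, "standard"))
print(output_name("ai_coding_video", "final", "standard"))
```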
After user approves a final version:
# Keep: final approved MP4, source HTML, bgm.wav
# Remove: intermediate renders
rm -f render_v1.mp4 render_v2.mp4 render_v3.mp4
rm -f final_ai_coding_v1.mp4 final_ai_coding_v2.mp4
# Keep: final_ai_coding_v3.mp4 (or rename to _final)
Rule: Never auto-delete intermediate files without user confirmation. Ask: "Clean up the intermediate render files and keep only the final version?"
Before calling deliver_attachments:
Always keep in project directory:
- index.html: source composition (for re-editing)
- original_audio.wav: extracted original audio for existing-video edits, when source audio must be preserved
- bgm.wav: approved BGM audio, only when the project uses generated or external BGM
- scripts/verify_audio.py and scripts/gen_bgm.py remain available in the Skill package
- The final .mp4: the delivered product
Before delivering any video, execute the full AI Self-Check Pipeline (Section 9). The checklist adapts to prompt mode:
- npx hyperframes lint → 0 errors. (H-all)
- No Math.random, no banned fonts, no inline top:% overrides. (H1, H2, H7)
- Math.floor. (H1, H8)
- data-duration on root = last scene end time.
- content_height vs available_height. If overflow → reduce/split.
Note: Items 12-14 are checked only when the AI used the standard layout skeleton. If the AI used a custom layout based on the user's prompt or its own creative judgment, passing Item 8 (frame spot-check) takes their place.
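| # | Limitation | Impact | Workaround |
|---|---|---|---|
| 1 | HyperFrames built-in audio is truncated at ≥32s | Audio is incomplete in long videos | Mux the full audio afterward with FFmpeg (H9) |
| 2 | Only the Inter/JetBrains Mono/Roboto fonts are supported | Chinese fonts cannot be customized | Rely on the sans-serif system fallback |
| 3 | Fonts cannot be loaded with @import url() | Custom Google Fonts are unavailable | Use only fonts the Compiler resolves automatically |
| 4 | Rendering requires Headless Chrome + FFmpeg | Cannot run directly in a bare container environment | Install the full dependencies or use Docker |
| 5 | Non-deterministic APIs cause inconsistent frames | Math.random() and similar calls make renders non-reproducible | Use the mulberry32 seeded PRNG |
| 6 | DOM complexity per render is capped at ~500 nodes | Exceeding it may crash Chrome | Split scenes or simplify the DOM |
| 7 | Portrait video (1080×1920) support is experimental | Some animation proportions need manual adjustment | Use the portrait safe-zone reference table |
| 8 | Render speed depends on machine performance | A 42s video takes about 3-8 minutes to render | Preview in draft mode during development |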
npx hyperframes doctor # Check environment
npx hyperframes browser # Manage bundled Chrome
npx hyperframes info # Version details
Common issues:
- Install FFmpeg: brew install ffmpeg
- Run npx hyperframes browser to download
- Avoid repeat: -1 in timelines
- Ensure window.__timelines is registered synchronously